Pivot wider creates tibble with column-lists

Hello everybody! I've been struggling with this for a while. The ultimate goal of this is to map diagnostic categories onto diagnostic codes.

The first dataset imitates the original data frame but in the pivoted longer format and contains a unique ID, diagnostic code, and position of diagnostic code. The second has diagnostic codes and descriptions for each diagnostic code. I merged tibbles, selected the columns, and then pivoted wider.

The problem with pivot wider is that it produced the column lists since sometimes each diagnostic code corresponds to multiple descriptions. Unnesting the column doesn't work since it converts the data frame into a partially long format since there are duplicates of IDs. Ideally, when unnesting is done, the untested value should go into the new/empty column and belong to the same ID.

I'm new to R and I have no idea how to approach this problem or if there are better solutions to reach the ultimate goal ( mapping diagnostic categories onto the code in given dataframes). Thank you very much for your help

#> Warning: package 'ggplot2' was built under R version 4.1.3
#> Warning: package 'tibble' was built under R version 4.1.3
#> Warning: package 'tidyr' was built under R version 4.1.3
#> Warning: package 'readr' was built under R version 4.1.3
#> Warning: package 'dplyr' was built under R version 4.1.3

##generate data
ID<-c(10000403,10000403,10000403,10000403,10000403,10000403,10000402,10000402,10000402,10000402,10000402,10000402,
  10000404,10000404,10000404,10000404,10000404,10000404)
DX_position<-c("I10_DX1","I10_DX2","I10_DX3","I10_DX4","I10_DX5","I10_DX6",
               "I10_DX1","I10_DX2","I10_DX3","I10_DX4","I10_DX5","I10_DX6",
"I10_DX1","I10_DX2","I10_DX3","I10_DX4","I10_DX5","I10_DX6")

DX_name <- c("F332", "R45851", "A322", "A321","F321",NA,
  "K1300","A1300","A1323",NA,NA,NA,
  "K1300","C123",NA,NA,NA,NA)

DX_description <- c("bad infection", "tuberculosis",
                    "Typhoid pneumonia",
                    "Typhoid pneumonia",
                    "Enterohemorrhagic",
                    "Foodborne",
                    "genitourinary infections", "Balantidiasis",
                    "Tuberculoma of brain and spinal cord",
                    "Septicemic plague","Cellulocutaneous plague", 
                    "Miliary tuberculosis, unspecified"
                    )
DX_for_mapping<- c("F332","F332",
                   "R45851", 
                   "A322",
                   "A321",
                   "F321",
                   "K1300", "K1300",
                   "A1300",
                   "A1323","A1323",
                   "C123")


pivoted<- bind_cols(ID, DX_position,DX_name)
#> New names:
#> * `` -> `...1`
#> * `` -> `...2`
#> * `` -> `...3`
mapping_group<- bind_cols(DX_for_mapping, DX_description)
#> New names:
#> * `` -> `...1`
#> * `` -> `...2`
colnames(pivoted)<- c("ID","DX_position","DX_code")
colnames(mapping_group) <- c("DX_code", "DX_name")

##merge two data frames   
merged<- merge(pivoted, mapping_group)

#pivoting
tibble_with_list<- merged%>%
  select(DX_position,DX_name, ID)%>%
  pivot_wider(names_from = DX_position, values_from = DX_name)
#> Warning: Values from `DX_name` are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = {summary_fun}` to summarise duplicates.
#> * Use the following dplyr code to identify duplicates.
#>   {data} %>%
#>     dplyr::group_by(ID, DX_position) %>%
#>     dplyr::summarise(n = dplyr::n(), .groups = "drop") %>%
#>     dplyr::filter(n > 1L)

tibble_with_list
#> # A tibble: 3 x 6
#>         ID I10_DX2   I10_DX3   I10_DX4   I10_DX5   I10_DX1  
#>      <dbl> <list>    <list>    <list>    <list>    <list>   
#> 1 10000402 <chr [1]> <chr [2]> <NULL>    <NULL>    <chr [2]>
#> 2 10000403 <chr [1]> <chr [1]> <chr [1]> <chr [1]> <chr [2]>
#> 3 10000404 <chr [1]> <NULL>    <NULL>    <NULL>    <chr [2]>

#unnesting
tibble_with_list %>%
  unnest(cols = c(I10_DX2, I10_DX3, I10_DX4, I10_DX5, I10_DX1))
#> # A tibble: 6 x 6
#>         ID I10_DX2                              I10_DX3  I10_DX4 I10_DX5 I10_DX1
#>      <dbl> <chr>                                <chr>    <chr>   <chr>   <chr>  
#> 1 10000402 Tuberculoma of brain and spinal cord Septice~ <NA>    <NA>    genito~
#> 2 10000402 Tuberculoma of brain and spinal cord Cellulo~ <NA>    <NA>    Balant~
#> 3 10000403 Typhoid pneumonia                    Typhoid~ Entero~ Foodbo~ bad in~
#> 4 10000403 Typhoid pneumonia                    Typhoid~ Entero~ Foodbo~ tuberc~
#> 5 10000404 Miliary tuberculosis, unspecified    <NA>     <NA>    <NA>    genito~
#> 6 10000404 Miliary tuberculosis, unspecified    <NA>     <NA>    <NA>    Balant~

Would this work for you?

library(tidyr)
library(dplyr)

ID<-c(10000403,10000403,10000403,10000403,10000403,10000403,10000402,10000402,10000402,10000402,10000402,10000402,
      10000404,10000404,10000404,10000404,10000404,10000404)
DX_position<-c("I10_DX1","I10_DX2","I10_DX3","I10_DX4","I10_DX5","I10_DX6",
               "I10_DX1","I10_DX2","I10_DX3","I10_DX4","I10_DX5","I10_DX6",
               "I10_DX1","I10_DX2","I10_DX3","I10_DX4","I10_DX5","I10_DX6")

DX_name <- c("F332", "R45851", "A322", "A321","F321",NA,
             "K1300","A1300","A1323",NA,NA,NA,
             "K1300","C123",NA,NA,NA,NA)

DX_description <- c("bad infection", "tuberculosis",
                    "Typhoid pneumonia",
                    "Typhoid pneumonia",
                    "Enterohemorrhagic",
                    "Foodborne",
                    "genitourinary infections", "Balantidiasis",
                    "Tuberculoma of brain and spinal cord",
                    "Septicemic plague","Cellulocutaneous plague", 
                    "Miliary tuberculosis, unspecified"
)
DX_for_mapping<- c("F332","F332",
                   "R45851", 
                   "A322",
                   "A321",
                   "F321",
                   "K1300", "K1300",
                   "A1300",
                   "A1323","A1323",
                   "C123")
pivoted <- tibble::tibble(ID, DX_position,DX_name)               
mapping_group<- tibble::tibble(DX_for_mapping, DX_description)
colnames(pivoted)<- c("ID","DX_position","DX_code")
colnames(mapping_group) <- c("DX_code", "DX_name")
merged<- merge(pivoted, mapping_group)
merged <- merged |> group_by(ID,DX_position) |> 
  mutate(Row=row_number())

Wide <- merged |> select(-DX_code) |> 
  pivot_wider(names_from = c("DX_position","Row"),values_from = "DX_name")

Wide
#> # A tibble: 3 x 8
#> # Groups:   ID [3]
#>         ID I10_DX2_1 I10_DX3_1 I10_DX3_2 I10_DX4_1 I10_DX5_1 I10_DX1_1 I10_DX1_2
#>      <dbl> <chr>     <chr>     <chr>     <chr>     <chr>     <chr>     <chr>    
#> 1 10000402 Tubercul~ Septicem~ Celluloc~ <NA>      <NA>      genitour~ Balantid~
#> 2 10000403 Typhoid ~ Typhoid ~ <NA>      Enterohe~ Foodborne bad infe~ tubercul~
#> 3 10000404 Miliary ~ <NA>      <NA>      <NA>      <NA>      genitour~ Balantid~

Created on 2022-05-30 by the reprex package (v2.0.1)

Thank you!!! This is the next level. I don't think I could have come up with anything close to it. Works like magic even on a slightly more difficult case ( extra column of values to "values_from).

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.