Flatten list-cols to char so I can use write_tsv()

HariSeldon · June 25, 2021, 8:07pm

This is a continuation of "Replace most non-header fields in a TSV file based on a TSV conversion table".

The stub code is:

library(tidyverse)
library(data.table)
patientdata = data.table(
a = c("patient1", "patient2", "patient3"),
b = c("K40.9", "B96.8", "NOT"),
c = c("K43.9", "D12.6", ""),
d = c("N20.0", "E11.6", ""),
e = c("N20.1", "E87.6", ""),
f = c("N23", "I44.7", ""),
g = c("N39.0", "K40.9", ""),
h = c("R69", "K43.9", ""),
i = c("Z88.1", "K52.9", "")
)
ICCD10csv <- data.table(
icd10cm = c("K40.9", "K43.9", "N20.0", "N20.1", "N23", "N39.0", "R69", "Z88.1", "B96.8", "D12.6", "E11.6", "E87.6", "I44.7", "K40.9", "K43.9", "K52.9", "XNO"),
phecode = c("550.1", "550.5", "594.1", "594.3", "594.8", "591", "1019", "960.1", "041", "208", "250.2", "276.14", "426.32", "550.1", "550.5", "558", "17")
)

(piv_pat <- pivot_longer(
  patientdata,
  cols=-a
))

(piv_pat_jn <- left_join(piv_pat,
                         distinct(ICCD10csv),
                         by=c("value"="icd10cm")))

(piv_pat_rewide <- pivot_wider(piv_pat_jn,
                               id_cols = "a",
                               names_from = "name",
                               values_from = "phecode"
))

write_tsv(piv_pat_rewide, "output.tsv")

However, when I run it on the real (non-stub) data, which I can't show for privacy reasons, it works on one data set but not on another. On the second data set, creating piv_pat_rewide gives this warning (... will contain list-cols), and write_tsv(piv_pat_rewide, "output.tsv") then fails with this error (flat files can't ...):

# Warning message:
# Values are not uniquely identified; output will contain list-cols.
# * Use `values_fn = list` to suppress this warning.
# * Use `values_fn = length` to identify where the duplicates arise
# * Use `values_fn = {summary_fun}` to summarise duplicates
# Error: Flat files can't store the list column
# Execution halted

How do I flatten the list-cols to char so I can do write_tsv()? I tried:

# https://tidyr.tidyverse.org/reference/hoist.html
piv_pat_rewide <- unnest_auto(piv_pat_rewide)
# Error: Argument `col` is missing with no default

# https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/unlist
piv_pat_rewide <- unlist(piv_pat_rewide)
# Error: is.data.frame(x) is not TRUE

# https://www.rdocumentation.org/packages/purrr/versions/0.2.5/topics/flatten
piv_pat_rewide <- flatten(piv_pat_rewide)
# Error: is.data.frame(x) is not TRUE

# https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/as.data.frame
piv_pat_rewide <- as.data.frame(piv_pat_rewide)
# Error: is.data.frame(x) is not TRUE

Maybe "Error: is.data.frame(x) is not TRUE" happens because in the real data I don't use data.table(). I use read_tsv() and read_csv() instead, which might make the patient data and conversion table tibbles rather than data frames?

DavoWW · June 28, 2021, 3:59am

Hi @HariSeldon,
I think this error is occurring because your data.table column patientdata$a contains some non-unique values. I can reproduce your error by changing the third patient name to "patient2" which duplicates the second value. This means that while your code runs, the intermediate structures are probably not what you are expecting.
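To make that concrete, here is a minimal sketch of the reproduction. The table `pat` is a hypothetical cut-down version of the stub (only two code columns), with the third patient name changed to "patient2". Counting rows per (a, name) pair shows which cells would receive more than one value. Running the same count on the joined table should also flag codes that match more than one row of the conversion table, which is another possible source of list-cols that I can't rule out without seeing the real data.

```r
library(tidyr)
library(dplyr)

# Hypothetical miniature of the stub data: "patient2" appears twice in column a
pat <- tibble(
  a = c("patient1", "patient2", "patient2"),
  b = c("K40.9", "B96.8", "NOT"),
  c = c("K43.9", "D12.6", "")
)

long <- pivot_longer(pat, cols = -a)

# Each (a, name) pair becomes one cell after pivot_wider();
# any pair with n > 1 is a cell that would have to hold several values
cell_counts <- count(long, a, name)
collisions <- filter(cell_counts, n > 1)
collisions

# values_fn = list makes the list-columns explicit (and silences the warning)
wide <- pivot_wider(
  long,
  id_cols = a,
  names_from = name,
  values_from = value,
  values_fn = list
)

# TRUE for every column that write_tsv() would refuse to write
sapply(wide, is.list)
```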

Try running:

patientdata %>% 
  mutate(dups = duplicated(a))
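
If the duplicates turn out to be legitimate and you still want a flat file, one possible approach is to collapse each cell to a single string. The sketch below is only an illustration: the table `long` and the ";" separator are my assumptions, not something from your data. It shows two variants, collapsing while widening via `values_fn`, and flattening list-columns that already exist.

```r
library(tidyr)
library(dplyr)
library(readr)

# Hypothetical long-format data in which patient2 has two values for column b
long <- tibble(
  a       = c("patient1", "patient2", "patient2"),
  name    = c("b", "b", "b"),
  phecode = c("550.1", "041", "17")
)

# Variant 1: collapse duplicates while widening, so no list-column is created
wide <- pivot_wider(
  long,
  id_cols = a,
  names_from = name,
  values_from = phecode,
  values_fn = function(x) paste(x, collapse = ";")
)

# Variant 2: flatten list-columns that already exist
wide_list <- pivot_wider(
  long,
  id_cols = a,
  names_from = name,
  values_from = phecode,
  values_fn = list
)
flat <- mutate(
  wide_list,
  across(where(is.list), ~ vapply(.x, paste, character(1), collapse = ";"))
)

# Both results contain only atomic columns, so write_tsv() accepts them
out_file <- tempfile(fileext = ".tsv")
write_tsv(flat, out_file)
```

Whether joining the values with ";" is appropriate depends on what reads the TSV afterwards. If each patient should really have one row, fixing the duplicated IDs upstream is probably the better option.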
