This is a continuation of "Replace most non-header fields in a TSV file based on a TSV conversion table".
The stub code is:
library(tidyverse)
library(data.table)
patientdata = data.table(
a = c("patient1", "patient2", "patient3"),
b = c("K40.9", "B96.8", "NOT"),
c = c("K43.9", "D12.6", ""),
d = c("N20.0", "E11.6", ""),
e = c("N20.1", "E87.6", ""),
f = c("N23", "I44.7", ""),
g = c("N39.0", "K40.9", ""),
h = c("R69", "K43.9", ""),
i = c("Z88.1", "K52.9", "")
)
ICCD10csv <- data.table(
icd10cm = c("K40.9", "K43.9", "N20.0", "N20.1", "N23", "N39.0", "R69", "Z88.1", "B96.8", "D12.6", "E11.6", "E87.6", "I44.7", "K40.9", "K43.9", "K52.9", "XNO"),
phecode = c("550.1", "550.5", "594.1", "594.3", "594.8", "591", "1019", "960.1", "041", "208", "250.2", "276.14", "426.32", "550.1", "550.5", "558", "17")
)
(piv_pat <- pivot_longer(
patientdata,
cols=-a
))
(piv_pat_jn <- left_join(piv_pat,
distinct(ICCD10csv),
by=c("value"="icd10cm")))
(piv_pat_rewide <- pivot_wider(piv_pat_jn,
id_cols = "a",
names_from = "name",
values_from = "phecode"
))
write_tsv(piv_pat_rewide, "output.tsv")
However, when I run it on the real (non-stub) data (not shown for privacy reasons), it works on one data set, but on another data set I get this warning (... will contain list-cols) when I create piv_pat_rewide, and this error (flat files can't ...) when I call write_tsv(piv_pat_rewide, "output.tsv"):
# Warning message:
# Values are not uniquely identified; output will contain list-cols.
# * Use `values_fn = list` to suppress this warning.
# * Use `values_fn = length` to identify where the duplicates arise
# * Use `values_fn = {summary_fun}` to summarise duplicates
# Error: Flat files can't store the list column
# Execution halted
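The warning itself hints at how to investigate: some (a, name) cell receives more than one value after the join. Below is a minimal sketch of how that might be reproduced, located, and collapsed. It assumes the cause is an ICD-10 code that maps to more than one phecode in the conversion table (only one possible cause, not confirmed for the real data), that tidyr is at least version 1.1.0 so `values_fn` accepts a single function, and that joining duplicates with ";" is acceptable. The names `conv`, `pat`, `long`, `dupes`, and `wide` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: "K40.9" maps to two different phecodes,
# so distinct() cannot reduce it to a single row.
conv <- tibble(
  icd10cm = c("K40.9", "K40.9", "B96.8"),
  phecode = c("550.1", "550.2", "041")
)
pat <- tibble(a = c("patient1", "patient2"), b = c("K40.9", "B96.8"))

long <- pat %>%
  pivot_longer(cols = -a) %>%
  left_join(distinct(conv), by = c("value" = "icd10cm"))

# Count rows per (a, name) cell; any count above 1 is a cell that
# pivot_wider() would otherwise turn into a list-col entry.
dupes <- long %>%
  count(a, name) %>%
  filter(n > 1)

# Assumption: collapsing duplicates into one ";"-separated string is fine.
wide <- long %>%
  pivot_wider(
    id_cols = a,
    names_from = name,
    values_from = phecode,
    values_fn = function(x) paste(x, collapse = ";")
  )
```

Here `dupes` shows which patient/column combinations are affected, and `wide` holds only plain character columns, so a flat-file writer should no longer find a list column to complain about. Whether collapsing is the right choice, or whether the conversion table should be deduplicated instead, depends on the data.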
How do I flatten the list-cols to character columns so I can call write_tsv()? I tried:
# https://tidyr.tidyverse.org/reference/hoist.html
piv_pat_rewide <- unnest_auto(piv_pat_rewide)
# Error: Argument `col` is missing with no default
# https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/unlist
piv_pat_rewide <- unlist(piv_pat_rewide)
# Error: is.data.frame(x) is not TRUE
# https://www.rdocumentation.org/packages/purrr/versions/0.2.5/topics/flatten
piv_pat_rewide <- flatten(piv_pat_rewide)
# Error: is.data.frame(x) is not TRUE
# https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/as.data.frame
piv_pat_rewide <- as.data.frame(piv_pat_rewide)
# Error: is.data.frame(x) is not TRUE
Maybe "Error: is.data.frame(x) is not TRUE" happens because in the real data I don't use data.table(); I use read_tsv() and read_csv(), which might make the patient data and the conversion table tibbles rather than data frames?
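A quick way to test that hypothesis is sketched below. It uses a hand-built tibble `tb` (a hypothetical stand-in for what read_tsv() and read_csv() return) and checks whether it counts as a data frame, and whether the result of unlist() still does. If the attempts above were run one after another, each reassigning piv_pat_rewide, the later errors might stem from the object no longer being a data frame after unlist(); that is an assumption about how the code was run.

```r
library(tibble)

# Hypothetical stand-in for a table read with read_tsv()/read_csv()
tb <- tibble(x = c("K40.9", "B96.8"), y = c("550.1", "041"))

# Tibbles inherit from data.frame, so this check passes
tb_is_df <- is.data.frame(tb)

# unlist() flattens the columns into one named character vector,
# which is no longer a data frame
flat <- unlist(tb)
flat_is_df <- is.data.frame(flat)

print(class(tb))
print(tb_is_df)
print(flat_is_df)
```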