In R I need to count how many unique phecode (doubles in columns a to i) are in piv_pat_rewide anonymized randomized stub patient data for a biology project. Most example code tells someone how to count how many unique doubles there are in ONE column but I need to do it for MORE THAN ONE column (columns a to i).
This is a continuation of Replace most non-header fields in a TSV file based on a TSV conversion table
REPREX:
library(tidyverse)
library(data.table)
patientdata = data.table(
patient = c("patient1", "patient2", "patient3"),
sex = c("male", "female", "female"),
DISa = c("K40.9", "B96.8", "NOT"),
DISb = c("K43.9", "D12.6", ""),
DISc = c("N20.0", "E11.6", ""),
DISd = c("N20.1", "E87.6", ""),
DISe = c("N23", "I44.7", ""),
DISf = c("N39.0", "K40.9", ""),
DISg = c("R69", "K43.9", ""),
DISh = c("Z88.1", "K52.9", "B96.8"),
DISi = c("K40.9", "B96.8", "NOT")
)
ICCD10csv <- data.table(
icd10cm = c("K40.9", "K43.9", "N20.0", "N20.1", "N23", "N39.0", "R69", "Z88.1", "B96.8", "D12.6", "E11.6", "E87.6", "I44.7", "K40.9", "K43.9", "K52.9", "XNO"),
phecode = c("550.1", "550.5", "594.1", "594.3", "594.8", "591", "1019", "960.1", "041", "208", "250.2", "276.14", "426.32", "550.1", "550.5", "558", "17")
)
(piv_pat <- pivot_longer(
patientdata,
cols=c(-patient,-sex)
))
(piv_pat_jn <- left_join(piv_pat,
distinct(ICCD10csv),
by=c("value"="icd10cm")))
(piv_pat_rewide <- pivot_wider(piv_pat_jn,
id_cols = c(patient,sex),
names_from = "name",
values_from = "phecode"
))
The output I need is:
550.1 3
550.5 2
594.1 1
594.3 1
594.8 1
591 1
1019 1
960.1 1
550.1 1
041 3
208 1
250.2 1
276.14 1
426.32 1
558 1
The closest code I got is:
# https://forum.posit.co/t/counting-unique-values-in-multiple-columns/31136/4
piv_pat_rewide %>%
gather(key = phecode,
value = Value,
starts_with("DIS")) %>%
distinct(Value) %>%
count(Value)
Which outputs:
# A tibble: 15 x 2
Value n
<chr> <int>
1 041 1
2 1019 1
3 208 1
4 250.2 1
5 276.14 1
6 426.32 1
7 550.1 1
8 550.5 1
9 558 1
10 591 1
11 594.1 1
12 594.3 1
13 594.8 1
14 960.1 1
15 NA 1