Count how many unique fields are in several columns of data table

HariSeldon · June 14, 2021, 12:00pm

In R I need to count how many unique phecode (doubles in columns a to i) are in piv_pat_rewide anonymized randomized stub patient data for a biology project. Most example code tells someone how to count how many unique doubles there are in ONE column but I need to do it for MORE THAN ONE column (columns a to i).

This is a continuation of Replace most non-header fields in a TSV file based on a TSV conversion table

REPREX:

library(tidyverse)
library(data.table)

patientdata = data.table(
  patient = c("patient1", "patient2", "patient3"),
  sex = c("male", "female", "female"),
  DISa = c("K40.9", "B96.8", "NOT"),
  DISb = c("K43.9", "D12.6", ""),
 DISc = c("N20.0", "E11.6", ""),
  DISd = c("N20.1", "E87.6", ""),
  DISe = c("N23", "I44.7", ""),
  DISf = c("N39.0", "K40.9", ""),
  DISg = c("R69", "K43.9", ""),
  DISh = c("Z88.1", "K52.9", "B96.8"),
  DISi = c("K40.9", "B96.8", "NOT")
)

ICCD10csv <- data.table(
  icd10cm = c("K40.9", "K43.9", "N20.0", "N20.1", "N23", "N39.0", "R69", "Z88.1", "B96.8", "D12.6", "E11.6", "E87.6", "I44.7", "K40.9", "K43.9", "K52.9", "XNO"),
  phecode = c("550.1", "550.5", "594.1", "594.3", "594.8", "591", "1019", "960.1", "041", "208", "250.2", "276.14", "426.32", "550.1", "550.5", "558", "17")
)

(piv_pat <- pivot_longer(
  patientdata,
  cols=c(-patient,-sex)
))

(piv_pat_jn <- left_join(piv_pat,
                         distinct(ICCD10csv),
                         by=c("value"="icd10cm")))

(piv_pat_rewide <- pivot_wider(piv_pat_jn,
                               id_cols = c(patient,sex),
                               names_from = "name",
                               values_from = "phecode"
))

The output I need is:

550.1    3
550.5    2
594.1    1
594.3    1
594.8    1
591    1
1019    1
960.1    1
550.1    1
041    3
208    1
250.2    1
276.14    1
426.32    1
558    1

The closest code I got is:

# https://forum.posit.co/t/counting-unique-values-in-multiple-columns/31136/4
piv_pat_rewide %>%
  gather(key = phecode,
         value = Value,
         starts_with("DIS")) %>%
  distinct(Value) %>%
  count(Value)

Which outputs:

# A tibble: 15 x 2
   Value      n
   <chr>  <int>
 1 041        1
 2 1019       1
 3 208        1
 4 250.2      1
 5 276.14     1
 6 426.32     1
 7 550.1      1
 8 550.5      1
 9 558        1
10 591        1
11 594.1      1
12 594.3      1
13 594.8      1
14 960.1      1
15 NA         1

martin.R · June 14, 2021, 12:51pm

count(piv_pat_jn, phecode)

system · June 21, 2021, 12:52pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.