Turning all table elements into column IDs, then building a new table with 0 or 1 depending on whether the patient (row ID) had that element

OK, so is this what you expect?

library(data.table)
patients <- data.table(
  ID = c("id1", "id2", "id3"),
  f.41270.1 = c("184.11", "987", ""),
  f.41270.2 = c("151.11", "", ""),
  f.41270.3 = c("", "184.11", "")
)

# here you retrieve all possible 'specific' disease codes
all_dis <- setdiff(unique(unlist(patients[, -1])), "")

# here you loop over each row of patients data.table
patients2 <- apply(patients[, -1], 1, FUN = function(x) {
  # then you look if there is a match
  as.raw(all_dis %in% x)
})
dimnames(patients2) <- list(all_dis, patients$ID)

# print it
print(t(patients2))

    184.11 987 151.11
id1     01  00     01
id2     01  01     00
id3     00  00     00

That is what I expect, and the stub works. But when I run it on the full data it doesn't like [, -1], which is weird: the full data just has different column names and different data inside the table. I checked that the patient IDs are still the first column in the full data.

Error in patients[, -1] : incorrect number of dimensions
Calls: setdiff -> unique -> unlist
Execution halted

For the full data, could you provide the output of

str(patients)

Or (if it is too sensitive)

dim(patients)
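
For context, that error message usually means the object has no dimensions at the point where [, -1] is applied. Here is a minimal sketch, assuming (and this is only a guess) that the full data ended up as a plain vector; the object v is made up for illustration:

```r
# Hypothetical stand-in for the full data: a plain character vector, not a table
v <- c("id1", "184.11", "151.11")

print(dim(v))    # a vector has no dimensions
print(class(v))  # so it is not a data.frame or data.table

# Two-index subsetting on an object without dimensions raises an error;
# tryCatch() captures the message instead of halting the script
msg <- tryCatch(v[, -1], error = function(e) conditionMessage(e))
print(msg)
```

If dim(patients) returns NULL on the full data, the problem is in how the data was read in or reshaped, not in the matching code.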

Let's try the stub again.

ID is the patient ID, e.g. id1
The f.* columns are the disease groups (column names in the original table), e.g. f.41270.1
Everything inside the table is a specific disease code, e.g. 184.11

Original table
ID   f.41270.1  f.41270.2  f.41270.3
id1  184.11     151        NA
id2  987        NA         184.11
id3  008        NA         NA

Output table
ID   184.11  151  987  008
id1  1       1    0    0
id2  1       0    1    0
id3  0       0    0    1

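It's hard to help since, as mentioned previously, you do not provide details on your data structure.
Notably, patients[, -1] will keep producing the same error if patients is not a two-dimensional object (a data.frame or data.table) in the full data.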

That being said, I would do almost exactly the same with NA as with ""

library(data.table)
patients <- data.table(
  ID = c("id1", "id2", "id3"),
  f.41270.1 = c("184.11", "987", "008"),
  f.41270.2 = c("151", NA, NA),
  f.41270.3 = c(NA, "184.11", NA)
)

# here you retrieve all possible 'specific' diseases codes
all_dis = setdiff(unique(unlist(patients[, -1])),NA)

# if NA is within table and produces "NA" then
# all_dis = setdiff(unique(unlist(patients[, -1])),"NA")
# if you want to remove several undesired values e.g. "", NA,"NA"
# all_dis = setdiff(unique(unlist(patients[, -1])),c(NA, "", "NA"))

# here you loop over each row of patients data.table
patients2 <- apply(patients[, -1], 1, FUN = function(x) {
  # then you look if there is a match
  as.raw(all_dis %in% x)
})
dimnames(patients2) = list(all_dis, patients$ID)

# print it
print(patients2)

Thank you so much, that works, except the columns and rows are reversed.

print(patients2)
       id1 id2 id3
184.11  01  01  00
987     00  01  00
008     00  00  01
151     01  00  00

The ids should be the rows and the 184.11, 987, 008, and 151 should be the columns.

A simple transpose fixes that

patients2 = t(patients2)
print(patients2)
    184.11 987 008 151
id1     01  00  00  01
id2     01  01  00  00
id3     00  00  01  00
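
Since the requested output uses 0 and 1 rather than the raw 00 and 01, here is a possible variant that builds the table with the IDs as rows directly. It is only a sketch on the stub data: it swaps as.raw() for as.integer() and transposes inside the same step, and the names codes and res are made up for illustration.

```r
library(data.table)

patients <- data.table(
  ID = c("id1", "id2", "id3"),
  f.41270.1 = c("184.11", "987", "008"),
  f.41270.2 = c("151", NA, NA),
  f.41270.3 = c(NA, "184.11", NA)
)

# Character matrix of the disease columns (everything except ID)
codes <- as.matrix(patients[, -1])

# Unique disease codes, dropping NA and empty strings
all_dis <- setdiff(unique(as.vector(codes)), c(NA, ""))

# apply() over rows returns one column per patient, so transpose to get
# one row per patient; as.integer() gives 0/1 instead of raw 00/01
res <- t(apply(codes, 1, function(x) as.integer(all_dis %in% x)))
dimnames(res) <- list(patients$ID, all_dis)

print(res)
```

Note that this assumes at least two distinct codes; with a single code apply() returns a vector and the transpose would no longer give one row per patient.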

One last bug, though: when I apply it to my full patient data I get an error.
Error in patients[, -1] : incorrect number of dimensions
Calls: setdiff -> unique -> unlist
Execution halted
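
One way to guard against this, sketched under the assumption that the full data is not a data.table when it reaches this step: coerce it first, check that it has two dimensions, and select the disease columns by their "f." prefix rather than by position. The object patients_raw and the name dis_cols are hypothetical.

```r
library(data.table)

# Hypothetical stand-in: the full data arrives as a plain list, not a table
patients_raw <- list(
  ID = c("id1", "id2"),
  f.41270.1 = c("184.11", "987"),
  f.41270.2 = c("151", NA)
)

# Coerce to a data.table so that two-index subsetting is defined
patients <- as.data.table(patients_raw)
stopifnot(length(dim(patients)) == 2)

# Pick the disease columns by name (assumes they all start with "f.")
dis_cols <- grep("^f\\.", names(patients), value = TRUE)

# Same code extraction as before, dropping NA and empty strings
all_dis <- setdiff(unique(unlist(patients[, dis_cols, with = FALSE])), c(NA, ""))
print(all_dis)
```

Selecting by name also keeps working if the ID column is not first in the full data.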
