library(data.table)
patients <- data.table(
ID = c("id1", "id2", "id3"),
f.41270.1 = c("184.11", "987", ""),
f.41270.2 = c("151.11", "", ""),
f.41270.3 = c("", "184.11", "")
)
# retrieve all distinct 'specific' disease codes, dropping the empty string
all_dis <- setdiff(unique(unlist(patients[, -1])), "")
# loop over each row (MARGIN = 1) of the patients data.table
patients2 <- apply(patients[, -1], 1, FUN = function(x) {
  # flag which codes in all_dis occur in this patient's row
  as.raw(all_dis %in% x)
})
dimnames(patients2) <- list(all_dis, patients$ID)
# print it
print(t(patients2))
    184.11 987 151.11
id1     01  00     01
id2     01  01     00
id3     00  00     00
That is what I expect. The stub works, but when I run it on the full data, R rejects patients[, -1], which is odd: the full data only has different column names and different values inside the table. I checked that the patient IDs are still the first column in the full data.
Error in patients[, -1] : incorrect number of dimensions
Calls: setdiff -> unique -> unlist
Execution halted
ID is the patient ID, e.g. id1
The f. columns are the disease groups (column names in the original table), e.g. f.41270.1
Every value inside the table is a specific disease code, e.g. 184.11
Original table:
ID   f.41270.1  f.41270.2  f.41270.3
id1  184.11     151        NA
id2  987        NA         184.11
id3  008        NA         NA
It's hard to help since, as mentioned previously, you do not provide details on your data structure.
Notably, patients[, -1] may lead to the same error.
That being said, I would do almost exactly the same with NA as with "":
library(data.table)
patients <- data.table(
ID = c("id1", "id2", "id3"),
f.41270.1 = c("184.11", "987", "008"),
f.41270.2 = c("151", NA, NA),
f.41270.3 = c(NA, "184.11", NA)
)
# retrieve all distinct 'specific' disease codes, dropping NA
all_dis <- setdiff(unique(unlist(patients[, -1])), NA)
# if missing values are stored as the string "NA" instead, use:
# all_dis <- setdiff(unique(unlist(patients[, -1])), "NA")
# to remove several undesired values at once, e.g. "", NA, "NA":
# all_dis <- setdiff(unique(unlist(patients[, -1])), c(NA, "", "NA"))
# loop over each row (MARGIN = 1) of the patients data.table
patients2 <- apply(patients[, -1], 1, FUN = function(x) {
  # flag which codes in all_dis occur in this patient's row
  as.raw(all_dis %in% x)
})
dimnames(patients2) <- list(all_dis, patients$ID)
# print it
print(patients2)
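If the raw 01/00 display is awkward to work with afterwards, one possible variant is an integer 0/1 matrix with patients as rows. This is only a sketch of an alternative, not the approach above: it assumes a base data.frame and vapply() instead of data.table and apply(), and the names codes and indicator are made up for the illustration.

```r
# Same toy data as above, held in a base data.frame (assumption: no data.table needed)
patients <- data.frame(
  ID = c("id1", "id2", "id3"),
  f.41270.1 = c("184.11", "987", "008"),
  f.41270.2 = c("151", NA, NA),
  f.41270.3 = c(NA, "184.11", NA),
  stringsAsFactors = FALSE
)

# character matrix of codes, one row per patient (ID column dropped)
codes <- as.matrix(patients[, -1])

# all distinct codes, with NA removed
all_dis <- setdiff(unique(as.vector(codes)), NA)

# for each code, mark the patients (rows) that carry it at least once
indicator <- vapply(
  all_dis,
  function(d) as.integer(rowSums(codes == d, na.rm = TRUE) > 0),
  integer(nrow(codes))
)
rownames(indicator) <- patients$ID

print(indicator)
```

Because the result is an ordinary integer matrix, it can be summed or passed to as.data.frame() directly.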
One last bug, though: when I apply it to my full patient data, I get an error:
Error in patients[, -1] : incorrect number of dimensions
Calls: setdiff -> unique -> unlist
Execution halted
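One possible cause, offered as a guess since the full data is not shown: "incorrect number of dimensions" is the message base R gives when two-index subsetting such as [, -1] is applied to an object that has no rows and columns, for example a plain vector. The sketch below uses a made-up character vector (not_a_table, a hypothetical stand-in for however the full data was loaded) to reproduce the message, then reads the same lines into a real table with base read.table().

```r
# Hypothetical stand-in for the full data: a plain character vector,
# which is what e.g. readLines() returns instead of a table
not_a_table <- c("id1 184.11 151 NA", "id2 987 NA 184.11")

# Check what kind of object you actually have
print(class(not_a_table))          # "character"
print(dim(not_a_table))            # NULL: no rows/columns to index
print(is.data.frame(not_a_table))  # FALSE

# Reproduce the error without halting the script
err <- tryCatch(not_a_table[, -1], error = function(e) conditionMessage(e))
print(err)

# Parsing the same lines into a data.frame makes [, -1] valid again;
# colClasses = "character" keeps codes such as "008" as text
tab <- read.table(text = not_a_table, colClasses = "character")
print(dim(tab[, -1]))
```

Running class(), dim() and str() on the real patients object right before the failing line should show whether this is what is happening with the full data.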