Hi, I'm trying to find the total number of unique combinations of 3 diseases within a group of 20 conditions (so 20 choose 3). I have code that works on a reprex, and I've made my csv the same shape (diseases begin from column 6 onwards), but it throws an error message when I run it on the real file. I want to find all possible combinations and calculate the prevalence of each combination, to then plot the mean and SD. What is the difference between the csv and the reprex? (Reprex right at the bottom.)
Thanks
Code:
library(tidyverse)
library(utils)
dat <- read_csv("005_trimmed_spice.csv")
dat <- dat[,-c(3,20)]
dat$comorbid <- FALSE
comorbids <- dat[which(rowSums(dat[,7:20]) > 2),1]
dat[comorbids,"comorbid"] <- TRUE
cases <- combn(7:20,3)
dat[,cases[,1]]
make_comb <- function(x) dat[which(rowSums(dat[,cases[,x]]) > 2),1]
show_result <- function(x) dat[dat[make_comb(x)][which(rowSums(dat[,cases[,1]]) > 2),1],]
show_result(1)
show_result(2)
apply(cases, 2, show_result)
Console:
dat <- read_csv("005_trimmed_spice.csv")
New names:
- `` -> ...47
- `` -> ...48
- `` -> ...49
- `` -> ...50
- `` -> ...51
- ...
Rows: 65534 Columns: 86
── Column specification ─────────────────────────────────────────────
Delimiter: ","
chr (1): age_group
dbl (45): UniquePatientID, Age, Sex, CarstairsQuintile, Carstairs...
lgl (40): ...47, ...48, ...49, ...50, ...51, ...52, ...53, ...54,...
Use spec() to retrieve the full column specification for this data.
Specify the column types or set show_col_types = FALSE to quiet this message.
dat <- dat[,-c(3,20)]
dat$comorbid <- FALSE
comorbids <- dat[which(rowSums(dat[,7:20]) > 2),1]
dat[comorbids,"comorbid"] <- TRUE
Error: Must assign to rows with a valid subscript vector.
x Subscript comorbids has the wrong type tbl_df<UniquePatientID:double>.
It must be logical, numeric, or character.
Run rlang::last_error() to see where the error occurred.
cases <- combn(7:20,3)
dat[,cases[,1]]
A tibble: 65,534 x 3
Depression PainfulCondition ActiveAsthma
1 0 0 0
2 0 0 1
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 1 0
10 0 0 0
… with 65,524 more rows
make_comb <- function(x) dat[which(rowSums(dat[,cases[,x]]) > 2),1]
show_result <- function(x) dat[dat[make_comb(x)][which(rowSums(dat[,cases[,1]]) > 2),1],]
show_result(1)
Error: Must subset columns with a valid subscript vector.
x Subscript make_comb(x) has the wrong type tbl_df<UniquePatientID:double>.
It must be logical, numeric, or character.
Run rlang::last_error() to see where the error occurred.
show_result(2)
Error: Must subset columns with a valid subscript vector.
x Subscript make_comb(x) has the wrong type tbl_df<UniquePatientID:double>.
It must be logical, numeric, or character.
Run rlang::last_error() to see where the error occurred.
apply(cases, 2, show_result)
Error: Must subset columns with a valid subscript vector.
x Subscript cases[, x] must be a simple vector, not a matrix.
Run rlang::last_error() to see where the error occurred.
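The "wrong type tbl_df" errors above look consistent with read_csv() returning a tibble: selecting a single column with `[` on a tibble keeps a one-column tibble, whereas a base data.frame drops to a plain vector. Here is a minimal sketch of that difference, assuming the reprex was built with data.frame(). The table and the names toy_df, toy_tbl, A and B are made up for illustration and are not from the real file.

```r
library(tibble)

# Hypothetical toy data; IDs deliberately differ from the row numbers.
toy_df  <- data.frame(ID = c(101, 102, 103), A = c(1, 0, 1), B = c(1, 1, 1))
toy_tbl <- as_tibble(toy_df)

# Row positions where both indicators are 1.
rows <- which(rowSums(toy_df[, c("A", "B")]) > 1)

# Base data.frame: selecting one column drops to a numeric vector.
ids_df <- toy_df[rows, 1]
is.numeric(ids_df)

# Tibble: the same call keeps a one-column tibble,
# which cannot be used as a row or column subscript.
ids_tbl <- toy_tbl[rows, 1]
is.data.frame(ids_tbl)

# `[[` extracts the column as a plain vector from a tibble.
ids_vec <- toy_tbl[[1]][rows]

# Flag rows directly with a logical vector instead of going through IDs.
toy_tbl$comorbid <- unname(rowSums(toy_tbl[, c("A", "B")]) > 1)
```

Note that the original code uses the values in column 1 as row numbers. That happens to work in the reprex because ID runs 1 to 10 in row order; if UniquePatientID is not simply the row number, flagging rows with the logical vector as in the last line avoids relying on that.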
Practice reprex where code above worked:
dat <- data.frame(ID =
c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
Age =
c(18, 77, 25, 30, 54, 78, 69, 62, 68, 63),
Sex =
c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
CarsQuintie =
c(2, 1, 3, 1, 1, 5, 1, 1, 5, 1),
age_group =
c("18 - 24", "65 - 74", "25 - 34", "25 - 34", "55 - 64", "75 - 84", "65 - 74", "55 - 64", "55 - 64", "55 - 64"),
CarsQuintie_group =
c(3, 1, 4, 3, 1, 5, 1, 2, 1, 3),
Diabetes =
c(1, 0, 0, 0, 0, 1, 1, 0, 1, 1),
Asthma =
c(1, 1, 0, 0, 0, 1, 1, 0, 1, 0),
Stroke =
c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
Heart.attack =
c(1, 1, 0, 0, 0, 1, 1, 0, 1, 1),
COPD =
c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
Hypertension =
c(0, 0, 1, 0, 1, 0, 1, 0, 0, 0),
Eczema =
c(0, 1, 0, 0, 1, 0, 0, 0, 1, 0),
Depression =
c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0))
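For the stated goal (prevalence of every 3-disease combination, then mean and SD), one possible approach is to let combn() work on column names and compute the share of rows where all three indicators are 1. This is only a sketch under assumptions: the data frame toy and its values are invented, and it uses base R only, so it should behave the same on a data.frame or a tibble.

```r
# Hypothetical 0/1 disease indicators; names and values are illustrative only.
toy <- data.frame(
  Diabetes = c(1, 0, 1, 1, 0),
  Asthma   = c(1, 1, 1, 0, 0),
  Stroke   = c(1, 0, 1, 0, 0),
  COPD     = c(0, 0, 1, 1, 0)
)

disease_cols <- names(toy)
triples <- combn(disease_cols, 3)  # one column per combination of 3 names

# Proportion of rows where all three indicators in a combination equal 1.
prevalence <- apply(triples, 2, function(cols) mean(rowSums(toy[, cols]) == 3))
names(prevalence) <- apply(triples, 2, paste, collapse = " + ")

ncol(triples)     # number of combinations, choose(4, 3) for this toy table
mean(prevalence)  # mean prevalence across combinations
sd(prevalence)    # standard deviation across combinations
```

apply(triples, 2, f) hands f the contents of each column (here the three column names), not a column number. That is likely why apply(cases, 2, show_result) fails: show_result expects an index x and then evaluates cases[, x] with a length-3 vector, which returns a matrix. On the real data, disease_cols would be the names of the disease columns, for example names(dat)[7:20] if those are the right positions.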