I am trying to count only the first occurrence of a disease, say myocardial infarction (MI, "heart attack"), for each individual, but I am having a hard time implementing this in R (base or tidyverse). Any help is appreciated.
n_id <- 5 # five individuals
n_time <- 4 # four time points
id <- rep(1:n_id, each = n_time)
time <- rep(1:n_time, times = n_id)
MI <- c(0,0,1,1,
0,1,1,1,
0,0,0,1,
0,0,0,0,
0,0,0,0)
dsn <- data.frame(id, time, MI)
dsn
#> id time MI
#> 1 1 1 0
#> 2 1 2 0
#> 3 1 3 1
#> 4 1 4 1
#> 5 2 1 0
#> 6 2 2 1
#> 7 2 3 1
#> 8 2 4 1
#> 9 3 1 0
#> 10 3 2 0
#> 11 3 3 0
#> 12 3 4 1
#> 13 4 1 0
#> 14 4 2 0
#> 15 4 3 0
#> 16 4 4 0
#> 17 5 1 0
#> 18 5 2 0
#> 19 5 3 0
#> 20 5 4 0
# I want to count only the first occurrence of MI for each person, but I am not sure how to do so.
# Perhaps I first need to set the second and later occurrences to missing
# and then count, but this seems inefficient.
MI2 <- c(0,0,1,NA,
0,1,NA,NA,
0,0,0,1,
0,0,0,0,
0,0,0,0)
dsn2 <- data.frame(id, time, MI, MI2)
dsn2
#> id time MI MI2
#> 1 1 1 0 0
#> 2 1 2 0 0
#> 3 1 3 1 1
#> 4 1 4 1 NA
#> 5 2 1 0 0
#> 6 2 2 1 1
#> 7 2 3 1 NA
#> 8 2 4 1 NA
#> 9 3 1 0 0
#> 10 3 2 0 0
#> 11 3 3 0 0
#> 12 3 4 1 1
#> 13 4 1 0 0
#> 14 4 2 0 0
#> 15 4 3 0 0
#> 16 4 4 0 0
#> 17 5 1 0 0
#> 18 5 2 0 0
#> 19 5 3 0 0
#> 20 5 4 0 0
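The hand-typed MI2 column above could probably be derived from MI instead. The following is only a sketch in base R, assuming that rows are sorted by time within each id and that MI is coded 0/1 with no missing values; the names prior_events and MI2_auto are made up for illustration.

```r
# Rebuild the example data from the question
n_id <- 5
n_time <- 4
id <- rep(1:n_id, each = n_time)
time <- rep(1:n_time, times = n_id)
MI <- c(0,0,1,1,
        0,1,1,1,
        0,0,0,1,
        0,0,0,0,
        0,0,0,0)
dsn <- data.frame(id, time, MI)

# For each row, count the MI rows that came earlier for the same id.
# cumsum(x) counts events up to and including the current row, so
# subtracting x leaves only the events strictly before it.
# Assumption: rows are already ordered by time within id.
prior_events <- ave(dsn$MI, dsn$id, FUN = function(x) cumsum(x) - x)

# Keep MI as-is up to the first event, set every later row to NA
dsn$MI2_auto <- ifelse(prior_events > 0, NA, dsn$MI)

# Same calculation as with the hand-typed MI2 column
sum(dsn$MI2_auto, na.rm = TRUE) / n_id
#> [1] 0.6
```

If the data were not sorted, ordering first with something like dsn[order(dsn$id, dsn$time), ] would be needed before the cumulative sum makes sense.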
incidence <- sum(dsn2$MI2, na.rm = TRUE)/n_id
incidence
#> [1] 0.6
Only three of the five people were diagnosed with MI, so the incidence is 3/5 = 0.6.
# I am trying to find a way to get to the 0.6 without having to count manually.
# Any help is appreciated.
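One way to reach 0.6 without the intermediate MI2 column might be to collapse the data to one value per person. This is a sketch in base R, assuming MI is coded 0/1 with no missing values; ever_MI and first_MI are illustrative names, not anything from the original data.

```r
# Rebuild the example data from the question
n_id <- 5
n_time <- 4
id <- rep(1:n_id, each = n_time)
time <- rep(1:n_time, times = n_id)
MI <- c(0,0,1,1,
        0,1,1,1,
        0,0,0,1,
        0,0,0,0,
        0,0,0,0)
dsn <- data.frame(id, time, MI)

# One value per person: 1 if that person has any MI row, otherwise 0
ever_MI <- tapply(dsn$MI, dsn$id, max)

# Proportion of people with at least one MI
incidence <- mean(ever_MI)
incidence
#> [1] 0.6

# If the time of the first MI is also of interest: keep only MI rows,
# then take the earliest time per id (people without MI drop out)
first_MI <- aggregate(time ~ id, data = dsn[dsn$MI == 1, ], FUN = min)
first_MI
```

Here each person contributes at most one event, so repeated 1s after the first diagnosis do not inflate the count. The number of rows in first_MI divided by the number of distinct ids gives the same proportion.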