Thank you, this works great. I realize I wasn't clear in my example: these methods work well in general, but I wanted a way to get the incidence and prevalence by time period. The incidence at a time point is the number of new cases that occur at that time divided by the number of people who had not had the disease before that time (the people still at risk).
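Written as formulas, the two quantities are roughly the following. This assumes that once someone has had the disease they count as an existing case from then on, which is how the worked example below treats them.

```latex
\text{incidence}_t = \frac{\text{number of new cases at time } t}{\text{number of people with no disease before time } t}
\qquad
\text{prevalence}_t = \frac{\text{number of new and existing cases at time } t}{\text{total number of people}}
```

For example, in the data below the only new case at time 3 is id 1, and four people (ids 1, 3, 4 and 5) had not had the disease by time 2, so the incidence at time 3 is 1/4 = 0.25.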
n_id <- 5 # five individuals
n_time <- 4 # four time points
id <- rep(1:n_id, each = n_time)
time <- rep(1:n_time, times = n_id)
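# MI: one line of values per id (time points 1 to 4); stays 1 once an MI has occurred
MI <- c(0, 0, 1, 1,
        0, 1, 1, 1,
        0, 0, 0, 1,
        0, 0, 0, 0,
        0, 0, 0, 0)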
dsn <- data.frame(id, time, MI)
# MI2: same data, but set to NA after the first MI
MI2 <- c(0, 0, 1, NA,
         0, 1, NA, NA,
         0, 0, 0, 1,
         0, 0, 0, 0,
         0, 0, 0, 0)
dsn2 <- data.frame(id, time, MI, MI2)
library(dplyr)
dsn2 <- arrange(dsn2, time) # sort by time point (arrange() returns a new data frame)
dsn2
#> id time MI MI2
#> 1 1 1 0 0
#> 2 2 1 0 0
#> 3 3 1 0 0
#> 4 4 1 0 0
#> 5 5 1 0 0
#> 6 1 2 0 0
#> 7 2 2 1 1
#> 8 3 2 0 0
#> 9 4 2 0 0
#> 10 5 2 0 0
#> 11 1 3 1 1
#> 12 2 3 1 NA
#> 13 3 3 0 0
#> 14 4 3 0 0
#> 15 5 3 0 0
#> 16 1 4 1 NA
#> 17 2 4 1 NA
#> 18 3 4 1 1
#> 19 4 4 0 0
#> 20 5 4 0 0
# In the example above, the values can be worked out by hand as follows.
# Incidence at each time point (number of new cases at that time divided by the number of people who had not had the disease before that time):
# time 1 = 0/5 = 0
# time 2 = 1/5 = 0.2   (new case: id 2; all 5 at risk)
# time 3 = 1/4 = 0.25  (new case: id 1; id 2 no longer at risk)
# time 4 = 1/3 = 0.33  (new case: id 3; ids 1 and 2 no longer at risk)
# Prevalence at each time point (number of new and existing cases divided by the total population):
# time 1 = 0/5 = 0
# time 2 = 1/5 = 0.2
# time 3 = 2/5 = 0.4   (ids 1 and 2)
# time 4 = 3/5 = 0.6   (ids 1, 2 and 3)
time <- 1:4
incidence <- c(0/5, 1/5, 1/4, 1/3)
prevalence <- c(0/5, 1/5, 2/5, 3/5)
results <- cbind(time, incidence, prevalence)
results
#> time incidence prevalence
#> [1,] 1 0.0000000 0.0
#> [2,] 2 0.2000000 0.2
#> [3,] 3 0.2500000 0.4
#> [4,] 4 0.3333333 0.6
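Instead of typing the fractions by hand, they can be derived from the long-format data. Below is a minimal dplyr sketch, assuming MI stays 1 from the first event onward, as in the example data. `dplyr::lag()` gives each person's status at the previous time point, so a row is a new case when MI is 1 now and was 0 before, and a person is at risk when they were still 0 at the previous time point. The helper column names (`prev_MI`, `new_case`, `at_risk`) are only illustrative.

```r
library(dplyr)

# Rebuild the example data (long format: one row per id and time point)
n_id <- 5
n_time <- 4
dsn <- data.frame(
  id   = rep(1:n_id, each = n_time),
  time = rep(1:n_time, times = n_id),
  MI   = c(0, 0, 1, 1,
           0, 1, 1, 1,
           0, 0, 0, 1,
           0, 0, 0, 0,
           0, 0, 0, 0)
)

results <- dsn %>%
  arrange(id, time) %>%
  group_by(id) %>%
  mutate(
    prev_MI  = dplyr::lag(MI, default = 0), # status at the previous time point (0 before time 1)
    new_case = MI == 1 & prev_MI == 0,      # first time point with the disease
    at_risk  = prev_MI == 0                 # had not had the disease before this time point
  ) %>%
  group_by(time) %>%                        # regroup by time point only
  summarise(
    incidence  = sum(new_case) / sum(at_risk),
    prevalence = mean(MI == 1),             # new + existing cases / everyone
    .groups = "drop"
  )

results
```

This gives the same incidence and prevalence values as the hand-calculated table. With the MI2 coding, where rows after the first event are NA, the non-missing rows at each time point are exactly the people at risk, so incidence reduces to `mean(MI2, na.rm = TRUE)` within each time point.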
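I'd like to be able to compute this automatically for each time point, taking into account what happened at the previous time points (people who have already had the disease drop out of the incidence denominator). Would a for loop be the way to go? Thank you so much!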
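A for loop can also do this. Here is a base-R sketch that walks through the time points in order and carries forward the set of ids who have already had the disease. It assumes every id is observed at every time point; `ever_case`, `cases_now` and the other names are hypothetical, chosen for illustration.

```r
# Rebuild the example data (long format: one row per id and time point)
n_id <- 5
n_time <- 4
dsn <- data.frame(
  id   = rep(1:n_id, each = n_time),
  time = rep(1:n_time, times = n_id),
  MI   = c(0, 0, 1, 1,
           0, 1, 1, 1,
           0, 0, 0, 1,
           0, 0, 0, 0,
           0, 0, 0, 0)
)

times <- sort(unique(dsn$time))
n_total <- length(unique(dsn$id))  # total population
incidence <- numeric(length(times))
prevalence <- numeric(length(times))
ever_case <- integer(0)            # ids that had the disease at an earlier time point

for (i in seq_along(times)) {
  now <- dsn[dsn$time == times[i], ]
  cases_now <- now$id[now$MI == 1]           # ids with MI recorded at this time point
  at_risk <- setdiff(now$id, ever_case)      # not yet diseased before this time point
  new_cases <- intersect(cases_now, at_risk) # first time with the disease
  # NaN if nobody is left at risk (0/0)
  incidence[i] <- length(new_cases) / length(at_risk)
  ever_case <- union(ever_case, cases_now)   # carry forward to the next time point
  prevalence[i] <- length(ever_case) / n_total
}

results <- data.frame(time = times, incidence, prevalence)
results
```

The loop makes the dependence on earlier time points explicit. The grouped dplyr version avoids the loop by looking back one row per person with `lag()`; both should give the same numbers on this example.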