I have a custom function that creates new rows: it copies the data from a row and adds copies of that row equal to the number in a specific column. Right now, the function works well if there is only one row per id. I need it to work when the data has multiple rows for one id.
My data includes id, which is the person's id; Stage, which is the stage the person is in; Start and End, which are the start and end dates; MonthDiff, which is the difference between the start and end dates; and Censor, which is equal to 0 or 1.
I need the function to group by Stage, copy each row down a number of times equal to the MonthDiff for that stage, and then restart at the next stage.
What I have so far:
library(dplyr)  # needed for %>%, group_by() and do() further down

df <- data.frame(
  id        = c('A', 'A', 'A'),
  Stage     = c(1, 2, 3),
  Start     = as.Date(c('2014-01-01', '2016-01-01', '2019-01-01')),
  End       = as.Date(c('2015-12-31', '2018-12-31', '2020-02-01')),
  MonthDiff = c(23, 35, 13),
  Censor    = c(0, 0, 1)
)
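PLPP <- function(data, id, Stage, period, event) {
  # Note: the Stage argument is accepted but not used in the body
  stopifnot(is.matrix(data) || is.data.frame(data))
  stopifnot(c(id, period, event) %in% c(colnames(data), 1:ncol(data)))
  if (any(is.na(data[, c(id, period, event)]))) {
    stop("PLPP cannot currently handle missing data in the id, period, or event variables")
  }
  # Repeat each row as many times as its value in the period column
  index <- rep(1:nrow(data), data[, period])
  # Position of the last expanded row for each original row
  idmax <- cumsum(data[, period])
  # Event indicator is the inverse of the censor flag
  reve <- !data[, event]
  dat <- data[index, ]
  # Running counter 1, 2, ... grouped on id only
  dat[, period] <- ave(dat[, period], dat[, id], FUN = seq_along)
  # Event is 0 everywhere except on the last row of each block, which gets reve
  dat[, event] <- 0
  dat[idmax, event] <- reve
  rownames(dat) <- NULL
  return(dat)
}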
tpp <- PLPP(df, id = 'id', Stage = 'Stage', period = 'MonthDiff', event = 'Censor')
test <- df %>% group_by(Stage) %>% do(tpp)
My problem with the current code is that the group_by() call doesn't make the function restart at each new Stage.
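One possible direction, offered as a sketch rather than a confirmed fix: the counter inside PLPP is built with ave() grouped on id alone, so it keeps running across stages, and do(tpp) receives a data frame that was already computed on the whole df rather than a per-group call. Assuming the goal is for MonthDiff to count 1, 2, ... within each id and Stage, grouping the ave() call on both columns may be enough, without group_by() at all. The name PLPP_by_stage and its lowercase stage argument are hypothetical; the event coding (1 on the last row of a stage when Censor is 0) is carried over from the original function.

```r
# Sample data from the question
df <- data.frame(
  id        = c("A", "A", "A"),
  Stage     = c(1, 2, 3),
  Start     = as.Date(c("2014-01-01", "2016-01-01", "2019-01-01")),
  End       = as.Date(c("2015-12-31", "2018-12-31", "2020-02-01")),
  MonthDiff = c(23, 35, 13),
  Censor    = c(0, 0, 1)
)

# Hypothetical variant of PLPP: same row expansion, but the period counter
# restarts for every id/stage combination instead of every id.
PLPP_by_stage <- function(data, id, stage, period, event) {
  data <- as.data.frame(data)  # so data[, col] returns a plain vector
  stopifnot(c(id, stage, period, event) %in% colnames(data))
  if (any(is.na(data[, c(id, stage, period, event)]))) {
    stop("missing values in id, stage, period, or event are not handled")
  }
  # Repeat row i as many times as its value in the period column
  index <- rep(seq_len(nrow(data)), data[, period])
  # Position of the last expanded row for each original row
  last <- cumsum(data[, period])
  dat <- data[index, ]
  # Count 1, 2, ... within each id and stage
  dat[, period] <- ave(dat[, period], dat[, id], dat[, stage], FUN = seq_along)
  # Same event coding as the original: inverse of the censor flag on the last row
  dat[, event] <- 0
  dat[last, event] <- as.numeric(!data[, event])
  rownames(dat) <- NULL
  dat
}

test <- PLPP_by_stage(df, id = "id", stage = "Stage",
                      period = "MonthDiff", event = "Censor")
```

For the sample data this should give 71 rows, with MonthDiff running 1 to 23 for Stage 1, 1 to 35 for Stage 2, and 1 to 13 for Stage 3. The sketch assumes every MonthDiff value is a positive whole number; a zero would make the last-row positions collide.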