Hello,
I have a large time-series data set that includes person-level information. The columns relevant to my question are ID, Dosage, Date and row_nbr. For each ID, I need to delete all rows before the first row where Dosage > 0. My data looks like:
df<-data.frame(ID=rep(c(1999,1851),each=66),Dosage=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,15,15,0,0,20,20,20,20,20,0,0,0,0,10,10,10,10,10,10,10,10,10,10,10,0,0,20,20,20,20,20,20,20,20,0,0,35,35,35,35,35,35,35,35,35,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,15,15,15,15,15,15,0,0,20,20,20,20,20,0,0,0,0,10,10,10,10,10,10,10,10,10,10,10,0,0,20,20,20,20,20,20,20,20,0,0,35,35,35,35,35,35,35,35,35),Date=seq(as.Date('2014-01-01'),length.out=66,by='month'),row_nbr=seq(66))
I received help on how to do this when there is only one ID in the data set. The code is:
i<-first(which(df$Dosage>0))
df<-tail(df,-i+1)
This works well when there is only one ID, but my actual data set has more than one ID, so I tried this code:
df2<-df%>%group_by(ID)%>%mutate(i=first(which(Dosage>0)))
but I get this error:
Error: Column `i` must be length 2 (the group size) or one, not 0
Is there a workaround that lets me use this code with a group_by statement? I was thinking I could split the data by ID, but wanted to see what others had to say.
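For reference, here is a minimal sketch of one grouped approach that might avoid computing an index at all. It assumes dplyr is loaded and uses a small made-up data frame (`toy`, with hypothetical values) instead of the real data. The idea is that, within each ID, cumsum(Dosage > 0) stays at 0 until the first positive dose, so filtering on it keeps everything from that row onward.

```r
library(dplyr)

# Small stand-in for the real data (hypothetical values, same column names)
toy <- data.frame(
  ID      = rep(c(1999, 1851), each = 5),
  Dosage  = c(0, 0, 15, 0, 20,   0, 0, 0, 10, 0),
  row_nbr = rep(1:5, times = 2)
)

# Within each ID, cumsum(Dosage > 0) is 0 for the leading zero-dose rows
# and at least 1 from the first positive dose onward, so this filter
# drops only the leading rows and keeps later zero-dose rows.
trimmed <- toy %>%
  group_by(ID) %>%
  filter(cumsum(Dosage > 0) > 0) %>%
  ungroup()

print(trimmed)
```

With this approach, an ID that never has a positive Dosage ends up with no rows rather than raising an error, and an NA in Dosage would make the later rows of that ID drop out as well, so those cases may need separate handling depending on what is wanted.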