Applying a Which Statement with a Group_By Statement

soufin12 · April 16, 2020, 5:26pm

Hello,

I have a large time-series data set, which includes person level information. The columns important to my question include: ID, Dosage, Date and row_nbr. What I need to do is delete all rows leading up to the first row where Dosage > 0. My data looks like:

df<-data.frame(ID=rep(c(1999,1851),each=66),Dosage=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,15,15,0,0,20,20,20,20,20,0,0,0,0,10,10,10,10,10,10,10,10,10,10,10,0,0,20,20,20,20,20,20,20,20,0,0,35,35,35,35,35,35,35,35,35,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,15,15,15,15,15,15,0,0,20,20,20,20,20,0,0,0,0,10,10,10,10,10,10,10,10,10,10,10,0,0,20,20,20,20,20,20,20,20,0,0,35,35,35,35,35,35,35,35,35),Date=seq(as.Date('2014-01-01'),length.out=66,by='month'),row_nbr=seq(66))

I received help on how to do this when there is only one ID in the data set. The code is:
i<-first(which(Dosage>0) df<-tail(df,-i+1)

This works great when there isn't multiple accounts, but when I apply this to my actual data set where I have more than one ID: I used this code: df2<-df%>%group_by(ID)%>%mutate(i=first(which(Dosage>0))) but I get this error: Error: Column i must be length 2 (the group size) or one, not 0.

Is there a work around where I can use this code with a group_by statement? I was thinking I could potentially split the data, but wanted to see what others had to say.

soufin12 · April 16, 2020, 5:49pm

I tried df<-df%>%group_by(ID)%>%mutate(i=which.min(Dosage > 0)) and it just returns 1 for each row.

Yarnabrina · April 16, 2020, 6:13pm

Please ignore that suggestion.I forgot that FALSE < TRUE.

I think this may be what you are after.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- tibble(ID = rep(x = c(1999, 1851),
                      each = 66),
             Dosage = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 15, 15, 0, 0, 20, 20, 20, 20, 20, 0, 0, 0, 0, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 0, 0, 20, 20, 20, 20, 20, 20, 20, 20, 0, 0, 35, 35, 35, 35, 35, 35, 35, 35, 35, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 15, 15, 15, 15, 15, 15, 0, 0, 20, 20, 20, 20, 20, 0, 0, 0, 0, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 0, 0, 20, 20, 20, 20, 20, 20, 20, 20, 0, 0, 35, 35, 35, 35, 35, 35, 35, 35, 35),
             Date = rep(x = seq(from = as.Date(x = "2014-01-01"),
                                length.out = 66,
                                by = "month"),
                        times = 2))

df %>%
    group_by(ID) %>%
    mutate(i = min(which(x = Dosage > 0))) %>%
    filter(row_number() >= i) %>%
    select(-i) %>%
    ungroup() %>%
    as.data.frame()
#>      ID Dosage       Date
#> 1  1999     15 2015-09-01
#> 2  1999     15 2015-10-01
#> 3  1999     15 2015-11-01
#> 4  1999      0 2015-12-01
#> 5  1999      0 2016-01-01
#> 6  1999     20 2016-02-01
#> 7  1999     20 2016-03-01
#> 8  1999     20 2016-04-01
#> 9  1999     20 2016-05-01
#> 10 1999     20 2016-06-01
#> 11 1999      0 2016-07-01
#> 12 1999      0 2016-08-01
#> 13 1999      0 2016-09-01
#> 14 1999      0 2016-10-01
#> 15 1999     10 2016-11-01
#> 16 1999     10 2016-12-01
#> 17 1999     10 2017-01-01
#> 18 1999     10 2017-02-01
#> 19 1999     10 2017-03-01
#> 20 1999     10 2017-04-01
#> 21 1999     10 2017-05-01
#> 22 1999     10 2017-06-01
#> 23 1999     10 2017-07-01
#> 24 1999     10 2017-08-01
#> 25 1999     10 2017-09-01
#> 26 1999      0 2017-10-01
#> 27 1999      0 2017-11-01
#> 28 1999     20 2017-12-01
#> 29 1999     20 2018-01-01
#> 30 1999     20 2018-02-01
#> 31 1999     20 2018-03-01
#> 32 1999     20 2018-04-01
#> 33 1999     20 2018-05-01
#> 34 1999     20 2018-06-01
#> 35 1999     20 2018-07-01
#> 36 1999      0 2018-08-01
#> 37 1999      0 2018-09-01
#> 38 1999     35 2018-10-01
#> 39 1999     35 2018-11-01
#> 40 1999     35 2018-12-01
#> 41 1999     35 2019-01-01
#> 42 1999     35 2019-02-01
#> 43 1999     35 2019-03-01
#> 44 1999     35 2019-04-01
#> 45 1999     35 2019-05-01
#> 46 1999     35 2019-06-01
#> 47 1851     15 2015-05-01
#> 48 1851     15 2015-06-01
#> 49 1851     15 2015-07-01
#> 50 1851     15 2015-08-01
#> 51 1851     15 2015-09-01
#> 52 1851     15 2015-10-01
#> 53 1851     15 2015-11-01
#> 54 1851      0 2015-12-01
#> 55 1851      0 2016-01-01
#> 56 1851     20 2016-02-01
#> 57 1851     20 2016-03-01
#> 58 1851     20 2016-04-01
#> 59 1851     20 2016-05-01
#> 60 1851     20 2016-06-01
#> 61 1851      0 2016-07-01
#> 62 1851      0 2016-08-01
#> 63 1851      0 2016-09-01
#> 64 1851      0 2016-10-01
#> 65 1851     10 2016-11-01
#> 66 1851     10 2016-12-01
#> 67 1851     10 2017-01-01
#> 68 1851     10 2017-02-01
#> 69 1851     10 2017-03-01
#> 70 1851     10 2017-04-01
#> 71 1851     10 2017-05-01
#> 72 1851     10 2017-06-01
#> 73 1851     10 2017-07-01
#> 74 1851     10 2017-08-01
#> 75 1851     10 2017-09-01
#> 76 1851      0 2017-10-01
#> 77 1851      0 2017-11-01
#> 78 1851     20 2017-12-01
#> 79 1851     20 2018-01-01
#> 80 1851     20 2018-02-01
#> 81 1851     20 2018-03-01
#> 82 1851     20 2018-04-01
#> 83 1851     20 2018-05-01
#> 84 1851     20 2018-06-01
#> 85 1851     20 2018-07-01
#> 86 1851      0 2018-08-01
#> 87 1851      0 2018-09-01
#> 88 1851     35 2018-10-01
#> 89 1851     35 2018-11-01
#> 90 1851     35 2018-12-01
#> 91 1851     35 2019-01-01
#> 92 1851     35 2019-02-01
#> 93 1851     35 2019-03-01
#> 94 1851     35 2019-04-01
#> 95 1851     35 2019-05-01
#> 96 1851     35 2019-06-01

^{Created on 2020-04-16 by the reprex package (v0.3.0)}

soufin12 · April 16, 2020, 6:18pm

This works perfectly, thanks for your help!

system · April 23, 2020, 6:18pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.