I want to use dplyr to first group events by ID. Then I would like to specify two possible start events and, within each ID group, discard all events that occur before whichever start event comes first. Here is an example:
Data:
id <- c(1, 1, 1, 1, 2, 2, 2, 2)
timeorder <- c(1, 2, 3, 4, 1, 2, 3, 4)
events1 <- c("a", "b", "a", "b", "a", "a", "a", "b")
events2 <- c("x", "x", "x", "x", "x", "y", "x", "y")
testdata <- data.frame(id, timeorder, events1, events2)
What I am aiming for:
Suppose the rule is: the start event is either b (in events1) or y (in events2).
Then, for ID 1 the results should be:
events1: b, a, b
events2: x, x, x
In other words: disregard all events prior to the first b.
And, for ID 2 the results should be:
events1: a, a, b
events2: y, x, y
In other words: since event y occurred before event b, the filtering should start at event y, and all events before the first y are discarded.
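A minimal sketch of this rule, assuming dplyr 1.0 or later and that the start condition is exactly `events1 == "b"` or `events2 == "y"`. Within each id, `cumany()` becomes TRUE at the first row where either start event occurs and stays TRUE afterwards, so `filter()` keeps that row and everything after it. The variable name `result` is only for illustration.

```r
library(dplyr)

id <- c(1, 1, 1, 1, 2, 2, 2, 2)
timeorder <- c(1, 2, 3, 4, 1, 2, 3, 4)
events1 <- c("a", "b", "a", "b", "a", "a", "a", "b")
events2 <- c("x", "x", "x", "x", "x", "y", "x", "y")
testdata <- data.frame(id, timeorder, events1, events2)

result <- testdata %>%
  group_by(id) %>%
  # make sure rows are in time order within each id
  arrange(timeorder, .by_group = TRUE) %>%
  # TRUE from the first row where either start event occurs, onwards
  filter(cumany(events1 == "b" | events2 == "y")) %>%
  ungroup()

print(result)
```

With this approach, an id that never contains a start event is dropped entirely, because `cumany()` never becomes TRUE for its rows.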
I have tried several filter options, including:
library(dplyr)
filtertest <- testdata %>%
  group_by(id, timeorder) %>%
  filter(events1 != max("b") | events2 != max("y"))
Unfortunately, this does not give the result I am aiming for. Do I need a while loop somewhere? I cannot figure this out.
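Two things appear to go wrong in the attempt above. First, `group_by(id, timeorder)` puts every row in its own group, so nothing can be compared across rows of the same id. Second, `max("b")` is simply `"b"`, so the filter reduces to a row-wise test that drops only rows that are both `b` and `y`. No `while` loop should be needed. A loop-free alternative is to find the position of the first start event in each id with `which()` and keep rows from that position on. This is a sketch, assuming the same data and rule as in the question and that rows are already sorted by `timeorder` within each id; the names `attempt`, `attempt_groups` and `result2` are illustrative.

```r
library(dplyr)

testdata <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 2),
  timeorder = c(1, 2, 3, 4, 1, 2, 3, 4),
  events1 = c("a", "b", "a", "b", "a", "a", "a", "b"),
  events2 = c("x", "x", "x", "x", "x", "y", "x", "y")
)

# Grouping by (id, timeorder) yields one group per row: 8 groups of size 1
attempt_groups <- testdata %>% group_by(id, timeorder) %>% n_groups()

# The original filter is row-wise: it only drops the row that is both "b" and "y"
attempt <- testdata %>%
  group_by(id, timeorder) %>%
  filter(events1 != max("b") | events2 != max("y"))

# Loop-free alternative: index of the first start event within each id,
# then keep every row at or after it. which(...)[1] is NA when an id has
# no start event; filter() drops rows where the condition is NA.
result2 <- testdata %>%
  group_by(id) %>%
  filter(row_number() >= which(events1 == "b" | events2 == "y")[1]) %>%
  ungroup()

print(attempt_groups)
print(nrow(attempt)) # keeps 7 of the 8 rows
print(result2)
```

This version depends on row order; adding `arrange(id, timeorder)` before `group_by(id)` would make the time ordering explicit.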
I hope my question is clear, and I would really appreciate your help. Thank you!