My data has 3 columns: an id, a date, and a number. I want to create a dataframe with two columns: an id, and the date a condition is meet for the first time for each id. For example, the first time that the number is greater than 0 for each id. That is:
library(lubridate)
my_data <- data.frame(
id = c(rep('A',3), rep('B',3)),
date = c(
make_datetime(year = 2018, month = 9),
make_datetime(year = 2018, month = 12),
make_datetime(year = 2019, month = 1),
make_datetime(year = 2018, month = 8),
make_datetime(year = 2018, month = 10),
make_datetime(year = 2018, month = 11)
),
number = c(0,2,3,0,3,4)
)
my_data
#> id date number
#> 1 A 2018-09-01 0
#> 2 A 2018-12-01 2
#> 3 A 2019-01-01 3
#> 4 B 2018-08-01 0
#> 5 B 2018-10-01 3
#> 6 B 2018-11-01 4
answer <- data.frame(id = c('A', 'B'),
first = c(
make_datetime(year = 2018, month = 12),
make_datetime(year = 2018, month = 10)
))
answer
#> id first
#> 1 A 2018-12-01
#> 2 B 2018-10-01
Created on 2019-11-14 by the reprex package (v0.3.0)
What is the tidy way of doing this? This is my best solution but I'm wondering if there is a cleaner way:
library(dplyr)
my_data %>%
mutate(id = as.character(id)) %>%
filter(number > 0) %>%
group_by(id) %>%
filter(date == min(date)) %>%
select(id, first=date)
# A tibble: 2 x 2
# Groups: id [2]
id first
<chr> <dttm>
1 A 2018-12-01 00:00:00
2 B 2018-10-01 00:00:00