how to get the date for the first time a condition is meet?

ignacio · November 14, 2019, 3:57pm

My data has 3 columns: an id, a date, and a number. I want to create a dataframe with two columns: an id, and the date a condition is meet for the first time for each id. For example, the first time that the number is greater than 0 for each id. That is:

library(lubridate)

my_data <- data.frame(
  id = c(rep('A',3), rep('B',3)),
  date = c(
    make_datetime(year = 2018, month = 9),
    make_datetime(year = 2018, month = 12),
    make_datetime(year = 2019, month = 1),
    make_datetime(year = 2018, month = 8),
    make_datetime(year = 2018, month = 10),
    make_datetime(year = 2018, month = 11)
  ),
  number = c(0,2,3,0,3,4)
)

my_data
#>   id       date number
#> 1  A 2018-09-01      0
#> 2  A 2018-12-01      2
#> 3  A 2019-01-01      3
#> 4  B 2018-08-01      0
#> 5  B 2018-10-01      3
#> 6  B 2018-11-01      4

answer <- data.frame(id = c('A', 'B'),
                     first = c(
                       make_datetime(year = 2018, month = 12),
                       make_datetime(year = 2018, month = 10)
                     ))

answer
#>   id      first
#> 1  A 2018-12-01
#> 2  B 2018-10-01

^{Created on 2019-11-14 by the reprex package (v0.3.0)}

What is the tidy way of doing this? This is my best solution but I'm wondering if there is a cleaner way:

library(dplyr)

  my_data %>% 
    mutate(id = as.character(id)) %>% 
    filter(number > 0) %>% 
    group_by(id) %>% 
    filter(date == min(date)) %>% 
    select(id, first=date)

# A tibble: 2 x 2
# Groups:   id [2]
  id    first              
  <chr> <dttm>             
1 A     2018-12-01 00:00:00
2 B     2018-10-01 00:00:00

FJCC · November 14, 2019, 4:13pm

I would use the summarize() function.

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(dplyr)
my_data <- data.frame(
  id = c(rep('A',3), rep('B',3)),
  date = c(
    make_datetime(year = 2018, month = 9),
    make_datetime(year = 2018, month = 12),
    make_datetime(year = 2019, month = 1),
    make_datetime(year = 2018, month = 8),
    make_datetime(year = 2018, month = 10),
    make_datetime(year = 2018, month = 11)
  ),
  number = c(0,2,3,0,3,4)
)

Answer <- my_data %>% filter(number > 0) %>% 
  group_by(id) %>% 
  summarize(First = min(date))
Answer
#> # A tibble: 2 x 2
#>   id    First              
#>   <fct> <dttm>             
#> 1 A     2018-12-01 00:00:00
#> 2 B     2018-10-01 00:00:00

^{Created on 2019-11-14 by the reprex package (v0.3.0.9000)}

system · November 21, 2019, 4:21pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.