I am trying to select the most recent date in a grouped column, subject to constraints imposed by an third column. Everything works fine when there are no NAs involved, but when the condition column contains NAs unexpected results obtain.
I would like to understand why the results of these two queries differ and how I might fix my query to handle the case when NAs exist.
So, how can I change the query on dat2 so that it's results match those of the query on dat?
dat <- tibble(x = rep(c('a', 'b'), each = 4),
y = rep(seq(as.Date('2019-02-01'), as.Date('2019-02-04'), by = 'days'), 2),
z = c('this', 'that', 'this', 'that', 'this', 'that', 'this', 'that'))
dat2 <- dat %>% mutate(z = case_when(x == 'b' & y == as.Date('2019-02-04') ~ NA_character_,
TRUE ~ z))
dat %>% group_by(x) %>%
arrange(y) %>%
summarize(first = first(y[z == 'this']),
last = last(y[z == 'this']))
# A tibble: 2 x 3
x first last
<chr> <date> <date>
1 a 2019-02-01 2019-02-03
2 b 2019-02-01 2019-02-03
dat2 %>% group_by(x) %>%
arrange(y) %>%
summarize(first = first(y[z == 'this']),
last = last(y[z == 'this']))
# A tibble: 2 x 3
x first last
<chr> <date> <date>
1 a 2019-02-01 2019-02-03
2 b 2019-02-01 NA