I am attempting to generate a summary table from a large dataframe. My dataframe has dates associated with each record. I want to include max and min values of certain numeric fields and the date associated with the min and max value.
Reprex
library(dplyr)
# ----- Dates -----
start_date <- as.Date('2020-01-01')
end_date <- as.Date('2020-10-01')
set.seed(19)
dates <- as.Date(sample( as.numeric(start_date): as.numeric(end_date), 10,
replace = F),
origin = '1970-01-01')
# ----- Flow and temp -----
set.seed(19)
flow = sample(10:100, 10)
set.seed(19)
temp = sample(5:27, 10)
# ----- River -----
set.seed(19)
river = sample(c("CO", "GR"), 10, replace = TRUE)
# ----- Create tibble -----
water_data <- tibble(river, dates, flow, temp)
Summary table
summary_tbl <- water_data %>%
group_by(river) %>%
summarise(max_flow = max(flow, na.rm = TRUE),
max_temp = max(temp, na.rm = TRUE),
.groups = "drop")
Now I want to create 2 additional fields within the summarise()
call: date_max_flow
and date_max_temp
each capturing the date associated with that max value. While I could determine the min and max values and then subset the original dataframe by them, my dataframe is big this would probably get confusing. It seems like there ought to be a way to do this within the call to summarise.
Thanks