I am attempting to generate a summary table from a large dataframe. My dataframe has dates associated with each record. I want to include max and min values of certain numeric fields and the date associated with the min and max value.
library(dplyr) # ----- Dates ----- start_date <- as.Date('2020-01-01') end_date <- as.Date('2020-10-01') set.seed(19) dates <- as.Date(sample( as.numeric(start_date): as.numeric(end_date), 10, replace = F), origin = '1970-01-01') # ----- Flow and temp ----- set.seed(19) flow = sample(10:100, 10) set.seed(19) temp = sample(5:27, 10) # ----- River ----- set.seed(19) river = sample(c("CO", "GR"), 10, replace = TRUE) # ----- Create tibble ----- water_data <- tibble(river, dates, flow, temp)
summary_tbl <- water_data %>% group_by(river) %>% summarise(max_flow = max(flow, na.rm = TRUE), max_temp = max(temp, na.rm = TRUE), .groups = "drop")
Now I want to create 2 additional fields within the
date_max_temp each capturing the date associated with that max value. While I could determine the min and max values and then subset the original dataframe by them, my dataframe is big this would probably get confusing. It seems like there ought to be a way to do this within the call to summarise.