ggplot for a time series of many, many lines?

I am trying to make a graph showing the time series progression of rainfall for 1000 or so regions. My current plot looks like this:
image

As you can see, this is way too many lines to have on a plot. Ideally, I would like to plot the progression of deciles across time, so that we have only 9 lines on the graph.

I'm not 100% sure what the code for this would look like. I need a function that goes from N -> 10 and is applied across a dataframe that is grouped by year.

Instead of worrying about this hypothetical function, I thought it would be best to first ask the community whether there is a ggplot2 function that does this for me. Most time-series graphs focus on having only one or two lines. Of the responses that deal with many observations per year, people suggest three things:

  1. Do what I do above and set both size and alpha to be very low. This results in pretty graphs, but it is difficult for me to get a true sense of the distribution of rainfall.
  2. Plot a series of box and whisker plots across time. This seems like an inelegant solution, since it can get crowded with many years.
  3. A ridge plot, but this also gets awkward with many years of data.

Here is my solution, using do() with group_by():

  1. Make a function that returns a dataframe with 9 rows (one per decile, 10% to 90%) and 2 columns.
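get_quantiles <- function(x) {
	# Deciles (10% to 90%) of rainfall within one group of rows
	probs <- seq(.1, .9, .1)
	values <- x$rainfall_per_square_meter
	# Named q rather than t, so the base function t() is not masked
	q <- quantile(values, probs)
	df <- data.frame(quantile_value = q, quantile = probs)
	return(df)
}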
  2. Use group_by with do() to make an output dataframe that has get_quantiles mapped across years and appended.
library(dplyr)

df2 <- df %>% group_by(year) %>%
  do(get_quantiles(.))

Next steps are to make my get_quantiles function more extensible to use different variables and sets of quantiles.
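
A minimal sketch of what that more general version might look like, assuming the value column is passed by name as a string and the probabilities as a numeric vector. The name get_quantiles_for, its arguments, and the toy data are hypothetical and only there for illustration:

```r
library(dplyr)

# Hypothetical generalization of get_quantiles():
#   var   - name of the column to summarise, as a string
#   probs - quantile probabilities to compute
get_quantiles_for <- function(x, var, probs = seq(.1, .9, .1)) {
  values <- x[[var]]
  data.frame(
    quantile = probs,
    quantile_value = unname(quantile(values, probs, na.rm = TRUE))
  )
}

# Toy data standing in for the real rainfall table (made up)
set.seed(1)
toy <- data.frame(
  year = rep(2000:2002, each = 50),
  rainfall_per_square_meter = runif(150)
)

# Quartiles instead of deciles, one set of rows per year
toy_q <- toy %>%
  group_by(year) %>%
  do(get_quantiles_for(., "rainfall_per_square_meter", probs = c(.25, .5, .75)))
```

Passing the column as a string keeps the function free of tidy-evaluation machinery; the default probs reproduces the deciles used above.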

My new plot now looks like this:


Here's another option for calculating and plotting quantiles.

library(tidyverse)

# Fake data
set.seed(2)
d = replicate(1000, data.frame(year=2000:2018, value=cumsum(rnorm(19))), simplify=FALSE) %>% 
  bind_rows(.id="location")

# Summarise to get quantiles by year
prob=seq(0,1,0.1)
ds = d %>% group_by(year) %>% 
  summarise(lab = list(paste0(prob*100, "%")), 
            q = list(quantile(value, prob))) %>% 
  unnest(cols = c(lab, q))

ggplot(ds, aes(year, q, colour=lab)) +
  geom_line() +
  geom_text(data=ds %>% filter(year==max(year)), aes(label=lab), 
            position=position_nudge(x=0.2), hjust=0, size=3) +
  theme_classic() +
  guides(colour="none") +
  expand_limits(x=2018.6)

