reshaping a dataframe in order to simplify plotting

Andrea · March 6, 2020, 3:57pm

This is probably very simple, but I haven't been using my R muscles in quite some time, and I may be missing something obvious. I trained two forecasting models, and for each model I got fitted values, lower and upper prediction bounds, from day 1 to today, and then extrapolated to the future. I can easily plot the forecast from one model:

library(ggplot2)
day <- c(1:7,7:30)
dframe <- data.frame(day = day,
                     future = seq_along(day) > 7,
                     m1_lwr = day^2, m1_fit = 1.2*day^2, m1_upr = 1.4*day^2, 
                     m2_lwr = day^1.8, m2_fit = 1.2*day^1.8, m2_upr = 1.4*day^1.8)

ggplot(dframe, aes(x = day, color = future)) +
  geom_line(aes(y = m1_fit)) +
  geom_line(aes(y = m1_lwr), linetype = "dashed") +
  geom_line(aes(y = m1_upr), linetype = "dashed") +
  labs(title = "model 1 forecast")

Created on 2020-03-06 by the reprex package (v0.3.0)

Now, I would like to make a plot with two facets, same y axis, where one facet contains the results of model_1, the other one contains the results of model_2, and such that data are still colored by the future variable. I recall this could be done with facet_wrap, but IIRC I should first reshape my data so that:

- the last three columns are appended to the three columns before them
- all the other columns are duplicated
- a new column is added, containing something like model 1 for the first 31 rows and model 2 for the other 31.

How do I do that?

siddharthprabhu · March 6, 2020, 5:37pm

Thank you for the fabulous reprex @Andrea!

Getting your data into a tidy format with the model id as a variable should do the trick.

library(tidyr)
library(ggplot2)

day <- c(1:7,7:30)

dframe <- data.frame(day = day,
                     future = seq_along(day) > 7,
                     m1_lwr = day^2, m1_fit = 1.2*day^2, m1_upr = 1.4*day^2, 
                     m2_lwr = day^1.8, m2_fit = 1.2*day^1.8, m2_upr = 1.4*day^1.8)

dframe_pivoted <- dframe %>% 
  pivot_longer(cols = c(-day, -future)) %>% 
  separate(name, into = c("model", "statistic")) %>% 
  pivot_wider(names_from = statistic, values_from = value)

print(dframe_pivoted)
#> # A tibble: 62 x 6
#>      day future model   lwr   fit   upr
#>    <int> <lgl>  <chr> <dbl> <dbl> <dbl>
#>  1     1 FALSE  m1     1     1.2   1.4 
#>  2     1 FALSE  m2     1     1.2   1.4 
#>  3     2 FALSE  m1     4     4.8   5.6 
#>  4     2 FALSE  m2     3.48  4.18  4.88
#>  5     3 FALSE  m1     9    10.8  12.6 
#>  6     3 FALSE  m2     7.22  8.67 10.1 
#>  7     4 FALSE  m1    16    19.2  22.4 
#>  8     4 FALSE  m2    12.1  14.6  17.0 
#>  9     5 FALSE  m1    25    30    35   
#> 10     5 FALSE  m2    18.1  21.7  25.4 
#> # ... with 52 more rows

ggplot(dframe_pivoted, aes(x = day, color = future)) +
  geom_line(aes(y = fit)) +
  geom_line(aes(y = lwr), linetype = "dashed") +
  geom_line(aes(y = upr), linetype = "dashed") +
  facet_wrap(~ model, labeller = "label_both")

Created on 2020-03-06 by the reprex package (v0.3.0)
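
As a possible shortcut, the same reshape can probably be done in a single pivot_longer() call, without separate() and pivot_wider(). This is only a sketch, assuming tidyr 1.0.0 or later and that every model column follows the model_statistic naming pattern used above. The name dframe_long is just an illustrative choice.

```r
library(tidyr)

# Same example data as in the question
day <- c(1:7, 7:30)
dframe <- data.frame(day = day,
                     future = seq_along(day) > 7,
                     m1_lwr = day^2, m1_fit = 1.2*day^2, m1_upr = 1.4*day^2,
                     m2_lwr = day^1.8, m2_fit = 1.2*day^1.8, m2_upr = 1.4*day^1.8)

# Split each column name at "_": the first piece (m1, m2) goes into a new
# "model" column, and the special ".value" entry tells pivot_longer() that the
# second piece (lwr, fit, upr) names the output columns holding the values.
dframe_long <- pivot_longer(dframe,
                            cols = c(-day, -future),
                            names_to = c("model", ".value"),
                            names_sep = "_")
```

The result should have the same columns as dframe_pivoted (day, future, model, lwr, fit, upr), so the facet_wrap() plot above should work unchanged with dframe_long in its place.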

Andrea · March 6, 2020, 5:51pm

Thanks for the amazing solution!
