ggplot: group and color - how to group correctly?

Hi everybody,
I am looking for a suggestion on the "best" way to do this.

I have the following data frame and I want to plot two variables (med_tmin, med_tmax) on the same plot. The months go on the x axis, the temperature on the y axis, and I would like one line per year.
If the x values were not repeated (here every year has the same months), I would use pivot_longer and map "color" in aes to separate med_tmax from med_tmin.

But in this case that doesn't work. The workaround is to keep the data frame in wide format and map y in two separate geom_line calls. The legend then does not appear (maybe it could be forced by putting color inside aes), and I don't think this is a clean solution.

In these cases, what workaround do you usually use?

Good day, Filippo

library(tidyverse)

set.seed(1)
temp <- tibble(year = rep(2001:2002, each = 24),
               month = rep(month.abb, 4),
               day = sample(1:30, 48, replace = TRUE),
               tmin = sample(0:15, 48, replace = TRUE),
               tmax = sample(20:35, 48, replace = TRUE))

# works fine but legend is not automatically mapped (not in aes)
temp %>% 
  group_by(year, month) %>% 
  summarise(med_tmin = mean(tmin),
            med_tmax = mean(tmax)) %>% 
  ggplot(aes(x = month, group = year)) +
  geom_line(aes(y = med_tmin), color = "navyblue") + 
  geom_line(aes(y = med_tmax), color = "red") +
  geom_smooth(aes(group = 1, y = med_tmin), se = FALSE, color = "yellow") +
  geom_smooth(aes(group = 1, y = med_tmax), se = FALSE, color = "yellow") +
  labs(x = NULL, y = "Temperature (°C)")
  
# long format: does not give one line per year and variable
temp %>% 
  group_by(year, month) %>% 
  summarise(med_tmin = mean(tmin),
            med_tmax = mean(tmax)) %>% 
  pivot_longer(cols = c(med_tmin, med_tmax)) %>% 
  ggplot(aes(x = month, y = value, color = name, group = year)) +
  geom_line()

Does this get you what you want? The months are not in the correct order; that can be fixed by converting month to a factor with the levels set in the correct order.

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.1.2

set.seed(1)
temp <- tibble(year = rep(2001:2002, each = 12),
               month = rep(month.abb, 2),
               tmin = sample(0:15, 24, replace = TRUE),
               tmax = sample(20:35, 24, replace = TRUE))


temp %>% 
  group_by(year, month) %>% 
  summarise(med_tmin = mean(tmin),
            med_tmax = mean(tmax)) %>% 
  pivot_longer(cols = c(med_tmin, med_tmax)) %>% 
  ggplot(aes(x = month, y = value, color = name, 
             group = interaction(year, name))) +
  geom_line() + 
  geom_smooth(aes(group = name), se = FALSE, color = "yellow") 
#> `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Created on 2022-08-23 by the reprex package (v2.0.1)
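
For the month ordering mentioned above, here is a minimal sketch of one way to do it, assuming the same simulated data as in the reprex. It converts month to a factor whose levels are month.abb (which is already in calendar order) before summarising. The names temp_long and p are only illustrative, and .groups = "drop" is added just to silence the summarise() message.

```r
library(tidyverse)

set.seed(1)
temp <- tibble(year = rep(2001:2002, each = 12),
               month = rep(month.abb, 2),
               tmin = sample(0:15, 24, replace = TRUE),
               tmax = sample(20:35, 24, replace = TRUE))

temp_long <- temp %>%
  # month.abb is in calendar order, so it can serve as the factor levels
  mutate(month = factor(month, levels = month.abb)) %>%
  group_by(year, month) %>%
  summarise(med_tmin = mean(tmin),
            med_tmax = mean(tmax),
            .groups = "drop") %>%
  pivot_longer(cols = c(med_tmin, med_tmax))

# One line per year/variable combination, coloured by variable
p <- ggplot(temp_long, aes(x = month, y = value, color = name,
                           group = interaction(year, name))) +
  geom_line() +
  geom_smooth(aes(group = name), se = FALSE, color = "yellow")
p
```

With the factor in place, the x axis should run from Jan to Dec instead of alphabetically.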

Oh yes, I didn't know about group = interaction(xx, yy); that solved it.
Thank you so much, Filippo
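
As a side note on the wide-format workaround from the question: the legend can probably be forced by moving color inside aes as a constant string and supplying the colours with scale_color_manual. This is only a sketch under that assumption, not necessarily the cleanest approach. The names temp_wide, series_colors and p_wide are hypothetical, and the day column is left out because it is not used in the plot.

```r
library(tidyverse)

set.seed(1)
temp <- tibble(year = rep(2001:2002, each = 24),
               month = rep(month.abb, 4),
               tmin = sample(0:15, 48, replace = TRUE),
               tmax = sample(20:35, 48, replace = TRUE))

temp_wide <- temp %>%
  mutate(month = factor(month, levels = month.abb)) %>%
  group_by(year, month) %>%
  summarise(med_tmin = mean(tmin),
            med_tmax = mean(tmax),
            .groups = "drop")

# Hypothetical lookup from legend label to colour; the labels are arbitrary strings
series_colors <- c(med_tmin = "navyblue", med_tmax = "red")

p_wide <- ggplot(temp_wide, aes(x = month, group = year)) +
  # A constant string inside aes() is treated as a mapped value,
  # so each layer gets its own legend entry
  geom_line(aes(y = med_tmin, color = "med_tmin")) +
  geom_line(aes(y = med_tmax, color = "med_tmax")) +
  scale_color_manual(name = NULL, values = series_colors) +
  labs(x = NULL, y = "Temperature (°C)")
p_wide
```

The long format with group = interaction(year, name) is still shorter, since it needs only one geom_line and the legend comes for free.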
