Geom_smooth lines: now you see me... now you don't

Hi,

I have a problem with geom_smooth lines that are either displayed or not, depending on the data or theming options.
Disclaimer #1: The sample code I have included below is meaningless per se but illustrates the problem. I am looking for an explanation or a general solution rather than for workarounds to display this particular data.
Disclaimer #2: I am more comfortable with lattice than ggplot2, so, when I have to use ggplot2, I tend to re-use code that I wrote in the past.

Given this, I recently stumbled upon a puzzling situation. The same data (df) is used in 4 similar scatterplot cases, showing y vs. x and using panelling (facet_wrap on the "panel" variable), grouping (also based on "panel"), color/symbols (based on "var1" or "var2", numeric or factor depending on the case), and, optionally, a custom scale_colour_manual call. A geom_smooth call is made in all cases with the same settings, relying on the default inherit.aes = TRUE.

The spline lines do not show in the second panel for case 3, when the color/symbol variable "var1" is coerced to a factor and a scale_colour_manual call is added. The lines show up in case 2, when the default ggplot theme is used, or in case 3 when the variable "var2" is used as the color/symbol variable.

Given the maturity of the ggplot2 package, this is probably expected behavior, but I am not knowledgeable enough to figure out why.

I would really appreciate any explanation of why spline lines do not show up in case 3, and of how to modify my code to ensure that spline lines are always shown using my group aesthetic variable.

require(ggplot2)
#> Loading required package: ggplot2
set.seed(123)
df <- data.frame(
  id = rep(1:15, each = 4),
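  x = c(c(0.25,0.5,0.75,1), each = 4), # note: c() has no 'each' argument, so this gives 0.25, 0.5, 0.75, 1, 4 (recycled)
  y = NA,
  panel = 5,
  var1 = 0, 
  var2 = 0
)
df$y <- df$id* df$x + rnorm(30, 0, 0.5) # note: the 30 draws are recycled over the 60 rows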
df$panel[which(df$id >= 7)] <- 10
df$var1[which(df$id >= 10)] <- 1
df$var2[which(df$id %in% c(5,8,9))] <- 1
# Case 1: symbol/color by df$var1; df$var1 is numeric - spline lines work
ggplot(data = df) + 
  aes_string(x = 'x', 
    y = 'y', 
    group = 'panel', 
    colour = 'var1') +
  geom_point() + 
  geom_smooth(method = 'loess',
    size = 1.5,
    linetype = 'solid',
    se = FALSE,
    na.rm = TRUE,
    span = 2/3) +
  labs(title = 'case 1') +
  facet_wrap('panel', scales = 'fixed', nrow = 2, ncol = 1) +
  scale_colour_gradientn(colors = rainbow(4))

# Case 2: symbol/color by df$var1; df$var1 is factor - spline lines work
df$var1 <- factor(df$var1)
gplot <- ggplot(data = df) + 
  aes_string(x = 'x', 
    y = 'y', 
    group = 'panel', 
    colour = 'var1',
    shape = 'var1')  +
  geom_point() + 
  geom_smooth(method = 'loess',
    size = 1.5,
    linetype = 'solid',
    se = FALSE,
    na.rm = TRUE,
    span = 2/3)+ 
  facet_wrap('panel', scales = 'fixed', nrow = 2, ncol = 1) 
gplot + labs(title = 'case 2')

# Case 3: symbol/color by df$var1; df$var1 is factor + custom scale - spline lines do not work
gplot + labs(title = 'case 3') + scale_colour_manual(values = c('blue', 'red'))

# Case 4: symbol/color by df$var2; df$var2 is factor + custom scale - spline lines do not work
df$var2 <- factor(df$var2)
ggplot(data = df) + 
  aes_string(x = 'x', 
    y = 'y', 
    group = 'panel', 
    colour = 'var2',
    shape = 'var2')  +
  geom_point() + 
  geom_smooth(method = 'loess',
    size = 1.5,
    linetype = 'solid',
    se = FALSE,
    na.rm = TRUE,
    span = 2/3) +
  labs(title = 'case 4') +
  facet_wrap('panel', scales = 'fixed', nrow = 2, ncol = 1)  +  
  scale_colour_manual(values = c('blue', 'red'))

summary(df[which(df$panel==5),])
#>        id            x               y                panel   var1  
#>  Min.   :1.0   Min.   :0.250   Min.   :-0.03024   Min.   :5   0:24  
#>  1st Qu.:2.0   1st Qu.:0.500   1st Qu.: 1.32576   1st Qu.:5   1: 0  
#>  Median :3.5   Median :0.750   Median : 2.70275   Median :5         
#>  Mean   :3.5   Mean   :1.188   Mean   : 4.20399   Mean   :5         
#>  3rd Qu.:5.0   3rd Qu.:1.000   3rd Qu.: 4.37918   3rd Qu.:5         
#>  Max.   :6.0   Max.   :4.000   Max.   :19.76360   Max.   :5         
#>  var2  
#>  0:20  
#>  1: 4  
#>        
#>        
#>        
#> 
summary(df[which(df$panel==10),])
#>        id           x               y               panel    var1   var2  
#>  Min.   : 7   Min.   :0.250   Min.   : 0.9066   Min.   :10   0:12   0:28  
#>  1st Qu.: 9   1st Qu.:0.500   1st Qu.: 4.9026   1st Qu.:10   1:24   1: 8  
#>  Median :11   Median :0.750   Median : 7.9678   Median :10                
#>  Mean   :11   Mean   :1.375   Mean   :15.1581   Mean   :10                
#>  3rd Qu.:13   3rd Qu.:1.000   3rd Qu.:13.8344   3rd Qu.:10                
#>  Max.   :15   Max.   :4.000   Max.   :60.6269   Max.   :10
sessionInfo()
#> R version 3.4.3 (2017-11-30)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Linux Mint 18.1
#> 
#> Matrix products: default
#> BLAS: /usr/lib/libblas/libblas.so.3.6.0
#> LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_2.2.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_0.12.11     digest_0.6.12    rprojroot_1.2    plyr_1.8.4      
#>  [5] grid_3.4.3       gtable_0.2.0     backports_1.1.1  magrittr_1.5    
#>  [9] evaluate_0.10.1  scales_0.4.1     rlang_0.1.1      stringi_1.1.5   
#> [13] lazyeval_0.2.0   rmarkdown_1.7    labeling_0.3     tools_3.4.3     
#> [17] stringr_1.2.0    munsell_0.4.3    yaml_2.1.14      compiler_3.4.3  
#> [21] colorspace_1.3-2 htmltools_0.3.6  knitr_1.17       tibble_1.3.3

Could you please turn this into a self-contained reprex (short for minimal reproducible example)? Reprexes are especially useful for visualization-related questions, since the reprex package automatically renders and uploads the images. It will help us help you if we can be sure we're all working with/looking at the same stuff.

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

Thanks


Hi,

The initial post was updated with a reprex. Given that the problem seems data-driven, I kept my data.frame rather than using a small "boring" data source as recommended in the links you sent. I hope this is OK.

Thanks for your interest in my question


Notice in case 1 that the upper pane has a red smooth and the lower one is gray. That is because the upper pane only has one value of var1 (0), which maps to a defined color (red). The lower pane has two colors defined and "does not know" which color is appropriate, but nothing stops it from producing a smooth. You can observe this by setting var1 = 1 for the whole lower pane:

library(tidyverse)

df %>% 
  mutate(var1 = ifelse(panel==10, 1, var1)) %>% 
ggplot(aes(x, y, group = factor(panel),  colour = var1)) +
  geom_point() + 
  geom_smooth(method = 'loess',
              size = 1.5,
              se = FALSE,
              na.rm = TRUE,
              span = 2/3) +
  labs(title = 'case 1') +
  facet_wrap('panel', scales = 'fixed', nrow = 2, ncol = 1) +
  scale_colour_gradientn(colors = rainbow(4))
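One way to check which colour the smooth actually receives in each panel of case 1 is to build the plot and look at the smooth layer's data with layer_data(). The sketch below uses a hypothetical stand-in for df (random x values, numeric var1 that is 0 throughout panel 5 and mixed in panel 10). My understanding, which is worth checking against your ggplot2 version, is that the smoothing stat keeps colour only when it is constant within a group, so panel 10's smooth ends up with a missing colour that the continuous scale draws in its na.value (grey50 by default).

```r
library(ggplot2)

set.seed(123)
# Hypothetical stand-in for df: var1 is 0 throughout panel 5 and mixed
# (0 and 1) in panel 10, as in case 1; var1 stays numeric here
d <- data.frame(id = rep(1:15, each = 4), x = runif(60, 0, 10))
d$y <- d$id * d$x + rnorm(60, 0, 0.5)
d$panel <- ifelse(d$id >= 7, 10, 5)
d$var1 <- ifelse(d$id >= 10, 1, 0)

p1 <- ggplot(d, aes(x, y, group = panel, colour = var1)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  facet_wrap(~panel, ncol = 1)

# layer_data() builds the plot and returns one layer's data; layer 2 is the smooth
smooth1 <- layer_data(p1, 2)
unique(smooth1[, c("PANEL", "colour")])
```

If this reading is right, panel 1 (panel 5) gets a gradient colour and panel 2 (panel 10) gets grey50.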

The same thing is actually observed in case 2. If you tweak that code like this:

df %>% 
  mutate(var1 = factor(var1),
         panel = factor(panel)) %>% 
  ggplot(aes(x, y, colour = var1, shape = var1))  +
  geom_point() + 
  geom_smooth(size = 1.5, se = FALSE, span = 2/3, aes(group = panel), color = 2) + 
  facet_wrap(~panel,  nrow = 2) -> gplot
gplot + labs(title = 'case 2')

Then you will have red smooths (color = 2 is red in R's default palette).

Then you can adjust both the color of the points and the color of the smooths for case 3:

gplot + labs(title = 'case 3') + 
  scale_colour_manual(values = c(4,2)) +
  geom_smooth(size = 1.5, se = FALSE, span = 2/3, aes(group = panel), color = 1)

Hi,

Thanks @wendigo for your reply.

My main problem is not so much the color used for the spline lines (although this is certainly part of the equation) but what determines whether the spline lines are drawn at all:

  • case 2 vs case 3: the only difference in the code is the use of scale_colour_manual. Colors are also discrete in the default ggplot2 theme, so why does using scale_colour_manual cause trouble?
  • case 3 vs case 4: the only difference in the code is the color/symbol variable, which takes 2 values in both panels in case 4, vs. 1 value in panel 5 and 2 values in panel 10 in case 3. I realize that var1 and var2 are factors in cases 3 and 4 vs. numeric in case 1, but your argument that the "pane has two color defined and “does not know” what the appropriate color is" should then apply to both panels in case 4...

So I am still not sure about what is going on.

The big difference in the code you sent was the definition of aesthetics in the geom_smooth calls. I would have assumed that the default behavior of geom_smooth is to inherit aesthetics from the aes_string calls.

Hi @pomchip, the reason you are not getting a smooth for case 3 is that you are defining the colors and groups in the "global aesthetic". If you assign the colors to just geom_point, then case 3 will work the way I think you are asking for.

df$var1 <- factor(df$var1)
gplot <- ggplot(data = df) + 
  aes_string(x = 'x', 
             y = 'y', 
             group = 'panel')  +
  geom_point(aes(colour = var1,
                 shape = var1)) + 
  geom_smooth(method = 'loess',
              size = 1.5,
              linetype = 'solid',
              se = FALSE,
              na.rm = TRUE,
              span = 2/3)+ 
  facet_wrap('panel', scales = 'fixed', nrow = 2, ncol = 1) 
gplot + labs(title = 'case 2')

gplot + labs(title = 'case 3') + scale_colour_manual(values = c('blue', 'red'))
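A quick way to see why moving the colour mapping into geom_point() helps: in the built data, the smooth layer then has no colour mapping to lose, so it should be drawn in the geom's default colour in both panels. This is a sketch with a hypothetical stand-in for df, not the original data.

```r
library(ggplot2)

set.seed(123)
# Hypothetical stand-in for df with a factor var1 (mixed in panel 10)
d <- data.frame(id = rep(1:15, each = 4), x = runif(60, 0, 10))
d$y <- d$id * d$x + rnorm(60, 0, 0.5)
d$panel <- ifelse(d$id >= 7, 10, 5)
d$var1 <- factor(ifelse(d$id >= 10, 1, 0))

# colour/shape are mapped only in geom_point(); geom_smooth() inherits x, y, group
p <- ggplot(d, aes(x, y, group = panel)) +
  geom_point(aes(colour = var1, shape = var1)) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  facet_wrap(~panel, ncol = 1) +
  scale_colour_manual(values = c("blue", "red"))

s_layer <- layer_data(p, 2)
unique(s_layer$colour)  # a single non-missing colour (the geom default)
table(s_layer$PANEL)    # one fitted curve per panel
```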

OK,

I see the code you provided in your last message generates spline lines.
I hate to be a pain, but I still don't understand why this is the case, especially given that:

  1. the aesthetics you define in the geom_point call are identical to the ones set globally in my code (BTW, geom_point and geom_smooth have the same inherit.aes argument, which is described identically in the help files; one would naively think that both functions inherit the settings from the global aesthetics by default).
  2. it does not explain why case 4 works and not case 3

Try running str() on each plot and taking a look at the differences.

library(ggplot2)
#> Loading required package: ggplot2
set.seed(123)
df <- data.frame(
  id = rep(1:15, each = 4),
  x = c(c(0.25,0.5,0.75,1), each = 4),
  y = NA,
  panel = 5,
  var1 = 0, 
  var2 = 0
)
df$y <- df$id* df$x + rnorm(30, 0, 0.5) 
df$panel[which(df$id >= 7)] <- 10
df$var1[which(df$id >= 10)] <- 1
df$var2[which(df$id %in% c(5,8,9))] <- 1
# Case 1: symbol/color by df$var1; df$var1 is numeric - spline lines work
c1 <- ggplot(data = df) + 
  aes_string(x = 'x', 
             y = 'y', 
             group = 'panel', 
             colour = 'var1') +
  geom_point() + 
  geom_smooth(method = 'loess',
              size = 1.5,
              linetype = 'solid',
              se = FALSE,
              na.rm = TRUE,
              span = 2/3) +
  labs(title = 'case 1') +
  facet_wrap('panel', scales = 'fixed', nrow = 2, ncol = 1) +
  scale_colour_gradientn(colors = rainbow(4))

# Case 2: symbol/color by df$var1; df$var1 is factor - spline lines work
df$var1 <- factor(df$var1)
gplot <- ggplot(data = df) + 
  aes_string(x = 'x', 
             y = 'y', 
             group = 'panel', 
             colour = 'var1',
             shape = 'var1')  +
  geom_point() + 
  geom_smooth(method = 'loess',
              size = 1.5,
              linetype = 'solid',
              se = FALSE,
              na.rm = TRUE,
              span = 2/3)+ 
  facet_wrap('panel', scales = 'fixed', nrow = 2, ncol = 1) 

c2 <- gplot + labs(title = 'case 2')

str(c2)
#> List of 9
#>  $ data       :'data.frame': 60 obs. of  6 variables:
#>   ..$ id   : int [1:60] 1 1 1 1 2 2 2 2 3 3 ...
#>   ..$ x    : num [1:60] 0.25 0.5 0.75 1 4 0.25 0.5 0.75 1 4 ...
#>   ..$ y    : num [1:60] -0.0302 0.3849 1.5294 1.0353 8.0646 ...
#>   ..$ panel: num [1:60] 5 5 5 5 5 5 5 5 5 5 ...
#>   ..$ var1 : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
#>   ..$ var2 : num [1:60] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ layers     :List of 2
#>   ..$ :Classes 'LayerInstance', 'Layer', 'ggproto', 'gg' <ggproto object: Class LayerInstance, Layer, gg>
#>     aes_params: list
#>     compute_aesthetics: function
#>     compute_geom_1: function
#>     compute_geom_2: function
#>     compute_position: function
#>     compute_statistic: function
#>     data: waiver
#>     draw_geom: function
#>     finish_statistics: function
#>     geom: <ggproto object: Class GeomPoint, Geom, gg>
#>         aesthetics: function
#>         default_aes: uneval
#>         draw_group: function
#>         draw_key: function
#>         draw_layer: function
#>         draw_panel: function
#>         extra_params: na.rm
#>         handle_na: function
#>         non_missing_aes: size shape colour
#>         optional_aes: 
#>         parameters: function
#>         required_aes: x y
#>         setup_data: function
#>         use_defaults: function
#>         super:  <ggproto object: Class Geom, gg>
#>     geom_params: list
#>     inherit.aes: TRUE
#>     layer_data: function
#>     map_statistic: function
#>     mapping: NULL
#>     position: <ggproto object: Class PositionIdentity, Position, gg>
#>         compute_layer: function
#>         compute_panel: function
#>         required_aes: 
#>         setup_data: function
#>         setup_params: function
#>         super:  <ggproto object: Class Position, gg>
#>     print: function
#>     show.legend: NA
#>     stat: <ggproto object: Class StatIdentity, Stat, gg>
#>         aesthetics: function
#>         compute_group: function
#>         compute_layer: function
#>         compute_panel: function
#>         default_aes: uneval
#>         extra_params: na.rm
#>         finish_layer: function
#>         non_missing_aes: 
#>         parameters: function
#>         required_aes: 
#>         retransform: TRUE
#>         setup_data: function
#>         setup_params: function
#>         super:  <ggproto object: Class Stat, gg>
#>     stat_params: list
#>     subset: NULL
#>     super:  <ggproto object: Class Layer, gg> 
#>   ..$ :Classes 'LayerInstance', 'Layer', 'ggproto', 'gg' <ggproto object: Class LayerInstance, Layer, gg>
#>     aes_params: list
#>     compute_aesthetics: function
#>     compute_geom_1: function
#>     compute_geom_2: function
#>     compute_position: function
#>     compute_statistic: function
#>     data: waiver
#>     draw_geom: function
#>     finish_statistics: function
#>     geom: <ggproto object: Class GeomSmooth, Geom, gg>
#>         aesthetics: function
#>         default_aes: uneval
#>         draw_group: function
#>         draw_key: function
#>         draw_layer: function
#>         draw_panel: function
#>         extra_params: na.rm
#>         handle_na: function
#>         non_missing_aes: 
#>         optional_aes: ymin ymax
#>         parameters: function
#>         required_aes: x y
#>         setup_data: function
#>         use_defaults: function
#>         super:  <ggproto object: Class Geom, gg>
#>     geom_params: list
#>     inherit.aes: TRUE
#>     layer_data: function
#>     map_statistic: function
#>     mapping: NULL
#>     position: <ggproto object: Class PositionIdentity, Position, gg>
#>         compute_layer: function
#>         compute_panel: function
#>         required_aes: 
#>         setup_data: function
#>         setup_params: function
#>         super:  <ggproto object: Class Position, gg>
#>     print: function
#>     show.legend: NA
#>     stat: <ggproto object: Class StatSmooth, Stat, gg>
#>         aesthetics: function
#>         compute_group: function
#>         compute_layer: function
#>         compute_panel: function
#>         default_aes: uneval
#>         extra_params: na.rm
#>         finish_layer: function
#>         non_missing_aes: 
#>         parameters: function
#>         required_aes: x y
#>         retransform: TRUE
#>         setup_data: function
#>         setup_params: function
#>         super:  <ggproto object: Class Stat, gg>
#>     stat_params: list
#>     subset: NULL
#>     super:  <ggproto object: Class Layer, gg> 
#>  $ scales     :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
#>     add: function
#>     clone: function
#>     find: function
#>     get_scales: function
#>     has_scale: function
#>     input: function
#>     n: function
#>     non_position_scales: function
#>     scales: list
#>     super:  <ggproto object: Class ScalesList, gg> 
#>  $ mapping    :List of 5
#>   ..$ group : symbol panel
#>   ..$ colour: symbol var1
#>   ..$ shape : symbol var1
#>   ..$ x     : symbol x
#>   ..$ y     : symbol y
#>  $ theme      : list()
#>  $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
#>     aspect: function
#>     default: TRUE
#>     distance: function
#>     expand: TRUE
#>     is_linear: function
#>     labels: function
#>     limits: list
#>     modify_scales: function
#>     range: function
#>     render_axis_h: function
#>     render_axis_v: function
#>     render_bg: function
#>     render_fg: function
#>     setup_data: function
#>     setup_layout: function
#>     setup_panel_params: function
#>     setup_params: function
#>     transform: function
#>     super:  <ggproto object: Class CoordCartesian, Coord, gg> 
#>  $ facet      :Classes 'FacetWrap', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetWrap, Facet, gg>
#>     compute_layout: function
#>     draw_back: function
#>     draw_front: function
#>     draw_labels: function
#>     draw_panels: function
#>     finish_data: function
#>     init_scales: function
#>     map_data: function
#>     params: list
#>     setup_data: function
#>     setup_params: function
#>     shrink: TRUE
#>     train_scales: function
#>     vars: function
#>     super:  <ggproto object: Class FacetWrap, Facet, gg> 
#>  $ plot_env   :<environment: R_GlobalEnv> 
#>  $ labels     :List of 6
#>   ..$ title : chr "case 2"
#>   ..$ group : chr "panel"
#>   ..$ colour: chr "var1"
#>   ..$ shape : chr "var1"
#>   ..$ x     : chr "x"
#>   ..$ y     : chr "y"
#>  - attr(*, "class")= chr [1:2] "gg" "ggplot"

# Case 3: symbol/color by df$var1; df$var1 is factor + custom scale - spline lines do not work
c3 <- gplot + labs(title = 'case 3') + scale_colour_manual(values = c('blue', "red"))

str(c3)
#> List of 9
#>  $ data       :'data.frame': 60 obs. of  6 variables:
#>   ..$ id   : int [1:60] 1 1 1 1 2 2 2 2 3 3 ...
#>   ..$ x    : num [1:60] 0.25 0.5 0.75 1 4 0.25 0.5 0.75 1 4 ...
#>   ..$ y    : num [1:60] -0.0302 0.3849 1.5294 1.0353 8.0646 ...
#>   ..$ panel: num [1:60] 5 5 5 5 5 5 5 5 5 5 ...
#>   ..$ var1 : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
#>   ..$ var2 : num [1:60] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ layers     :List of 2
#>   ..$ :Classes 'LayerInstance', 'Layer', 'ggproto', 'gg' <ggproto object: Class LayerInstance, Layer, gg>
#>     aes_params: list
#>     compute_aesthetics: function
#>     compute_geom_1: function
#>     compute_geom_2: function
#>     compute_position: function
#>     compute_statistic: function
#>     data: waiver
#>     draw_geom: function
#>     finish_statistics: function
#>     geom: <ggproto object: Class GeomPoint, Geom, gg>
#>         aesthetics: function
#>         default_aes: uneval
#>         draw_group: function
#>         draw_key: function
#>         draw_layer: function
#>         draw_panel: function
#>         extra_params: na.rm
#>         handle_na: function
#>         non_missing_aes: size shape colour
#>         optional_aes: 
#>         parameters: function
#>         required_aes: x y
#>         setup_data: function
#>         use_defaults: function
#>         super:  <ggproto object: Class Geom, gg>
#>     geom_params: list
#>     inherit.aes: TRUE
#>     layer_data: function
#>     map_statistic: function
#>     mapping: NULL
#>     position: <ggproto object: Class PositionIdentity, Position, gg>
#>         compute_layer: function
#>         compute_panel: function
#>         required_aes: 
#>         setup_data: function
#>         setup_params: function
#>         super:  <ggproto object: Class Position, gg>
#>     print: function
#>     show.legend: NA
#>     stat: <ggproto object: Class StatIdentity, Stat, gg>
#>         aesthetics: function
#>         compute_group: function
#>         compute_layer: function
#>         compute_panel: function
#>         default_aes: uneval
#>         extra_params: na.rm
#>         finish_layer: function
#>         non_missing_aes: 
#>         parameters: function
#>         required_aes: 
#>         retransform: TRUE
#>         setup_data: function
#>         setup_params: function
#>         super:  <ggproto object: Class Stat, gg>
#>     stat_params: list
#>     subset: NULL
#>     super:  <ggproto object: Class Layer, gg> 
#>   ..$ :Classes 'LayerInstance', 'Layer', 'ggproto', 'gg' <ggproto object: Class LayerInstance, Layer, gg>
#>     aes_params: list
#>     compute_aesthetics: function
#>     compute_geom_1: function
#>     compute_geom_2: function
#>     compute_position: function
#>     compute_statistic: function
#>     data: waiver
#>     draw_geom: function
#>     finish_statistics: function
#>     geom: <ggproto object: Class GeomSmooth, Geom, gg>
#>         aesthetics: function
#>         default_aes: uneval
#>         draw_group: function
#>         draw_key: function
#>         draw_layer: function
#>         draw_panel: function
#>         extra_params: na.rm
#>         handle_na: function
#>         non_missing_aes: 
#>         optional_aes: ymin ymax
#>         parameters: function
#>         required_aes: x y
#>         setup_data: function
#>         use_defaults: function
#>         super:  <ggproto object: Class Geom, gg>
#>     geom_params: list
#>     inherit.aes: TRUE
#>     layer_data: function
#>     map_statistic: function
#>     mapping: NULL
#>     position: <ggproto object: Class PositionIdentity, Position, gg>
#>         compute_layer: function
#>         compute_panel: function
#>         required_aes: 
#>         setup_data: function
#>         setup_params: function
#>         super:  <ggproto object: Class Position, gg>
#>     print: function
#>     show.legend: NA
#>     stat: <ggproto object: Class StatSmooth, Stat, gg>
#>         aesthetics: function
#>         compute_group: function
#>         compute_layer: function
#>         compute_panel: function
#>         default_aes: uneval
#>         extra_params: na.rm
#>         finish_layer: function
#>         non_missing_aes: 
#>         parameters: function
#>         required_aes: x y
#>         retransform: TRUE
#>         setup_data: function
#>         setup_params: function
#>         super:  <ggproto object: Class Stat, gg>
#>     stat_params: list
#>     subset: NULL
#>     super:  <ggproto object: Class Layer, gg> 
#>  $ scales     :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
#>     add: function
#>     clone: function
#>     find: function
#>     get_scales: function
#>     has_scale: function
#>     input: function
#>     n: function
#>     non_position_scales: function
#>     scales: list
#>     super:  <ggproto object: Class ScalesList, gg> 
#>  $ mapping    :List of 5
#>   ..$ group : symbol panel
#>   ..$ colour: symbol var1
#>   ..$ shape : symbol var1
#>   ..$ x     : symbol x
#>   ..$ y     : symbol y
#>  $ theme      : list()
#>  $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' <ggproto object: Class CoordCartesian, Coord, gg>
#>     aspect: function
#>     default: TRUE
#>     distance: function
#>     expand: TRUE
#>     is_linear: function
#>     labels: function
#>     limits: list
#>     modify_scales: function
#>     range: function
#>     render_axis_h: function
#>     render_axis_v: function
#>     render_bg: function
#>     render_fg: function
#>     setup_data: function
#>     setup_layout: function
#>     setup_panel_params: function
#>     setup_params: function
#>     transform: function
#>     super:  <ggproto object: Class CoordCartesian, Coord, gg> 
#>  $ facet      :Classes 'FacetWrap', 'Facet', 'ggproto', 'gg' <ggproto object: Class FacetWrap, Facet, gg>
#>     compute_layout: function
#>     draw_back: function
#>     draw_front: function
#>     draw_labels: function
#>     draw_panels: function
#>     finish_data: function
#>     init_scales: function
#>     map_data: function
#>     params: list
#>     setup_data: function
#>     setup_params: function
#>     shrink: TRUE
#>     train_scales: function
#>     vars: function
#>     super:  <ggproto object: Class FacetWrap, Facet, gg> 
#>  $ plot_env   :<environment: R_GlobalEnv> 
#>  $ labels     :List of 6
#>   ..$ title : chr "case 3"
#>   ..$ group : chr "panel"
#>   ..$ colour: chr "var1"
#>   ..$ shape : chr "var1"
#>   ..$ x     : chr "x"
#>   ..$ y     : chr "y"
#>  - attr(*, "class")= chr [1:2] "gg" "ggplot"

@mara: str(c2) and str(c3) only differ in the $labels$title slot... so the mystery remains.
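str() only shows the plot specification before it is built, which would explain why c2 and c3 look identical there; colours are assigned when the plot is built. Comparing layer_data() for the smooth layer should show where the two cases differ. This sketch uses a hypothetical stand-in for df, and the smooth_colours() helper is made up for illustration. My reading, to be checked against the source, is that in panel 10 the smooth's colour is missing (var1 is not constant within the "panel" group) and is replaced by the scale's na.value: scale_colour_hue() uses grey50, while in ggplot2 2.2.1, as far as I can tell, scale_colour_manual() defaulted to na.value = NA, which draws nothing. More recent releases appear to default manual scales to grey50 as well, so the output for c3 may depend on the version; passing na.value explicitly is one possible general fix.

```r
library(ggplot2)

set.seed(123)
# Hypothetical stand-in for df: factor var1, constant in panel 5, mixed in panel 10
d <- data.frame(id = rep(1:15, each = 4), x = runif(60, 0, 10))
d$y <- d$id * d$x + rnorm(60, 0, 0.5)
d$panel <- ifelse(d$id >= 7, 10, 5)
d$var1 <- factor(ifelse(d$id >= 10, 1, 0))

base <- ggplot(d, aes(x, y, group = panel, colour = var1, shape = var1)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  facet_wrap(~panel, ncol = 1)

c2 <- base                                                   # default hue scale
c3 <- base + scale_colour_manual(values = c("blue", "red"))  # manual scale
c3_na <- base +                                              # explicit na.value
  scale_colour_manual(values = c("blue", "red"), na.value = "grey50")

# Hypothetical helper: colour of the smooth (layer 2) in each built panel
smooth_colours <- function(p) {
  s <- layer_data(p, 2)
  vapply(split(s$colour, s$PANEL), function(v) v[1], character(1))
}
smooth_colours(c2)
smooth_colours(c3)
smooth_colours(c3_na)
```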

So, here are some additional examples to further illustrate the behavior of geom_smooth. They use a similar dataset, but this one is based on random x values with different ranges for the data associated with var1==0 and var1==1.

Case 5 is similar to case 3
Case 6 is similar to case 5, but group = 'panel' is added to the geom_smooth function call
Case 7 does not use faceting or color/symbol aesthetics and uses group = 'var1' in the global aesthetics and in the geom_smooth function call
Case 8 is similar to case 7 but includes aes_string(color = 'var1') in the geom_smooth function call

  • As expected, case 5 does not generate any smooth line, similar to case 3.
  • In case 6, the plot in panel 10 now shows an (apparently) single spline line that goes back and forth and changes color. It seems that the spline line first considers the group of data with var1==0, then the group with var1==1, and connects the end of the spline created for the first group to the beginning of the spline created for the second group, before switching to the next color.
  • In case 7, we observe a single smooth line which goes through the data from left to right.
  • In case 8, we observe a single smooth line that goes back and forth and changes color.

So, I would tend to conclude that:

  • geom_smooth ignores any group variable if no color aesthetic is defined
  • geom_smooth inherits by default the color definition from the global aesthetics but not the group definition. (this seems odd)
  • a redundant definition of the group variable is required in geom_smooth, otherwise no line is created. I am not a ggplot2 expert by any means but this looks and smells like a bug.
require(ggplot2)
#> Loading required package: ggplot2

set.seed(123)

df <- data.frame(
  id = rep(1:15, each = 4),
  x = runif(60, 0, 10),
  y = NA,
  panel = 5,
  var1 = 0, 
  var2 = 0
)

df$y <- df$id* df$x + rnorm(30, 0, 0.5) 
df$panel[which(df$id >= 7)] <- 10
df$var1[which(df$id >= 10)] <- 1
df$var2[which(df$id %in% c(5,8,9))] <- 1
df$x <- df$x - 10*df$var1

df$var1 <- factor(df$var1)

ggplot(data = df) + 
  aes_string(x = 'x', 
    y = 'y', 
    group = 'panel',
    colour = 'var1',
    shape = 'var1')  +
  geom_point() + 
  geom_smooth(method = 'loess',
    size = 1.5,
    linetype = 'solid',
    se = FALSE,
    na.rm = TRUE,
    span = 2/3) + 
  facet_wrap('panel', scales = 'fixed', nrow = 2, ncol = 1) +
  labs(title = 'case 5') + 
  scale_colour_manual(values = c('blue', 'red'))


ggplot(data = df) + 
  aes_string(x = 'x', 
    y = 'y', 
    group = 'panel',
    colour = 'var1',
    shape = 'var1')  +
  geom_point() + 
  geom_smooth(method = 'loess',
    group = 'panel',
    size = 1.5,
    linetype = 'solid',
    se = FALSE,
    na.rm = TRUE,
    span = 2/3) + 
  facet_wrap('panel', scales = 'fixed', nrow = 2, ncol = 1) +
  labs(title = 'case 6') + 
  scale_colour_manual(values = c('blue', 'red'))


ggplot(data = df) + 
  aes_string(x = 'x', 
    y = 'y', 
    group = 'var1') +
  geom_point() + 
  geom_smooth(method = 'loess',
    group = 'var1',
    size = 1.5,
    linetype = 'solid',
    se = FALSE,
    na.rm = TRUE,
    span = 2/3) + 
  labs(title = 'case 7') +
  scale_colour_manual(values = c('blue', 'red'))


ggplot(data = df) + 
  aes_string(x = 'x', 
    y = 'y', 
    group = 'var1') +
  geom_point() + 
  geom_smooth(method = 'loess',
    group = 'var1',
    aes_string(color = 'var1'),
    size = 1.5,
    linetype = 'solid',
    se = FALSE,
    na.rm = TRUE,
    span = 2/3) + 
  labs(title = 'case 8') + 
  scale_colour_manual(values = c('blue', 'red'))
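One hypothesis that would account for cases 6 to 8: group = 'panel' passed directly to geom_smooth() (outside aes()) is a set constant, not a mapping. If I read the layer code correctly, a set aesthetic causes the inherited group mapping to be dropped, so the stat falls back to grouping by the discrete colour variable, fits one curve per var1 level, and afterwards every row gets the literal group value "panel", so the path joins the curves. The sketch below compares a mapped and a set group on a hypothetical stand-in for df and counts the rows of the built smooth layer (80 points per fitted curve is stat_smooth()'s default n).

```r
library(ggplot2)

set.seed(123)
# Hypothetical stand-in for df: factor var1, mixed in panel 10
d <- data.frame(id = rep(1:15, each = 4), x = runif(60, 0, 10))
d$y <- d$id * d$x + rnorm(60, 0, 0.5)
d$panel <- ifelse(d$id >= 7, 10, 5)
d$var1 <- factor(ifelse(d$id >= 10, 1, 0))

base <- ggplot(d, aes(x, y, group = panel, colour = var1)) +
  facet_wrap(~panel, ncol = 1)

# group inherited as a mapping (like case 5): one fit per panel
mapped <- base + geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

# group set as a constant (like case 6): the inherited group mapping is dropped,
# the stat groups by var1, then every row gets group = "panel"
set_const <- base + geom_smooth(group = "panel", method = "loess",
                                formula = y ~ x, se = FALSE)

s_mapped <- layer_data(mapped)
s_set <- layer_data(set_const)
table(s_mapped$PANEL)  # one 80-point curve per panel
table(s_set$PANEL)     # two curves (160 rows) in panel 2
unique(s_set$group)
```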

I'm not sure if this is the cause of the problem, but all of the code samples here have the call to aes_string being used as a component in itself, like ggplot() + aes_string() + geom_*(). But it should either be inside ggplot() or inside a geom call.

Global case (all geoms inherit these aesthetics, unless they're overridden or inheritance is turned off):

ggplot(data, aes_string(x = x, y = y, ...)) +
  geom_point() +
  geom_smooth()

Particular case (only the geoms that receive the call get the aesthetics):

ggplot(data) +
  geom_point(aes_string(x = x, y = y, ...)) +
  geom_smooth()

Maybe that explains this weird behaviour!
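A simple way to test whether the placement of the aesthetics matters is to build both versions and compare the smooth layer's data. The sketch below uses made-up data; if the two forms are equivalent, all.equal() should return TRUE.

```r
library(ggplot2)

set.seed(123)
# Made-up data: two groups of 20 points
d <- data.frame(x = runif(40, 0, 10), g = factor(rep(c("a", "b"), 20)))
d$y <- d$x + rnorm(40)

# Mapping supplied inside ggplot()
p_inside <- ggplot(d, aes(x, y, colour = g)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

# Same mapping added as a separate component, as in the code earlier in this thread
p_added <- ggplot(d) + aes(x, y, colour = g) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

all.equal(layer_data(p_inside, 2), layer_data(p_added, 2))
```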

It is not very clear to me what the expected behavior should be in all the "test cases", but I think you should probably just drop the grouping on "panel", which I believe is somehow "automatic" since you call facet_wrap. Maybe specifying it leads to trouble because geom_smooth then tries to use both the "color" and the "group" aesthetics to drive separate lines and gets "confused"?

On your original examples:

require(ggplot2)
#> Loading required package: ggplot2
set.seed(123)
df <- data.frame(
  id = rep(1:15, each = 4),
  x = c(c(0.25,0.5,0.75,1), each = 4),
  y = NA,
  panel = 5,
  var1 = 0, 
  var2 = 0
)
df$y <- df$id* df$x + rnorm(30, 0, 0.5) 
df$panel[which(df$id >= 7)] <- 10
df$var1[which(df$id >= 10)] <- 1
df$var2[which(df$id %in% c(5,8,9))] <- 1
# Case 1: symbol/color by df$var1; df$var1 is numeric - spline lines work
ggplot(data = df) + 
  aes_string(x = 'x', 
             y = 'y', 
             # group = 'panel', 
             colour = 'var1') +
  geom_point() + 
  geom_smooth(method = 'loess',
              size = 1.5,
              linetype = 'solid',
              se = FALSE,
              na.rm = TRUE,
              span = 2/3) +
  labs(title = 'case 1') +
  facet_wrap('panel', scales = 'fixed', nrow = 2, ncol = 1) +
  scale_colour_gradientn(colors = rainbow(4))


df$var1 <- factor(df$var1)
gplot <- ggplot(data = df) + 
  aes_string(x = 'x', 
             y = 'y', 
             # group = 'panel', 
             colour = 'var1',
             shape = 'var1')  +
  geom_point() + 
  geom_smooth(method = 'loess',
              size = 1.5,
              linetype = 'solid',
              se = FALSE,
              na.rm = TRUE,
              span = 2/3)+ 
  facet_wrap('panel', scales = 'fixed', nrow = 2, ncol = 1) 
gplot + labs(title = 'case 2')


# Case 3: symbol/color by df$var1; df$var1 is factor + custom scale - spline lines do not work
gplot + labs(title = 'case 3') + scale_colour_manual(values = c('blue', 'red'))



# Case 4: symbol/color by df$var2; df$var2 is factor + custom scale - spline lines do not work
df$var2 <- factor(df$var2)
ggplot(data = df) + 
  aes_string(x = 'x', 
             y = 'y', 
             # group = 'panel', 
             colour = 'var2',
             shape = 'var2')  +
  geom_point() + 
  geom_smooth(method = 'loess',
              size = 1.5,
              linetype = 'solid',
              se = FALSE,
              na.rm = TRUE,
              span = 2/3) +
  labs(title = 'case 4') +
  facet_wrap('panel', scales = 'fixed', nrow = 2, ncol = 1)  +  
  scale_colour_manual(values = c('blue', 'red'))

On the first of the new examples (case 5):

require(ggplot2)
#> Loading required package: ggplot2

set.seed(123)

df <- data.frame(
  id = rep(1:15, each = 4),
  x = runif(60, 0, 10),
  y = NA,
  panel = 5,
  var1 = 0, 
  var2 = 0
)

df$y <- df$id* df$x + rnorm(30, 0, 0.5) 
df$panel[which(df$id >= 7)] <- 10
df$var1[which(df$id >= 10)] <- 1
df$var2[which(df$id %in% c(5,8,9))] <- 1
df$x <- df$x - 10*df$var1

df$var1 <- factor(df$var1)

ggplot(data = df) + 
  aes_string(x = 'x', 
             y = 'y', 
             # group = 'panel',
             colour = 'var1',
             shape = 'var1')  +
  geom_point() + 
  geom_smooth(method = 'loess',
              size = 1.5,
              linetype = 'solid',
              se = FALSE,
              na.rm = TRUE,
              span = 2/3) + 
  facet_wrap('panel', scales = 'fixed', nrow = 2, ncol = 1) +
  labs(title = 'case 5') + 
  scale_colour_manual(values = c('blue', 'red'))

Created on 2018-02-28 by the reprex package (v0.2.0).
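To see what grouping ggplot2 uses once group = 'panel' is dropped, the built smooth layer can be inspected. In this sketch (hypothetical stand-in data, factor var1 mixed in panel 10), I would expect one group per var1 level, with facet_wrap() handling the panel split separately, so the smooths keep their colours even with scale_colour_manual().

```r
library(ggplot2)

set.seed(123)
# Hypothetical stand-in for df: factor var1, mixed in panel 10
d <- data.frame(id = rep(1:15, each = 4), x = runif(60, 0, 10))
d$y <- d$id * d$x + rnorm(60, 0, 0.5)
d$panel <- ifelse(d$id >= 7, 10, 5)
d$var1 <- factor(ifelse(d$id >= 10, 1, 0))

# No group mapping: groups come from the discrete colour/shape variable,
# while facet_wrap() splits the data into panels on its own
p <- ggplot(d, aes(x, y, colour = var1, shape = var1)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  facet_wrap(~panel, ncol = 1) +
  scale_colour_manual(values = c("blue", "red"))

s <- layer_data(p, 2)
unique(s[, c("PANEL", "group", "colour")])  # one row per panel/group combination
```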


Hi @rensa,

I will try... but this solution would seem to defeat the purpose and flexibility of the grammar of graphics.

Hi @lbusett,

My example data and code were not intended to produce meaningful plots but to explore the behavior of geom_smooth.
However, your post made me realize that I expected geom_smooth to behave like geom_line, i.e. that the group aesthetic would drive the creation of multiple lines (regardless of color or style) within the same panel whenever a group variable is used.

Your modified case 2 further confirms that, contrary to geom_line, geom_smooth does not need a group aesthetic to split the data into multiple lines if a color (and/or shape?) aesthetic is set.
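For comparison, here is a small check (made-up data) of whether geom_line() and geom_smooth() derive their groups differently when no group is mapped. In the built data, both layers should get one group per level of the discrete colour variable, which would suggest that the difference discussed in this thread comes from how the smooth's stat handles colour rather than from grouping itself. This is a sketch, not a statement of how the internals must work.

```r
library(ggplot2)

set.seed(123)
# Made-up data: two series of 10 points, distinguished only by colour
d <- data.frame(x = rep(1:10, 2), g = factor(rep(c("a", "b"), each = 10)))
d$y <- d$x + rnorm(20)

# No explicit group: both layers inherit colour = g
p <- ggplot(d, aes(x, y, colour = g)) +
  geom_line() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

line_groups   <- sort(unique(as.integer(layer_data(p, 1)$group)))
smooth_groups <- sort(unique(as.integer(layer_data(p, 2)$group)))
identical(line_groups, smooth_groups)
```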

You might just want to go to the source (the ggplot2 code is really well documented internally, i.e. there's lots to read that isn't just code).
https://github.com/tidyverse/ggplot2/blob/master/R/geom-smooth.r

geom_line() is in geom-path.r

aes_string() in particular has some interesting comments:


@pomchip I don't think there's any loss of flexibility; it's just that the global case is perhaps not how you expected it to be laid out.


@rensa

Testing shows that defining the aesthetics inside the initial ggplot call (as you suggested) or outside it (as done in my original code) gives the same results.

@mara
Looking at the source code, it appears that geom_line and geom_smooth are just wrapper functions calling some lower-level ggproto objects.
Looking one level below geom_smooth and geom_line does not reveal anything obvious about a difference in behavior based on the aesthetics.

I am tempted to file an issue on GitHub. The last CRAN update of the package dates from more than 1 year ago. Do you think that @hadley is still actively responding to issues and maintaining the package?

Fair enough. Glad you were able to rule it out!

Yes the package is still very much maintained, and issues are triaged and responded to. (You can always peep the closed issues if you feel like double checking. :wink:) CRAN-release frequency isn't necessarily a great indicator of care (at least not in this case).

EDIT: Feel free to open an issue. I'm not withholding information, just kinda outlining the steps that are sometimes helpful when investigating.