New geom_histogram layer alters original geom_histogram layer in ggplot2

When I add a geom_histogram layer to a plot that already has a geom_histogram layer, the first histogram is sometimes altered: for example, the bins of the first layer change. The example below shows this. I was working on something that uses the bins of the first histogram layer, so having them change when later layers are added causes problems. If this is intended behavior, is there a way to prevent the first layer from changing?

suppressMessages(library(dplyr))
library(tibble)
library(ggplot2)

dtf <- c(144.8531, 192.7375, 226.3156, 200.2969,
         211.3438, 215.5562, 199.6437, 190.1531,
         189.6469, 216.4906) %>%
  enframe()


plot1 <- dtf %>% 
  ggplot() +
  geom_histogram(aes(value), bins = 47) 
plot1


new_dtf <- c(145.2158, 189.4889, 189.4889, 193.0307, 
           200.1144, 200.1144, 210.7399, 216.0527,
           216.0527, 226.6783) %>% 
  enframe()

plot2 <- plot1 +
  geom_histogram(aes(value), data = new_dtf, 
                 alpha = 0.1, fill = "red")
plot2
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


layer_data(plot1, 1) %>% 
  select(xmin, xmax) %>% 
  head()
#>       xmin     xmax
#> 1 144.3303 146.1012
#> 2 146.1012 147.8721
#> 3 147.8721 149.6431
#> 4 149.6431 151.4140
#> 5 151.4140 153.1849
#> 6 153.1849 154.9558
layer_data(plot2, 1) %>% 
  select(xmin, xmax) %>% 
  head()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#>       xmin     xmax
#> 1 143.1941 144.9729
#> 2 144.9729 146.7517
#> 3 146.7517 148.5305
#> 4 148.5305 150.3093
#> 5 150.3093 152.0881
#> 6 152.0881 153.8670

Created on 2019-11-13 by the reprex package (v0.3.0)

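This is because the bins are recomputed. `bins = 47` fixes only the number of bins, not their edges. The edges are derived from the range of the x scale, and that range grows when a second layer brings in data outside the original range. To keep the edges from moving, set your own fixed bins with `breaks`.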
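
The shift in bin edges can be reproduced by hand. The sketch below is base R arithmetic, not ggplot2's own code. It assumes the bin width is the x range divided by `bins - 1` and that edges are aligned to `width / 2`, which reproduces the `xmin` values printed in the reprex above. The helper name `first_edges` is made up for illustration, and other ggplot2 versions may align bins differently.

```r
# Values from the two data sets in the question
old_values <- c(144.8531, 192.7375, 226.3156, 200.2969,
                211.3438, 215.5562, 199.6437, 190.1531,
                189.6469, 216.4906)
new_values <- c(145.2158, 189.4889, 189.4889, 193.0307,
                200.1144, 200.1144, 210.7399, 216.0527,
                216.0527, 226.6783)

# Hypothetical helper: left edges of the first few bins for a given x range
first_edges <- function(x_range, bins, n = 3) {
  width <- diff(x_range) / (bins - 1)               # assumed width rule
  boundary <- width / 2                             # assumed alignment rule
  shift <- floor((x_range[1] - boundary) / width)   # whole bins below the minimum
  origin <- boundary + shift * width                # left edge of the first bin
  origin + width * (seq_len(n) - 1)
}

# plot1: the x range covers only the first data set
# (compare with 144.3303, 146.1012, 147.8721 in the reprex)
round(first_edges(range(old_values), bins = 47), 4)

# plot2: the x range covers both data sets, so the width and the edges move
# (compare with 143.1941, 144.9729, 146.7517 in the reprex)
round(first_edges(range(c(old_values, new_values)), bins = 47), 4)
```

The only input that differs between the two calls is the x range: the maximum moves from 226.3156 to 226.6783 once the second data set is included, and that is enough to change every edge.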

Hi @lebryant. Here is a little trick: geom_histogram bins your data, but geom_col does not. So you can build each histogram on its own, pull out the binned data with layer_data, and draw the bars with geom_col, as in the code below.

library(tidyverse)

dtf <- c(144.8531, 192.7375, 226.3156, 200.2969, 
         211.3438, 215.5562, 199.6437, 190.1531, 
         189.6469, 216.4906) %>% 
  enframe()


plot1 <- dtf %>% 
  ggplot() +
  geom_histogram(aes(value), bins = 47) 

new_dtf <- c(145.2158, 189.4889, 189.4889, 193.0307, 
             200.1144, 200.1144, 210.7399, 216.0527,
             216.0527, 226.6783) %>% 
  enframe()

plot2 <- ggplot() +
  geom_histogram(aes(value), data = new_dtf, 
                 alpha = 0.1, fill = "red", bins = 30)

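# Draw the pre-binned data with geom_col, which does no binning of its own.
# Each width is that plot's bin width (xmax - xmin in its layer_data).
layer_data(plot1) %>%
  ggplot(aes(x = (xmin + xmax) / 2, y = y)) +
  geom_col(width = 1.7709) +
  geom_col(aes(x = (xmin + xmax) / 2, y = y), data = layer_data(plot2),
           alpha = 0.1, fill = "red", width = 2.809)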

Created on 2019-11-15 by the reprex package (v0.3.0)

I don't think layers in ggplot2 work the same way as layers in base R plots. Usually with ggplot2 you put all the data into a single data frame and plot it. Adding more geoms will affect how the bins are calculated, I think.
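
As a rough sketch of the single-data-frame approach mentioned here: both data sets are stacked into one data frame and drawn by one geom_histogram layer, so every group is binned against the same x scale. The column name `source` and the object names `both`, `combined`, and `edges` are made up for this example, and `alpha = 0.5` is an arbitrary choice.

```r
library(ggplot2)

# Stack both data sets into one data frame, tagged by a made-up `source` column
both <- rbind(
  data.frame(value = c(144.8531, 192.7375, 226.3156, 200.2969,
                       211.3438, 215.5562, 199.6437, 190.1531,
                       189.6469, 216.4906),
             source = "original"),
  data.frame(value = c(145.2158, 189.4889, 189.4889, 193.0307,
                       200.1144, 200.1144, 210.7399, 216.0527,
                       216.0527, 226.6783),
             source = "new")
)

# One layer: each group is binned separately, but over the same x range
combined <- ggplot(both, aes(value, fill = source)) +
  geom_histogram(bins = 47, alpha = 0.5, position = "identity")

# Compare the left bin edges computed for each group
edges <- layer_data(combined, 1)
split(edges$xmin, edges$group)
```

This keeps the two histograms aligned with each other, but the edges still depend on the combined range of the data, so it does not preserve the bins of a plot that was built earlier from the first data set alone.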

I tried setting the breaks in the first layer, but it still changes when a new layer is added. I don't see why the addition of a second histogram should alter the bins of the first histogram. I also tried to set inherit.aes = FALSE in the second histogram hoping to prevent interaction between the layers, but that didn't work either.

Thanks @raytong, the geom_col trick is helpful for what I am doing. I also realized when looking over my examples that setting the breaks for a histogram layer does appear to keep the bins fixed when new layers are added. So, if a user supplies a histogram plot without the breaks specified, you can set the breaks before adding new layers.

suppressMessages(library(dplyr))
library(tibble)
library(ggplot2)

dtf <- c(144.8531, 192.7375, 226.3156, 200.2969,
         211.3438, 215.5562, 199.6437, 190.1531,
         189.6469, 216.4906) %>%
  enframe()


plot1 <- dtf %>% 
  ggplot() +
  geom_histogram(aes(value), bins = 47) 
plot1


### add breaks to plot1
breaks <- c(
  layer_data(plot1, 1)$xmin, 
  layer_data(plot1, 1)$xmax
) %>% 
  unique() %>% 
  sort()

plot1$layers[[1]]$stat_params$breaks <- breaks
###

new_dtf <- c(145.2158, 189.4889, 189.4889, 193.0307, 
           200.1144, 200.1144, 210.7399, 216.0527,
           216.0527, 226.6783) %>% 
  enframe()

plot2 <- plot1 +
  geom_histogram(aes(value), data = new_dtf, 
                 alpha = 0.1, fill = "red")
plot2
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


layer_data(plot1, 1) %>% 
  select(xmin, xmax) %>% 
  head()
#>       xmin     xmax
#> 1 144.3303 146.1012
#> 2 146.1012 147.8721
#> 3 147.8721 149.6431
#> 4 149.6431 151.4140
#> 5 151.4140 153.1849
#> 6 153.1849 153.1849

layer_data(plot2, 1) %>% 
  select(xmin, xmax) %>% 
  head()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#>       xmin     xmax
#> 1 144.3303 146.1012
#> 2 146.1012 147.8721
#> 3 147.8721 149.6431
#> 4 149.6431 151.4140
#> 5 151.4140 153.1849
#> 6 153.1849 153.1849

Created on 2019-11-15 by the reprex package (v0.3.0)
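
A possible variation on the break extraction above, offered as a sketch rather than a tested replacement. `xmin` and `xmax` are floating-point values, so `unique()` is not guaranteed to collapse two edges that differ only in their last digits. That may be what produces rows such as `153.1849 153.1849` in the output above, though this is a guess and was not verified here. If the bins are contiguous, taking every `xmin` plus the final `xmax` gives the edges without relying on `unique()`. `breaks_from_bins` is a made-up helper name, and the small data frame stands in for the output of `layer_data()`.

```r
# Hypothetical helper: rebuild break points from a table of contiguous bins.
# `bin_df` needs numeric columns `xmin` and `xmax`, as in layer_data() output.
breaks_from_bins <- function(bin_df) {
  c(bin_df$xmin, bin_df$xmax[nrow(bin_df)])
}

# Stand-in for layer_data(plot1, 1): three contiguous bins
toy_bins <- data.frame(
  xmin = c(0, 1, 2),
  xmax = c(1, 2, 3)
)

# One more break than there are bins
breaks_from_bins(toy_bins)
```

With the plots above, this would be called as `breaks_from_bins(layer_data(plot1, 1))` before assigning the result to `plot1$layers[[1]]$stat_params$breaks`.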
