Why do results of stat_summary() and stat_summary_bin() differ?

Hello experts. I just noticed that plots produced by stat_summary() and stat_summary_bin() show different y coordinates even with the same data.

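library(tidyverse)

set.seed(1)
# y is x plus independent standard-normal noise
sample_data <- tibble(x = rnorm(1000), y = x + rnorm(1000))
# plain assignment instead of %<>%, which needs library(magrittr) attached
sample_data <- sample_data %>% mutate(x_bin = cut(x, breaks = 10))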

ggplot(sample_data, mapping=aes(x=x, y=y)) + 
  stat_summary_bin(geom="bar", fun=mean, bins=9)

[Image: bar plot produced by stat_summary_bin()]

ggplot(sample_data, mapping=aes(x=x_bin, y=y)) + 
  stat_summary(geom="bar", fun=mean)

[Image: bar plot produced by stat_summary()]

As shown, corresponding bars in those two plots show different heights (compare the 9th and 10th bars), even though the same data were used and the x axis was divided into 10 bins with an equal range in both cases. Why?

The answer has to be that stat_summary_bin() doesn't bin the data with the same breaks that cut() gave you.
You can pass your plots to plotly and mouse over the bars to read off each bar's position and height.

library(plotly)
g1 <- ggplot(sample_data, mapping=aes(x=x, y=y)) + 
  stat_summary_bin(geom="bar", fun=mean, bins=9)

g2 <- ggplot(sample_data, mapping=aes(x=x_bin, y=y)) + 
  stat_summary(geom="bar", fun=mean)

ggplotly(g1)
ggplotly(g2)

[Images: the two plots rendered with ggplotly()]
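
If you want the two plots to line up, one option is to stop relying on each function's default binning and hand both the same explicit boundaries. The sketch below is only an illustration: the names `brks`, `g_bin` and `g_cut` are made up here, and it assumes that the `breaks` argument of stat_summary_bin() overrides `bins`, as its documentation describes. It also uses layer_data() to pull out the computed bar heights instead of reading them off a plot. I have not checked it against every ggplot2 version.

```r
library(ggplot2)

set.seed(1)
x <- rnorm(1000)
sample_data <- data.frame(x = x, y = x + rnorm(1000))

# Ten equal-width bins spanning the observed range of x (11 boundaries).
brks <- seq(min(sample_data$x), max(sample_data$x), length.out = 11)

# Discrete version: bin by hand with the shared boundaries.
# include.lowest = TRUE keeps the minimum of x inside the first bin.
sample_data$x_bin <- cut(sample_data$x, breaks = brks, include.lowest = TRUE)

# Continuous version: pass the same boundaries to stat_summary_bin().
g_bin <- ggplot(sample_data, aes(x = x, y = y)) +
  stat_summary_bin(geom = "bar", fun = mean, breaks = brks)

g_cut <- ggplot(sample_data, aes(x = x_bin, y = y)) +
  stat_summary(geom = "bar", fun = mean)

# layer_data() returns the values a layer actually draws.
heights_bin <- layer_data(g_bin)$y
heights_cut <- layer_data(g_cut)$y

# Per-bin means computed without ggplot2, for reference.
manual <- as.numeric(tapply(sample_data$y, sample_data$x_bin, mean))

print(manual)
print(heights_cut)
print(heights_bin)
```

If the three printed vectors agree, the difference in the original plots comes from the bin boundaries and not from how the means are computed.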
