Help: geom_histogram problem

When I use ggplot's geom_histogram to visualize data, one argument confuses me: center. I read the documentation but still cannot understand its function.

library(ggplot2)

ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 100, center = 0)

I can set center to 10, 100, 1000, etc., and the results all remain the same.

What are you trying to accomplish with center? What do you think it does?

All center does is specify where one of the bin centers should be.

As the documentation says, center specifies one of the bin centers. But which one? The first one?

Here is an example of shifting the centers of the bins from multiples of 0.5 to (0.5*n) + 0.25.
I think of center as fixing the center of one bin; binwidth then determines where the other bins are.

library(ggplot2)

DF <- data.frame(Value = rnorm(100))
ggplot(DF, aes(Value)) + geom_histogram(binwidth = 0.5, color = "white")

ggplot(DF, aes(Value)) + geom_histogram(binwidth = 0.5, color = "white", center = 0.25)

Created on 2020-09-07 by the reprex package (v0.3.0)

Any bin, even one which will not be plotted. One of the defining characteristics of a histogram is the bin width. When you specify a center argument, the function starts from a bin with the center you specified and determines the boundaries of all other possible bins from the bin width. Then it shows you only the bins which contain actual data.

So, if my bin width is 1, then I will get the same results if I say to center on 0, 100, 1000, 1000000. If my bin width is 0.4, I would get the same histograms if I centered on 0, 2.4, 48.4, etc.
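
A quick way to check this is to compare the bins ggplot2 actually computes. The sketch below is only an illustration: the data frame toy and the helper bins_for() are made-up names, and layer_data(p) is used as shorthand for the ggplot_build(p)$data[[1]] call used elsewhere in this thread.

```r
library(ggplot2)

set.seed(1)
# Hypothetical toy data, only for illustration
toy <- data.frame(x = runif(200, min = 0, max = 10))

# Hypothetical helper: bin centers and counts computed for a given `center`
bins_for <- function(center) {
  p <- ggplot(toy, aes(x)) +
    geom_histogram(binwidth = 1, center = center)
  layer_data(p)[, c("x", "count")]
}

# Centers that differ by a whole number of bin widths describe the same grid
same_bins <- isTRUE(all.equal(bins_for(0), bins_for(100)))

# Shifting the center by half a bin width moves every bin
shifted_bins <- !isTRUE(all.equal(bins_for(0), bins_for(0.5)))

same_bins
shifted_bins
```

If the reasoning above holds, both values should come out TRUE: only the remainder of center divided by the bin width matters.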

You would usually use it if you wanted to ensure all the values near a particular value were grouped together.

library(ggplot2)
set.seed(123)
df <- data.frame(x = rexp(1000, rate = 0.25))
mean(df$x)
#> [1] 4.119917
ggp1 <- ggplot(df) +
  geom_histogram(aes(x = x),
                 color = "black",
                 bins = 20)
ggp1

# centers of bins
ggplot_build(ggp1)$data[[1]][["x"]]
#>  [1]  0.000000  1.517933  3.035866  4.553799  6.071732  7.589665  9.107598
#>  [8] 10.625531 12.143464 13.661397 15.179330 16.697263 18.215196 19.733129
#> [15] 21.251062 22.768995 24.286928 25.804860 27.322793 28.840726


ggp2 <- ggplot(df) +
  geom_histogram(aes(x = x),
                 color = "black",
                 bins = 20,
                 center = 4)
ggp2

# centers of bins
ggplot_build(ggp2)$data[[1]][["x"]]
#>  [1] -0.5537989  0.9641341  2.4820670  4.0000000  5.5179330  7.0358659
#>  [7]  8.5537989 10.0717319 11.5896648 13.1075978 14.6255308 16.1434638
#> [13] 17.6613967 19.1793297 20.6972627 22.2151956 23.7331286 25.2510616
#> [19] 26.7689945 28.2869275

Alternatively, you could use boundary to specify where any one of the breakpoints of the histogram should be. A common choice is a meaningful bin width such as binwidth = 1 together with boundary = 0:

ggp3 <- ggplot(df) +
  geom_histogram(aes(x = x),
                 color = "black",
                 binwidth = 1,
                 boundary = 0)
ggp3

# centers of bins
ggplot_build(ggp3)$data[[1]][["x"]]
#>  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5 13.5 14.5
#> [16] 15.5 16.5 17.5 18.5 19.5 20.5 21.5 22.5 23.5 24.5 25.5 26.5 27.5 28.5

Created on 2020-09-07 by the reprex package (v0.3.0)
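
For a fixed bin width, center and boundary are two ways of pinning down the same grid: a bin centered at c has its edges at c - binwidth/2 and c + binwidth/2. As a hedged check, binwidth = 1 with boundary = 0 should give the same bins as binwidth = 1 with center = 0.5. The names p_boundary, p_center and same_grid are made up for this sketch, and layer_data(p) stands in for ggplot_build(p)$data[[1]].

```r
library(ggplot2)

set.seed(123)
df <- data.frame(x = rexp(1000, rate = 0.25))

# Same bin width, grid fixed by a breakpoint at 0 ...
p_boundary <- ggplot(df, aes(x)) +
  geom_histogram(binwidth = 1, boundary = 0)

# ... or by a bin center at 0.5 (half a bin width above that breakpoint)
p_center <- ggplot(df, aes(x)) +
  geom_histogram(binwidth = 1, center = 0.5)

# Compare bin centers, bin edges and counts from both plots
cols <- c("x", "xmin", "xmax", "count")
same_grid <- isTRUE(all.equal(layer_data(p_boundary)[, cols],
                              layer_data(p_center)[, cols]))
same_grid
```

The ggplot2 documentation says that only one of center and boundary may be specified, so pick whichever is easier to reason about for your data.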
