Trouble scaling y axis to percentages from counts

Hi there,

I am currently trying to produce a histogram using ggplot, but need to convert the y axis from counts to percentages.

This is a reprex of my current work with ggplot:

library(ggplot2)
library(scales)
ggplot(null_distribution, aes(x = stat)) + geom_histogram(binwidth = 0.5) +
  scale_y_continuous(labels = percent_format())

Created on 2019-10-23 by the reprex package (v0.3.0)

I have tried using percent_format, as well as scale_y_continuous. The issue that I'm having is that I cannot remove the extra 0s from my percentages. In other words, 40% is appearing as 40000%. Any help would be appreciated!

Hi George,

You didn't provide the data, so I simulated some to get a working example. There are two important points:

  1. ggplot provides special notation for accessing the variables it computes internally (e.g. ..count.. and ..density..), which makes plotting percentages in histograms straightforward.

  2. percent_format() returns a function that takes the y values, multiplies them by 100 and adds a percent sign. Your y-axis shows counts, so they are converted directly: a count of 40 becomes 4000%. The function is working correctly; you are just asking it the wrong question.
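Point 2 can be checked outside of a plot. A minimal sketch, assuming the scales package is installed (accuracy = 1 is added here only to round to whole percents): the labeller returned by percent_format() multiplies whatever it receives by 100, so a proportion and a raw count get very different labels.

```r
library(scales)

# percent_format() returns a labelling function; accuracy = 1 rounds to whole percents
to_percent <- percent_format(accuracy = 1)

# A proportion is labelled as expected...
to_percent(0.4)
#> [1] "40%"

# ...but a raw count is multiplied by 100 in exactly the same way
to_percent(4)
#> [1] "400%"
```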

Try the code below; note the y = ..density.. mapping inside geom_histogram().

library(ggplot2)
library(tibble)

null_distribution <- tibble(stat = rnorm(100))  # simulated data

ggplot(null_distribution, aes(x = stat)) +
  geom_histogram(aes(y = ..density..), binwidth = 0.5) +
  scale_y_continuous(labels = scales::percent_format())
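To see what ..density.. actually holds, the values ggplot2 computed for a layer can be pulled out with layer_data(). A minimal sketch, assuming ggplot2 and tibble are installed and using simulated data (set.seed() is added only to make the run reproducible): stat_bin() defines density as count / (total count × binwidth), so it equals each bin's share of the observations only when the binwidth is 1. With binwidth = 0.5, every bar's density is twice its share.

```r
library(ggplot2)
library(tibble)

set.seed(1)  # only so the simulated data is reproducible
null_distribution <- tibble(stat = rnorm(100))

p <- ggplot(null_distribution, aes(x = stat)) +
  geom_histogram(aes(y = ..density..), binwidth = 0.5)

# layer_data() returns the values ggplot2 computed for the first layer
bins <- layer_data(p)

# Share of all observations falling in each bin
proportion <- bins$count / sum(bins$count)

# density = proportion / binwidth, i.e. 2 * proportion here
all.equal(bins$density, proportion / 0.5)
#> [1] TRUE
```

Recent ggplot2 versions may warn that the ..density.. notation is deprecated; the computed values are the same.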

You should be able to replace ..density.. with the more recent form:
y = stat(density)

Good suggestion, @martin.R. I agree that the newer way of referring to computed variables is clearer.

library(ggplot2)
library(tibble)

null_distribution <- tibble(stat = rnorm(100))

ggplot(null_distribution, aes(x = stat)) +
  geom_histogram(aes(y = stat(density)), binwidth = 0.5) +
  scale_y_continuous(labels = scales::percent_format())
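For reference, ggplot2 3.3.0 introduced after_stat(), which supersedes both stat() and the ..var.. notation for computed variables. A minimal sketch of the same plot written with it, assuming ggplot2 3.3.0 or later plus the tibble and scales packages are installed:

```r
library(ggplot2)
library(tibble)

null_distribution <- tibble(stat = rnorm(100))

# after_stat(density) maps y to the density column computed by stat_bin()
p <- ggplot(null_distribution, aes(x = stat)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.5) +
  scale_y_continuous(labels = scales::percent_format())

p  # draw the plot
```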

Thank you for your help! This got me closer by dropping some of the 0s, but the axis now shows 300%, 200%, 100%. Any ideas on how to fix this? This is a screenshot of a portion of the data:

Screenshots are not very useful (and not a good thing to do here). If you need more specific help, please provide a proper reproducible example, including sample data in a copy/paste-friendly format. Take a look at this guide to learn how to make one.
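One common way to put sample data into a copy/paste-friendly form is base R's dput(), which prints R code that recreates an object. A minimal sketch using a small hypothetical data frame named my_data (the name and values are illustrative only):

```r
# Hypothetical example data; substitute a few rows of your own data
my_data <- data.frame(
  replicate = 1:6,
  stat = c(0, 1.5, -0.5, 0.5, -1, -0.5)
)

# Prints R code that rebuilds the object; paste that output into the post
dput(head(my_data))

# Whoever reads the post can recreate the data by running the pasted code
shared_text <- capture.output(dput(head(my_data)))
rebuilt <- eval(parse(text = shared_text))
all.equal(rebuilt, head(my_data))
```

The reprex package mentioned at the top of the thread can then bundle this data together with the plotting code.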

The advice from @andresrcs is good. You will receive much better help if you can provide the minimum amount of code or data needed to be able to completely reproduce the issue you are facing. Screenshots aren't helpful because they can't easily be turned into usable code or data.

That said, I again tried to simulate data that looks like yours, and I cannot reproduce the issue of percentages >100%.

library(ggplot2)
library(tibble)
data <- tibble(replicate = 1:100,
               stat = sample(seq(-2, 2, by = 0.5), 100, replace = TRUE))

head(data)

# A tibble: 6 x 2
  replicate  stat
      <int> <dbl>
1         1   0  
2         2   1.5
3         3  -0.5
4         4   0.5
5         5  -1  
6         6  -0.5

ggplot(data, aes(x = stat)) +
  geom_histogram(aes(y = stat(density)), binwidth = 0.5) +
  scale_y_continuous(labels = scales::percent_format())
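One possible explanation for labels above 100%, hedged because the original data isn't shown: density is count / (total count × binwidth), so with a binwidth below 1 it can exceed 1, and percent_format() then labels it above 100%. If the goal is each bin's share of all observations, one option is to compute that share directly with after_stat(count / sum(count)). A sketch, assuming ggplot2 3.3.0 or later; the simulated data is rebuilt so the block runs on its own.

```r
library(ggplot2)
library(tibble)

data <- tibble(replicate = 1:100,
               stat = sample(seq(-2, 2, by = 0.5), 100, replace = TRUE))

# count / sum(count) is each bin's share of the observations, whatever the binwidth
p <- ggplot(data, aes(x = stat)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), binwidth = 0.5) +
  scale_y_continuous(labels = scales::percent_format())

p  # draw the plot

# The bar heights are proportions, so they add up to 1 (i.e. 100%)
sum(layer_data(p)$y)
```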
