na.rm argument to geom_bar() not working

Why does this not remove the NA column from the plot?

library(tidyverse)

tibble(a = c('one', 'two', 'two', NA)) %>% 
  ggplot(aes(a)) +
  geom_bar(na.rm = TRUE)

Hi @dromano

After a little research it seems like this is how the ggplot2 devs wanted it to work. See more conversation on it here. In particular:

We then need to make sure there's some way to actually drop these NA values, because the na.rm to the layers can not work, because by the time the layer sees the data the missingness has been removed (as it's been converted to an integer position).

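Their philosophy seems to be that missing data should be included in plots by default. That isn't strictly possible for continuous data, but it is possible for discrete data, where NA can be drawn as its own category. The na.rm = TRUE/FALSE argument only controls whether a warning is printed to the console when a layer drops missing values (that is, whether they are dropped silently or not). It does not decide whether a discrete NA gets its own bar.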
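One way to check this for yourself is to compare the data that geom_bar() actually draws with and without na.rm. This is only a minimal sketch: it assumes a reasonably recent ggplot2, uses a base data.frame in place of the tibble from the question, and the object names are made up for illustration.

```r
library(ggplot2)

# Same toy data as in the question, built with base R to keep dependencies minimal.
df <- data.frame(a = c("one", "two", "two", NA))

p_keep <- ggplot(df, aes(a)) + geom_bar(na.rm = FALSE)
p_rm   <- ggplot(df, aes(a)) + geom_bar(na.rm = TRUE)

# layer_data() returns the data frame that the first layer is drawn from.
bars_keep <- layer_data(p_keep)
bars_rm   <- layer_data(p_rm)

# Compare bar positions and heights between the two plots.
same_bars <- isTRUE(all.equal(bars_keep[c("x", "count")], bars_rm[c("x", "count")]))
print(same_bars)
```

If the two sets of bars come out the same, na.rm on its own is not what keeps or drops the NA bar.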

The commit for ggplot2 which closed the above issue is shown here, and describes adding a new argument na.translate to the discrete axis scales.

  • All discrete scales gain a na.translate argument that allows you to
    control whether NAs are translated to something that can be visualised
    or left as missing. Note that if you leave as is (i.e.
    na.translate = FALSE) they will be passed on to the layer, which
    will create warnings about dropping missing values. To suppress those,
    you'll also need to add na.rm = TRUE to the layer call.

So you can control the presentation of NA by:

tibble(a = c('one', 'two', 'two', NA)) %>% 
  ggplot(aes(a)) +
  geom_bar() +
  scale_x_discrete(na.translate = FALSE)

Or, to remove the NA silently (without the warning about dropped values), add the na.rm argument to the layer as well:

tibble(a = c('one', 'two', 'two', NA)) %>% 
  ggplot(aes(a)) +
  geom_bar(na.rm = TRUE) +
  scale_x_discrete(na.translate = FALSE)
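To see the difference between the two versions without looking at the console, you can collect the warnings raised while the plot data is built. The following is a rough sketch, not anything from the ggplot2 docs: bars_and_warnings() is a hypothetical helper written for this post, and it assumes the warning is raised at build time (which is what layer_data() triggers).

```r
library(ggplot2)

df <- data.frame(a = c("one", "two", "two", NA))

# Hypothetical helper (not part of ggplot2): build a plot's layer data and
# collect any warnings raised along the way instead of printing them.
bars_and_warnings <- function(plot) {
  warnings <- character()
  bars <- withCallingHandlers(
    layer_data(plot),
    warning = function(w) {
      warnings <<- c(warnings, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(n_bars = nrow(bars), warnings = warnings)
}

# Both plots use the scale that leaves NA untranslated.
base <- ggplot(df, aes(a)) + scale_x_discrete(na.translate = FALSE)

loud   <- bars_and_warnings(base + geom_bar())
silent <- bars_and_warnings(base + geom_bar(na.rm = TRUE))

print(loud)
print(silent)
```

Both plots should end up with the same number of bars. The only expected difference is that the version with na.rm = TRUE collects no warnings.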

Thanks @mattwarkentin, this is very helpful. It is, however, at odds with the current documentation for geom_bar(), which explicitly says missing values are removed:

na.rm	If FALSE, the default, missing values are removed with a warning.
        If TRUE, missing values are silently removed.

What would you suggest would be the appropriate place to request the documentation be updated?

The proper place would be the GitHub repo by filing an issue or if you feel up to the task by directly making a pull request with the proposed changes.

Yes, I agree with you. The help documentation for geom_bar doesn't seem to express the actual functionality of that argument.

As @andresrcs suggested, you can file an issue on GitHub and let the maintainers decide whether and how to handle any changes. Or, if you feel up to it, you can fork the repository, make the changes you'd like to see, and submit a pull request asking the maintainers to pull them into the main repository. The second approach requires being fairly familiar with git and GitHub, while the first is very easy: all you need is a GitHub account and a couple of minutes to describe the issue.

Thanks @mattwarkentin and @andresrcs -- I think I'll look into making the changes myself, along the lines of Matt's suggested instructions, and if not, will at least file an issue.

Thanks again,
David

If you have any questions about forking a repository or making a pull request, feel free to let me know and I will be happy to help you through it.

I find that submitting a pull request and contributing to open-source projects you use is surprisingly rewarding. It is also surprisingly simple to do, so doing it for the first time demystifies what seems like a complicated process.

Thanks again, @mattwarkentin -- I'll be sure to follow up with any questions I stumble upon in the process.
