Hi @dromano
After a little research, it seems this is how the ggplot2 devs intended it to work. There's more discussion of it here. In particular:
> We then need to make sure there's some way to actually drop these NA values, because the na.rm to the layers can not work, because by the time the layer sees the data the missingness has been removed (as it's been converted to an integer position).
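Their philosophy seems to be that missing data should be included in plots by default. That isn't strictly possible for continuous data (there's no natural place for NA on a continuous axis), but it is possible for discrete data, where NA can be drawn as one more category. The layer's na.rm = TRUE/FALSE argument only controls whether missing values are removed silently or with a warning printed to the console. It can't remove a discrete NA on its own, because by the time the layer sees the data the NA has already become an ordinary axis position.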
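A minimal sketch of that default behaviour, assuming a reasonably recent ggplot2 and using a base data.frame in place of the tibble to keep dependencies down. layer_data() returns the data after stat_count() has run, so its row count is the number of bars drawn:

```r
library(ggplot2)

df <- data.frame(a = c("one", "two", "two", NA))

# Default: the discrete x scale gives NA its own position, so it gets a bar
p_default <- ggplot(df, aes(a)) + geom_bar()

# na.rm = TRUE on the layer alone does not help: NA was already mapped to
# an x position before the layer saw the data, so nothing is "missing"
p_narm <- ggplot(df, aes(a)) + geom_bar(na.rm = TRUE)

nrow(layer_data(p_default))  # 3 bars: "one", "two", NA
nrow(layer_data(p_narm))     # still 3 bars
```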
The ggplot2 commit that closed the above issue is shown here. Its NEWS entry describes a new na.translate argument added to all discrete scales (not only the axis scales):
> - All discrete scales gain a na.translate argument that allows you to control whether NAs are translated to something that can be visualised or left as missing. Note that if you leave as is (i.e. na.translate = FALSE) they will be passed on to the layer, which will create warnings about dropping missing values. To suppress those, you'll also need to add na.rm = TRUE to the layer call.
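A small sketch of the two cases that NEWS entry describes, assuming a reasonably recent ggplot2. The warns_when_built() helper is hypothetical (not part of ggplot2): it builds a plot with ggplot_build() and reports whether any warning was raised, muffling it along the way:

```r
library(ggplot2)

df <- data.frame(a = c("one", "two", "two", NA))

# Hypothetical helper: TRUE if building the plot raised any warning
warns_when_built <- function(p) {
  warned <- FALSE
  withCallingHandlers(
    ggplot_build(p),
    warning = function(w) {
      warned <<- TRUE
      invokeRestart("muffleWarning")
    }
  )
  warned
}

# na.translate = FALSE alone: NA reaches the layer as missing and is dropped with a warning
p_warn <- ggplot(df, aes(a)) +
  geom_bar() +
  scale_x_discrete(na.translate = FALSE)

# Adding na.rm = TRUE to the layer drops the same row silently
p_quiet <- ggplot(df, aes(a)) +
  geom_bar(na.rm = TRUE) +
  scale_x_discrete(na.translate = FALSE)

warns_when_built(p_warn)   # TRUE
warns_when_built(p_quiet)  # FALSE
nrow(layer_data(p_quiet))  # 2 bars: "one", "two"
```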
So you can drop the NA category from the axis with the following (ggplot2 will still warn that it removed a row with a missing value):
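```r
library(dplyr)    # provides tibble() and %>%
library(ggplot2)

tibble(a = c('one', 'two', 'two', NA)) %>%
  ggplot(aes(a)) +
  geom_bar() +
  scale_x_discrete(na.translate = FALSE)  # no NA bar, but a warning about the dropped row
```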
Or, to remove the NA silently, also pass na.rm = TRUE to the layer:
```r
tibble(a = c('one', 'two', 'two', NA)) %>%
  ggplot(aes(a)) +
  geom_bar(na.rm = TRUE) +               # drop the missing row without a warning
  scale_x_discrete(na.translate = FALSE)
```