A tidy way to order a stacked bar chart by fill subset

I want to sort a stacked ggplot bar chart by the relative frequency of a subset in the fill.

    library(ggplot2)
    library(tibble)
    library(scales)

    factor1 <- as.factor(c("ABC", "CDA", "XYZ", "YRO"))
    factor2 <- as.factor(c("A", "B"))

    set.seed(43)
    data <- tibble(x = sample(factor1, 1000, replace = TRUE),
                   z = sample(factor2, 1000, replace = TRUE))

One approach, taken from a Stack Overflow answer, is to use tapply:

    lvls <- names(sort(tapply(data$z == "B", data$x, mean)))

    ggplot(data = data, aes(factor(x, levels = lvls), fill = z)) +
      geom_bar(position = "fill") +
      scale_y_continuous(labels = percent)
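
To see what the tapply line computes, here is a minimal sketch on a tiny hand-made data frame (`toy` is a hypothetical example, not the data above). `toy$z == "B"` is a logical vector, so its mean within each x group is the share of B in that group, and sort() puts the groups in increasing order of that share.

```r
# Hypothetical toy data: group "p" is 1/4 B, group "q" is 3/4 B, group "r" is 2/4 B.
toy <- data.frame(
  x = rep(c("p", "q", "r"), each = 4),
  z = c("A", "A", "A", "B",
        "B", "B", "B", "A",
        "A", "B", "A", "B")
)

# Mean of a logical vector = proportion of TRUE, computed per level of x.
share_b <- tapply(toy$z == "B", toy$x, mean)

# Sorting the named result and taking its names gives the level order for the plot.
toy_lvls <- names(sort(share_b))

print(share_b)
print(toy_lvls)
```

The resulting `toy_lvls` plays the same role as lvls above: it is what gets passed to factor(x, levels = ...).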

This is a correct answer, but I am wondering whether there is a tidier way to do this.

I am primarily interested in a way to do this that does not involve dplyr, but any suggestions are welcome.

You can use forcats::fct_reorder in a mutate call like this:

    library(dplyr)  # for %>% and mutate()

    set.seed(1234)
    data <- tibble(x = sample(factor1, 1000, replace = TRUE),
                   z = sample(factor2, 1000, replace = TRUE))

    # as.numeric(z) is 1 for "A" and 2 for "B", so the per-x mean rises with the share of "B"
    data %>%
      mutate(x = forcats::fct_reorder(x, as.numeric(z), .fun = mean)) %>%
      ggplot(aes(x, fill = z)) +
        geom_bar(position = "fill") +
        scale_y_continuous(labels = percent)

which gives you this:

[image: stacked bar chart with the x categories ordered by the proportion of B]
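
The reason as.numeric(z) works: z is a factor with levels A and B, so as.numeric() returns the integer codes 1 and 2, and the mean of those codes within a group is 1 plus the share of B. Ordering by that mean is therefore the same as ordering by the share of B. A small base R sketch with made-up values (`z_toy` and `x_toy` are hypothetical names, not part of the data above):

```r
# Hypothetical toy vectors; levels of z_toy are "A" (code 1) and "B" (code 2).
z_toy <- factor(c("A", "B", "B", "B", "A", "A", "A", "B"))
x_toy <- factor(rep(c("g1", "g2"), each = 4))

codes <- as.numeric(z_toy)  # integer codes of the factor levels, as doubles

mean_code <- tapply(codes, x_toy, mean)         # per-group mean of the codes
share_b   <- tapply(z_toy == "B", x_toy, mean)  # per-group share of "B"

# The two summaries differ by exactly 1, so they rank the groups identically.
print(mean_code - share_b)

# mean() on the factor itself is not defined: it warns and returns NA,
# which is why the conversion is needed.
na_result <- suppressWarnings(mean(z_toy))
print(na_result)
```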

If you want to avoid the mutate call, you can put the fct_reorder call inside your ggplot call like this:

    ggplot(data, aes(forcats::fct_reorder(x, as.numeric(z), .fun = mean), fill = z)) +
      geom_bar(position = "fill") +
      scale_y_continuous(labels = percent)

This gives you the same graph, except that your x-axis label is now pretty ugly (it becomes the full fct_reorder expression). IMO it is better to do the reordering in the mutate call because it makes your code more readable and more explicit about what you are trying to accomplish.
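
If you do go with the inline version, one option for the label is to set it yourself with labs(). A minimal sketch, assuming the same x/z columns as above and a forcats version where the summary-function argument is named `.fun`; the data is regenerated as a plain data frame so the snippet runs on its own:

```r
library(ggplot2)
library(forcats)

# Regenerate example data so this block is self-contained.
set.seed(1234)
data <- data.frame(
  x = factor(sample(c("ABC", "CDA", "XYZ", "YRO"), 1000, replace = TRUE)),
  z = factor(sample(c("A", "B"), 1000, replace = TRUE))
)

# Same inline reordering as above, with the x-axis title set explicitly.
p <- ggplot(data, aes(fct_reorder(x, as.numeric(z), .fun = mean), fill = z)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "x")  # replaces the auto-generated expression label

p
```

This keeps everything in one ggplot call and needs no dplyr, at the cost of a longer aes() line.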


That's great. I suspected that fct_reorder would be involved. I tried it but didn't wrap z in as.numeric. Thanks a lot!
