Best practice to sort categorical variable by a continuous variable in ggplot2

I've seen a number of solutions to this on Stack Overflow, but I'm not sure which is the best way to sort a categorical variable (alphabetical by default) by a continuous variable.

Following is a simple example of this idea.

library(tidyverse)

# Name is a character column, so the bars are drawn in alphabetical order
data <- data.frame(Name = c('b','a','c'), Value = c(1,2,3))
g <- data %>% 
  ggplot(aes(Name, Value)) + geom_col()
g

I came across the following solution and this has been my favorite thus far, as it makes the most sense to me.

# reorder() sorts the levels of Name by Value (ascending)
g <- data %>% 
  ggplot(aes(reorder(Name, Value), Value)) + 
  geom_col() + 
  labs(x = "Name", y = "Value")
g

How does the community generally go about doing this?


The forcats package has the fct_reorder() function. Not only does it work as a drop-in replacement for reorder() in your case, it also handles the case where each factor level spans multiple rows, by summarizing the values for each level with a function (the median by default). For example, that lets you sort a boxplot by minimum, median, or maximum. There's also the fct_reorder2() function for 2D displays, where the factor is mapped to a non-position aesthetic such as the colors in a scatterplot.
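As a rough sketch of the multi-row case described above, the following orders boxplots first by group median and then by group maximum. The data frame `df` and its values are made up for illustration; only `fct_reorder()` and its `.fun` argument come from forcats.

```r
library(ggplot2)
library(forcats)

# Hypothetical data: three groups with three rows each
df <- data.frame(
  Name  = rep(c("b", "a", "c"), each = 3),
  Value = c(7, 8, 9,  1, 2, 10,  4, 5, 6)
)

# Default summary is the median: medians are a = 2, c = 5, b = 8
by_median <- fct_reorder(df$Name, df$Value)
levels(by_median)  # "a" "c" "b"

# Use the group maximum instead: maxima are c = 6, b = 9, a = 10
by_max <- fct_reorder(df$Name, df$Value, .fun = max)
levels(by_max)     # "c" "b" "a"

# The same call can sit inside aes(), just like reorder()
p <- ggplot(df, aes(fct_reorder(Name, Value, .fun = max), Value)) +
  geom_boxplot() +
  labs(x = "Name", y = "Value")
p
```

Swapping `.fun = max` for `min` or `median` changes which summary drives the ordering.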


I'll have to look into using that function from forcats as I haven't previously seen it.

I'd like to plug the two posts linked from the forcats page. They were very helpful for me in understanding how strings and factors got to be so tied up and confusing in R, and they helped me understand the utility of using factors correctly in my visualization work:

stringsAsFactors: An unauthorized biography
stringsAsFactors = <sigh>


Personally, I like the arrange() %>% mutate() method, which I did not see as an answer on the SO post you referenced. It would look like this:

data %>% 
  # sort the rows by Value (ascending)
  arrange(Value) %>%
  # fix the factor levels in the order the names now appear
  mutate(Name = factor(Name, levels = unique(Name))) %>%
  ggplot(aes(Name, Value)) + 
  geom_col() + 
  labs(x = "Name", y = "Value")

If you want your bars ordered with the highest on the left and the lowest on the right, change the arrange() call to arrange(desc(Value)).
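A minimal sketch of the descending variant, reusing the example data from the original question. The name `sorted_desc` is only for illustration.

```r
library(dplyr)
library(ggplot2)

data <- data.frame(Name = c("b", "a", "c"), Value = c(1, 2, 3))

sorted_desc <- data %>%
  arrange(desc(Value)) %>%                            # largest Value first
  mutate(Name = factor(Name, levels = unique(Name)))  # levels follow row order

levels(sorted_desc$Name)  # "c" "a" "b"

p <- ggplot(sorted_desc, aes(Name, Value)) +
  geom_col() +
  labs(x = "Name", y = "Value")
p
```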


Thank you @tbradley, I really like this solution, as it decouples the reordering step from the ggplot() call instead of nesting reorder() inside it.


If you don’t like nesting reorder in the ggplot call, you can just use it in mutate() without arrange(). I think it’s cleaner that way since it more closely captures the intent on a single line.
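For reference, a short sketch of what that might look like with the example data from the original question. The name `reordered` is only for illustration.

```r
library(dplyr)
library(ggplot2)

data <- data.frame(Name = c("b", "a", "c"), Value = c(1, 2, 3))

# reorder() sets the levels of Name by ascending Value; no arrange() needed
reordered <- data %>%
  mutate(Name = reorder(Name, Value))

levels(reordered$Name)  # "b" "a" "c"

p <- ggplot(reordered, aes(Name, Value)) +
  geom_col() +
  labs(x = "Name", y = "Value")
p
```

For descending bars, reorder(Name, -Value) should give the reverse order for numeric values.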