Clearest way to order a factor by the mean of another variable, weighted by a third.

Was recently asked a question about how to organise the facets of a plot using a value. Specifically, (using mtcars) taking the mean of hp for each cyl and gear grouping and plotting the cyl with the highest mean hp for 3-gear vehicles first. Successfully managed to create this graph:

But my code seemed a bit hacky, creating a new column:

mtcars %>% 
  group_by(gear) %>% 
  mutate(hp_gear3 = ifelse(gear == 3, hp, NA),
         cyl = fct_reorder(factor(cyl),
                           hp_gear3, 
                           mean, 
                           na.rm = TRUE,
                           .desc = TRUE)) %>%
  ggplot(aes(factor(gear), hp)) +
  stat_summary(fun = mean) +
  facet_wrap(~cyl)

I thought about using weighted.mean instead inside of fct_reorder2(), and this code gives the same output in fewer lines:

mtcars %>% 
  mutate(
    cyl = fct_reorder2(factor(cyl),
                       hp, 
                       gear==3, 
                       weighted.mean,
                       .desc = TRUE)) %>% 
  ggplot(aes(factor(gear), hp)) +
  stat_summary(fun = mean) +
  facet_wrap(~cyl)

My intention here is to order each level of cyl by weighting the mean gears conditional on 3-geared vehicles alone. It gives the right graph but I can't quite figure out whether it's doing what I'm asking it to, or just luckily landing in the right place. Am I understanding the working of fct_reorder2() correctly?

I'm not sure I understand what the mean is supposed to be weighted by, since we're looking only at 3-geared vehicles, so let me know if this is what you had in mind:

# With fct_reorder2
mtcars %>% 
  mutate(cyl = fct_reorder2(factor(cyl), hp, gear, 
                            function(x,y) mean(x[y==3]), .desc=TRUE)) %>% 
  ggplot(aes(factor(gear), hp)) +
    stat_summary(fun = mean) +
    facet_wrap(~cyl)

# Create a new column to order by
mtcars %>% 
  group_by(cyl) %>% 
  mutate(hp.mean = mean(hp[gear==3])) %>% 
  ungroup %>% 
  arrange(desc(hp.mean)) %>% 
  mutate(cyl = factor(cyl, levels=unique(cyl))) %>% 
  ggplot(aes(factor(gear), hp)) +
    stat_summary(fun = mean) +
    facet_wrap(~cyl)

Smashing! Gets them all in the right order each time! And putting it inside facet_wrap() is the chef's kiss of parsimony/tidyness here. Thanks @joels!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.

Ah, now I see. Nice use of weighted.mean for filtering at the same time! As far as I know, fct_reorder and fct_reorder2 are working the way you think they are. You can shorten the code further by reordering within facet_wrap:

mtcars %>% 
  ggplot(aes(factor(gear), hp)) +
    stat_summary(fun = mean) +
    facet_wrap(~ fct_reorder2(factor(cyl), hp, gear==3, weighted.mean, .desc=TRUE))

To check what fct_reorder2 is doing, you could create a simpler data frame and compare the output of an explicit reordering with the output of fct_reorder2. For example:

library(tidyverse)

d = data.frame(cyl = rep(1:3, each=6),
               gear = rep(1:3, 6),
               hp = c(10, 5, 1, 9, 4, 0, 
                      5, 1, 10, 4, 0, 9, 
                      1, 10, 5, 0, 9, 4)) %>% 
  arrange(gear)

map(1:3, ~ {
  list(
    cyl.order = d %>% 
      mutate(cyl=fct_reorder2(factor(cyl), hp, gear==.x, weighted.mean, .desc=TRUE)) %>% 
      pull(cyl) %>% levels,
    check.cyl.order = d %>% 
      group_by(cyl) %>% 
      summarise(hp.mean = weighted.mean(hp, gear==.x)) %>% 
      arrange(desc(hp.mean))
  )
})
#> [[1]]
#> [[1]]$cyl.order
#> [1] "1" "2" "3"
#> 
#> [[1]]$check.cyl.order
#> # A tibble: 3 x 2
#>     cyl hp.mean
#>   <int>   <dbl>
#> 1     1     9.5
#> 2     2     4.5
#> 3     3     0.5
#> 
#> 
#> [[2]]
#> [[2]]$cyl.order
#> [1] "3" "1" "2"
#> 
#> [[2]]$check.cyl.order
#> # A tibble: 3 x 2
#>     cyl hp.mean
#>   <int>   <dbl>
#> 1     3     9.5
#> 2     1     4.5
#> 3     2     0.5
#> 
#> 
#> [[3]]
#> [[3]]$cyl.order
#> [1] "2" "3" "1"
#> 
#> [[3]]$check.cyl.order
#> # A tibble: 3 x 2
#>     cyl hp.mean
#>   <int>   <dbl>
#> 1     2     9.5
#> 2     3     4.5
#> 3     1     0.5

Created on 2020-11-30 by the reprex package (v0.3.0)

1 Like

Thanks for this! Your first solution is maybe a more readable way to do the few lines option and the second is similar to my first solution. Seems there are a good few ways of doing it! Was just curious as to a) how fct_reorder(fun) works and b) what the most parsimonious code possible would be.

My idea behind weighted.mean was to order the cyl based on the mean across all gears, but weighted so that gear==3 is 1 and the other gear values are weighted at zero. This route gives the correct values in a summarise argument:

# Weighted mean of gear == 3
mtcars %>% 
  group_by(cyl) %>% 
  summarise(mn = weighted.mean(x = hp, w = gear == 3))

# Gives the same as...

mtcars %>% 
  filter(gear == 3) %>% 
  group_by(cyl) %>%
  summarise(mn = mean(hp))

# A tibble: 3 x 2
    cyl    mn
  <dbl> <dbl>
1     4   97 
2     6  108.
3     8  194.

If fct_reorder is working the way I think it is it's doing something similar to the above - grouping by the f argument (factor(cyl)), calculating a weighted mean of the hp vector, weighted by whether gear is 3 (effectively filtering), then ordering the three outputs by magnitude to create facet levels.

Also trying to think of a test to see if this is what's actually happening. Would just be a very neat solution if so! Been trying to squeeze more fct_ functions into my daily life after watching some @drob tidy tuesdays!