 # Clearest way to order a factor by the mean of another variable, weighted by a third.

I was recently asked how to order the facets of a plot by a summary value. Specifically (using `mtcars`): take the mean of `hp` for each `cyl` and `gear` grouping, then plot the `cyl` facets in descending order of mean `hp` among 3-gear vehicles. I managed to create the graph I wanted.

But my code seemed a bit hacky, creating a new column:

``````
library(tidyverse)

mtcars %>%
  group_by(gear) %>%
  # Keep hp only for 3-gear cars, then order cyl by the mean of what is left
  mutate(hp_gear3 = ifelse(gear == 3, hp, NA),
         cyl = fct_reorder(factor(cyl),
                           hp_gear3,
                           mean,
                           na.rm = TRUE,
                           .desc = TRUE)) %>%
  ggplot(aes(factor(gear), hp)) +
  stat_summary(fun = mean) +
  facet_wrap(~cyl)
``````

I thought about using `weighted.mean` instead inside of `fct_reorder2()`, and this code gives the same output in fewer lines:

``````
mtcars %>%
  mutate(
    cyl = fct_reorder2(factor(cyl),
                       hp,
                       gear == 3,
                       weighted.mean,
                       .desc = TRUE)) %>%
  ggplot(aes(factor(gear), hp)) +
  stat_summary(fun = mean) +
  facet_wrap(~cyl)
``````

My intention here is to order the levels of `cyl` by mean `hp`, weighted so that only 3-gear vehicles count. It gives the right graph, but I can't quite figure out whether it's doing what I'm asking it to or just luckily landing in the right place. Am I understanding the working of `fct_reorder2()` correctly?

I'm not sure I understand what the mean is supposed to be weighted by, since we're looking only at 3-geared vehicles, so let me know if this is what you had in mind:

``````
# With fct_reorder2
mtcars %>%
  mutate(cyl = fct_reorder2(factor(cyl), hp, gear,
                            function(x, y) mean(x[y == 3]), .desc = TRUE)) %>%
  ggplot(aes(factor(gear), hp)) +
  stat_summary(fun = mean) +
  facet_wrap(~cyl)

# Create a new column to order by
mtcars %>%
  group_by(cyl) %>%
  mutate(hp.mean = mean(hp[gear == 3])) %>%
  ungroup %>%
  arrange(desc(hp.mean)) %>%
  mutate(cyl = factor(cyl, levels = unique(cyl))) %>%
  ggplot(aes(factor(gear), hp)) +
  stat_summary(fun = mean) +
  facet_wrap(~cyl)
``````

Smashing! Gets them all in the right order each time! And putting it inside `facet_wrap()` is the chef's kiss of parsimony/tidiness here. Thanks @joels!

Ah, now I see. Nice use of `weighted.mean` for filtering at the same time! As far as I know, `fct_reorder` and `fct_reorder2` are working the way you think they are. You can shorten the code further by reordering within `facet_wrap`:

``````
mtcars %>%
  ggplot(aes(factor(gear), hp)) +
  stat_summary(fun = mean) +
  facet_wrap(~ fct_reorder2(factor(cyl), hp, gear == 3, weighted.mean, .desc = TRUE))
``````

To check what `fct_reorder2` is doing, you could create a simpler data frame and compare the output of an explicit reordering with the output of `fct_reorder2`. For example:

``````
library(tidyverse)

d = data.frame(cyl = rep(1:3, each = 6),
               gear = rep(1:3, 6),
               hp = c(10, 5, 1, 9, 4, 0,
                      5, 1, 10, 4, 0, 9,
                      1, 10, 5, 0, 9, 4)) %>%
  arrange(gear)

map(1:3, ~ {
  list(
    cyl.order = d %>%
      mutate(cyl = fct_reorder2(factor(cyl), hp, gear == .x, weighted.mean, .desc = TRUE)) %>%
      pull(cyl) %>% levels,
    check.cyl.order = d %>%
      group_by(cyl) %>%
      summarise(hp.mean = weighted.mean(hp, gear == .x)) %>%
      arrange(desc(hp.mean))
  )
})
#> [[1]]
#> [[1]]$cyl.order
#> [1] "1" "2" "3"
#>
#> [[1]]$check.cyl.order
#> # A tibble: 3 x 2
#>     cyl hp.mean
#>   <int>   <dbl>
#> 1     1     9.5
#> 2     2     4.5
#> 3     3     0.5
#>
#>
#> [[2]]
#> [[2]]$cyl.order
#> [1] "3" "1" "2"
#>
#> [[2]]$check.cyl.order
#> # A tibble: 3 x 2
#>     cyl hp.mean
#>   <int>   <dbl>
#> 1     3     9.5
#> 2     1     4.5
#> 3     2     0.5
#>
#>
#> [[3]]
#> [[3]]$cyl.order
#> [1] "2" "3" "1"
#>
#> [[3]]$check.cyl.order
#> # A tibble: 3 x 2
#>     cyl hp.mean
#>   <int>   <dbl>
#> 1     2     9.5
#> 2     3     4.5
#> 3     1     0.5
``````

Created on 2020-11-30 by the reprex package (v0.3.0)

Thanks for this! Your first solution is maybe a more readable way to do the few-lines option, and the second is similar to my first solution. Seems there are a good few ways of doing it! I was just curious about a) how the function argument of `fct_reorder()` / `fct_reorder2()` works and b) what the most parsimonious code possible would be.

My idea behind `weighted.mean` was to order `cyl` by the mean of `hp` across all gears, but weighted so that rows with `gear == 3` get weight 1 and rows with any other gear get weight 0. This route gives the correct values inside a `summarise()` call:

``````
# Weighted mean of gear == 3
mtcars %>%
  group_by(cyl) %>%
  summarise(mn = weighted.mean(x = hp, w = gear == 3))

# Gives the same as...

mtcars %>%
  filter(gear == 3) %>%
  group_by(cyl) %>%
  summarise(mn = mean(hp))

#> # A tibble: 3 x 2
#>     cyl    mn
#>   <dbl> <dbl>
#> 1     4   97
#> 2     6  108.
#> 3     8  194.
``````
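
To spell out why the logical weights behave like a filter, here is a small base-R sketch for the 6-cylinder group alone. The variable names (`six`, `w`, `by_weights`, `by_hand`, `by_filter`) are just mine for illustration. `TRUE`/`FALSE` weights are treated as 1/0, so the weighted mean reduces to the plain mean of the rows where `gear == 3`:

```r
# Rows for 6-cylinder cars only (illustrative subset)
six <- mtcars[mtcars$cyl == 6, ]

# Logical weights: TRUE counts as 1, FALSE as 0
w <- six$gear == 3

by_weights <- weighted.mean(six$hp, w)   # what the summarise() call computes
by_hand    <- sum(six$hp * w) / sum(w)   # the weighted-mean formula written out
by_filter  <- mean(six$hp[w])            # the filter-then-mean route

c(by_weights, by_hand, by_filter)
#> [1] 107.5 107.5 107.5
```

The tibble above prints this value rounded to three significant figures, as `108.`.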

If `fct_reorder2` is working the way I think it is, it's doing something similar to the above: grouping by the `f` argument (`factor(cyl)`), calculating a mean of the `hp` vector within each group, weighted by whether `gear` is 3 (effectively filtering), then ordering the three results by magnitude to set the factor levels, which in turn set the facet order.

Also trying to think of a test to see if this is what's actually happening. Would just be a very neat solution if so! Been trying to squeeze more `fct_` functions into my daily life after watching some @drob tidy tuesdays!
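
One possible test, sketched in base R so it doesn't lean on forcats itself: reproduce the group-then-order logic described above by hand and see which level order comes out. This is only my reading of the behaviour, not forcats' actual source, and the names `f`, `wm` and `expected_levels` are made up for the example.

```r
# Base-R sketch of the group-then-order logic; not forcats' actual source.
f <- factor(mtcars$cyl)

# One summary value per level of f: the mean of hp, weighted by gear == 3.
# tapply() hands each level's row indices to the function.
wm <- tapply(seq_along(f), f, function(i) {
  weighted.mean(mtcars$hp[i], w = mtcars$gear[i] == 3)
})

# Put the levels in descending order of that summary value.
expected_levels <- levels(f)[order(wm, decreasing = TRUE)]
expected_levels
#> [1] "8" "6" "4"
```

If the description is right, `levels(fct_reorder2(factor(mtcars$cyl), mtcars$hp, mtcars$gear == 3, weighted.mean))` should give the same vector, which also matches the facet order in the plot.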