Re-sorting categories in ggplot2 without accessing the source data.frame

jdlong · December 10, 2018, 5:08pm

There are a lot of examples around of how to change the sort order of groups in ggplot. They generally look like this:

Default sorting:

library(tidyverse)
mt <- mtcars

mt$carb2 <-
  factor(mtcars$carb, levels = rev(levels(factor(mtcars$carb))))
p <- ggplot(data = mt, aes(y = carb2, x = mpg, colour = hp)) +
  geom_point()
p

Reverse sorting:

p + scale_y_discrete(limits = rev(sort(unique(mt$carb2))))

This works great, assuming you're writing a script and know ahead of time which values go into the y axis. I'm working on a function, and the y is built with a bunch of branching logic that depends on the inputs. So is there a way to "reach inside" the p object above, extract the y, re-sort it, and feed it back into scale_y_discrete()?

It seems like I could traverse the p object somehow and then evaluate the internal quosures. But I have to admit, I've never done anything like that and don't really know where to start.

jdlong · December 10, 2018, 6:22pm

Something like

layer_data(p)
#>     colour    x y PANEL group shape size fill alpha stroke
#> 1  #204464 21.0 3     1     3    19  1.5   NA    NA    0.5
#> 2  #204464 21.0 3     1     3    19  1.5   NA    NA    0.5
#> 3  #1C3C5A 22.8 6     1     6    19  1.5   NA    NA    0.5
#> 4  #204464 21.4 6     1     6    19  1.5   NA    NA    0.5
...

gets me in the right neighborhood, but the y values are the factor codes (axis positions), not the factor levels. That's not exactly what I'm after, although it's close.
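Those integers are positions on the discrete y axis, so they can be mapped back to labels through the trained y scale. A minimal sketch, assuming a ggplot2 version that exports layer_scales() (current releases do): for a discrete scale, get_limits() returns the levels in axis order, and position k is the k-th level. The y_label column is added here only for illustration.

```r
library(ggplot2)

mt <- mtcars
mt$carb2 <- factor(mtcars$carb, levels = rev(levels(factor(mtcars$carb))))
p <- ggplot(data = mt, aes(y = carb2, x = mpg, colour = hp)) +
  geom_point()

# layer_data() stores discrete y values as integer axis positions
ld <- layer_data(p)

# layer_scales() returns the trained x and y scales for the first panel;
# for a discrete scale, get_limits() gives the levels in axis order
y_levels <- layer_scales(p)$y$get_limits()

# Position k on the axis corresponds to the k-th level
ld$y_label <- y_levels[as.integer(unclass(ld$y))]
head(ld[, c("x", "y", "y_label")])
```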

karawoo · December 10, 2018, 6:47pm

You can access the variable being mapped to the y aesthetic with p$mapping$y and then use a little rlang to access the variable for sorting:

library("ggplot2")
library("rlang")
mt <- mtcars

mt$carb2 <- factor(
  mtcars$carb,
  levels = rev(levels(factor(mtcars$carb)))
)

p <- ggplot(data = mt, aes(y = carb2, x = mpg, colour = hp)) +
  geom_point()

y <- p$mapping$y
p + scale_y_discrete(limits = rev(sort(unique(eval_tidy(y, mt)))))

Created on 2018-12-10 by the reprex package (v0.2.0).
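For anyone who, like JD, hasn't poked at the quosures inside a plot before: the y entry in p$mapping is an rlang quosure, an expression bundled with the environment it was written in. This small sketch rebuilds the same plot and inspects it; as_label() assumes a reasonably recent rlang.

```r
library(ggplot2)
library(rlang)

mt <- mtcars
mt$carb2 <- factor(mtcars$carb, levels = rev(levels(factor(mtcars$carb))))
p <- ggplot(data = mt, aes(y = carb2, x = mpg, colour = hp)) +
  geom_point()

y <- p$mapping$y

is_quosure(y)    # TRUE: aes() captures each mapping as a quosure
quo_get_expr(y)  # the bare expression, here the symbol carb2
as_label(y)      # a printable name for it: "carb2"

# eval_tidy() evaluates the expression with the data frame as a data mask,
# so carb2 is looked up as a column of mt
head(eval_tidy(y, mt))
```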

taras · December 10, 2018, 6:58pm

This is pretty cool!

jdlong · December 10, 2018, 7:00pm

When @karawoo swoops in to whip some rlang on me, you know things are about to get cool!

Kara, as always, you're awesome. Thanks for the hand holding. I was stuck between flummoxed and perplexed.

taras · December 10, 2018, 7:03pm

Mark it as "solution" you must!

jdlong · December 10, 2018, 7:06pm

But... um...

You reach back out and reference the mt object in your rlang::eval_tidy(y, mt). Because of how my ggplot object is constructed (there's a bunch of branching logic before this step), I don't know exactly which data frame is in there. Is that stored somewhere? I was expecting to find it when I called ggplot_build(p), but, alas, I didn't see the source data frame's name in there.

jdlong · December 10, 2018, 7:18pm

For those playing along at home, what I'm trying to do here is reverse the sort order in a waterfall chart. Line 276 of waterfalls/R/waterfall.R (master branch of CerebralMastication/waterfalls on GitHub).

In practice it looks like this:

devtools::install_github("CerebralMastication/waterfalls")

library(waterfalls)
library(tidyverse)

waterfall(
  tibble(category = letters[1:5],
         value = c(200, -20, 4, 20, -150)),
  calc_total = TRUE,
  fill_by_sign = FALSE,
  put_rect_text_outside_when_value_below = 50, 
  coord_flip = FALSE
)  -> p
p

layer_data(p)
#>    x   y PANEL group
#> 1  1   0     1     1
#> 2  2 200     1     2
#> 3  3 180     1     3
#> 4  4 184     1     4
#> 5  5 204     1     5
#> 6  6 200     1     6
#> 7  1 180     1     1
#> 8  2 184     1     2
#> 9  3 204     1     3
#> 10 4  54     1     4
#> 11 5 204     1     5
#> 12 6  54     1     6

So in our input tibble we only had 5 categories, but in our output we have 6 because a total was added. That's an example of the logic that keeps me from knowing ex ante what data frame will be in the plot.

The reason I want to mess with the sort order is what happens when I flip the coordinates:

waterfall(
  tibble(category = letters[1:5],
         value = c(200, -20, 4, 20, -150)),
  calc_total = TRUE,
  fill_by_sign = FALSE,
  put_rect_text_outside_when_value_below = 50, 
  coord_flip = TRUE
)  -> p
p

And I want those reversed so the chart reads from top to bottom.

I can't just call scale_x_reverse() because an x scale is already set elsewhere in the code, and adding a second scale for the same aesthetic replaces the first one.

taras · December 10, 2018, 7:22pm

Hey JD!
So, you're saying

Not to derail us from solving this within a ggplot call, but is it possible to make sorting a part of that function you're working on? I figure it won't be easy, and will require some tidy eval, but just a thought here...

taras · December 10, 2018, 7:22pm

Also this

jdlong · December 10, 2018, 7:23pm

Yeah, I can totally do that. But I'd have to have a sorting step at the end of each logic branch. Not impossible, but it means messier code with repeated logic. I'd like to just bolt "reverse the sort if the coords are flipped" on at the end, if I could. It would be much cleaner.

jdlong · December 10, 2018, 7:24pm

back door guests are best.

taras · December 10, 2018, 7:25pm

I see, OK.
Now! Don't wanna be a smart-ass, but you could make sorting a function, and then call it multiple times in your function... It's a function in a function. It's Funception.

(just to be clear, I wouldn't do it myself, because I'm lazy, and in my world, "the rule of three" is more like "the rule of thirty three". I'd just rather copy and paste and shoot myself in the foot...)

jdlong · December 10, 2018, 7:30pm

I love the idea but the problem is that I have the function... I just don't have the object on which I want to func. That's the mystery data structure.

But looking at the code, the only thing that changes the number of columns is the calc_total parameter. So maybe I only need two places to use this logic: one where calc_total == TRUE and one where it's not.

karawoo · December 10, 2018, 7:35pm

Oh sorry, I misread and thought only the y axis variable was unknown. If you don't know the name of the data frame either, then I think this small tweak should work:

y <- p$mapping$y
p + scale_y_discrete(limits = rev(sort(unique(eval_tidy(y, p$data)))))
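Since the goal is to bolt this on at the end of a function, the tweak can be wrapped in a small helper. The name reverse_y_order() is hypothetical, and the sketch assumes the data and the y mapping were both given in the top-level ggplot() call, so that p$data and p$mapping$y are populated. Note that it adds a new y scale, which replaces any y scale already on the plot.

```r
library(ggplot2)
library(rlang)

# Hypothetical helper: reverse the discrete y order of a finished plot.
# Assumes data and the y aesthetic were supplied to ggplot(), not per layer.
reverse_y_order <- function(p) {
  y <- p$mapping$y
  if (is.null(y)) {
    stop("No top-level y mapping found in p$mapping", call. = FALSE)
  }
  # Evaluate the y quosure against the plot's own data
  y_values <- eval_tidy(y, p$data)
  p + scale_y_discrete(limits = as.character(rev(sort(unique(y_values)))))
}

mt <- mtcars
mt$carb2 <- factor(mtcars$carb, levels = rev(levels(factor(mtcars$carb))))
p <- ggplot(data = mt, aes(y = carb2, x = mpg, colour = hp)) +
  geom_point()

p_rev <- reverse_y_order(p)
p_rev
```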

jdlong · December 10, 2018, 7:49pm

that is exactly the answer to my question!

However, it turns out that the ggplot object I'm trying to tweak was apparently created with some kind of layering, because when I apply wf + scale_x_discrete(limits = rev(levels(wf$data$x))), where wf is my waterfall plot, only my axis labels get reversed, not the actual columns of data. But that's a problem for another thread. This one's too covered in memes.
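When the data and aesthetics live on individual layers rather than in ggplot(), p$data and p$mapping$y hold nothing useful. One alternative, sketched under the assumption of a ggplot2 version that exports layer_scales(), is to read the y scale ggplot2 already trained while building the plot and reverse its limits, which needs neither the data frame nor the variable name. The helper name reverse_trained_y() is hypothetical, it only looks at the first panel, and it has not been tried on waterfall() output, so whether it avoids the labels-only reversal above depends on how that function builds its positions.

```r
library(ggplot2)

# Hypothetical helper: reverse a discrete y axis using the scale that
# ggplot2 trained while building the plot (first panel only).
reverse_trained_y <- function(p) {
  y_scale <- layer_scales(p)$y
  if (is.null(y_scale) || !y_scale$is_discrete()) {
    stop("The y scale of p is not discrete", call. = FALSE)
  }
  p + scale_y_discrete(limits = rev(y_scale$get_limits()))
}

mt <- mtcars
mt$carb2 <- factor(mtcars$carb, levels = rev(levels(factor(mtcars$carb))))

# Data and mapping are attached to the layer, so p_layer$data is not a
# data frame and p_layer$mapping$y is NULL
p_layer <- ggplot() +
  geom_point(data = mt, aes(y = carb2, x = mpg, colour = hp))

p_rev <- reverse_trained_y(p_layer)
p_rev
```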

jdlong · December 17, 2018, 7:49pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.