Can I recreate this plot using only stat_summary and geoms?

I'm working on automating a report my department makes annually. It requires writing a lot of boiler plate for a ggplot2 plot, and then adding some other indicators manually. I'd like to break the code up into smaller module so that maintaining and editing the code is not a huge hassle. I figure if I can generate the plot using only geoms and stat_summary, that would be a huge help.

I've included a reprex to generate the plot I need. It requires faceting a histogram, plotting a vertical line at the median, and including some text to highlight the median.

Is it possible to recreate this plot using only geoms and stat_summary? If so, how can I go about doing that? I've made a first go of it here, but am stuck on the geom_labeling of the median (as shown in the reprex).

d %>% 
  ggplot(aes(metric))+
  geom_histogram()+
  stat_summary(aes(x = 0, xintercept = stat(y), y = metric), 
               fun = median, geom = "vline", colour = "red")+
  facet_wrap(~classification)

Reprex for my problem below:

library(tidyverse)
#> Warning: package 'tibble' was built under R version 3.6.2

d = tibble::tribble(
  ~classification, ~metric,
  "Scientist",       7,
  "Member",       2,
  "Scientist",      19,
  "Scientist",      44,
  "Scientist",      76,
  "Researcher",      21,
  "Member",       2,
  "Researcher",      38,
  "Member",       1,
  "Member",       4,
  "Scientist",      40,
  "Scientist",      20,
  "Researcher",       5,
  "Member",      13,
  "Scientist",      84,
  "Member",       9,
  "Member",       5,
  "Member",       1,
  "Member",       9,
  "Researcher",       8,
  "Scientist",      39,
  "Member",       6,
  "Researcher",      39,
  "Member",       3,
  "Researcher",      10,
  "Researcher",       8,
  "Researcher",      32,
  "Scientist",      56,
  "Member",      12,
  "Member",      14,
  "Researcher",      39,
  "Member",       2,
  "Member",       4,
  "Member",       2,
  "Researcher",      17,
  "Member",       5,
  "Member",      10,
  "Scientist",      40,
  "Researcher",      30,
  "Researcher",       5,
  "Researcher",       5,
  "Member",       1,
  "Scientist",      23,
  "Researcher",       9,
  "Member",      22,
  "Scientist",      54,
  "Scientist",      12,
  "Researcher",      30,
  "Member",       3,
  "Member",       2,
  "Member",      18,
  "Member",       1,
  "Researcher",      35,
  "Member",       5,
  "Member",      11,
  "Researcher",      10,
  "Member",       1,
  "Researcher",       9,
  "Member",       3,
  "Researcher",      10,
  "Member",       6,
  "Scientist",      16,
  "Member",       8,
  "Researcher",      12,
  "Member",       2,
  "Member",      10,
  "Scientist",      40,
  "Researcher",      29,
  "Scientist",      34,
  "Scientist",      27,
  "Researcher",      13,
  "Researcher",      17,
  "Researcher",      12,
  "Scientist",      18,
  "Researcher",       7,
  "Member",       2,
  "Scientist",      19,
  "Researcher",       8,
  "Researcher",      14,
  "Researcher",      31,
  "Scientist",      50,
  "Researcher",       9,
  "Scientist",      14,
  "Member",       3,
  "Member",       8,
  "Scientist",      47,
  "Researcher",      20,
  "Scientist",      19,
  "Member",      20,
  "Member",       6,
  "Researcher",      36,
  "Member",       8,
  "Researcher",      10,
  "Member",       8,
  "Researcher",       5,
  "Member",       8,
  "Scientist",      42,
  "Researcher",      31,
  "Researcher",      22,
  "Researcher",      10,
  "Researcher",      57,
  "Scientist",      19,
  "Scientist",      76,
  "Member",      10,
  "Scientist",      79,
  "Member",      60,
  "Researcher",      22,
  "Researcher",      14,
  "Researcher",      32,
  "Member",       2,
  "Member",       5,
  "Member",       0,
  "Scientist",      59,
  "Researcher",       4,
  "Member",       5,
  "Scientist",      27,
  "Scientist",      32,
  "Member",       7,
  "Researcher",      18,
  "Member",       3,
  "Researcher",       5,
  "Scientist",      24,
  "Member",       5,
  "Researcher",      10,
  "Researcher",      28,
  "Member",       9,
  "Member",      12,
  "Scientist",      12,
  "Researcher",       2,
  "Researcher",       7,
  "Member",      13,
  "Member",       2,
  "Member",      17,
  "Scientist",      32,
  "Member",      18,
  "Researcher",      12,
  "Scientist",      43,
  "Member",       3,
  "Researcher",       6,
  "Researcher",      15,
  "Scientist",      21,
  "Member",       4,
  "Researcher",      23
)



d %>%
  group_by(classification) %>% 
  mutate(class_med = median(metric)) %>% 
  ggplot(aes(metric))+
  geom_histogram()+
  geom_vline(aes(xintercept = class_med))+
  geom_text(aes(x = class_med, y = 10, label = glue::glue('Median: {class_med}')), vjust = 'top', angle = 90)+
  facet_grid(~classification)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2020-04-01 by the reprex package (v0.3.0)

Hi @DemetriPananos: Could you say a little more about why (how) it would help? I'm having trouble seeing where the simplification would occur.

Was it David Robinson that said

If you do something three or more times, write a function

We have to make 6 similar plots for 100+ items we measure. My supervisors are still uncertain as to what exactly they want to see (medians vs means, etc), so I figure if I can wrap functions to call only geoms and stat_summary, it saves me having to redo a lot of data manipulation. This is a minimal example, and I plan to wrap most everything in functions and use quosures.

Even if it doesn't help, it would be interesting to know how to do this in ggplot2 alone.

I understand the motivation, but I'm not sure how the change helps you: The stat_summary() command replaces the geom_vline() and geom_text() commands, and the mutate() command isn't necessary in your plot code, so I'm not sure where the savings in time will be from that point of view.

From the David Robinson point of view, you could create a function that takes a statistic (mean, median) as input and creates plots like the one you have -- is that what you mean?

I'm aware I can wrap the solution I've posted in a function, and I'm prepared to take that approach should I need to. I'm more interested in if it can be done with ggplot2 geoms.

I guess I'm confused since your working code does use ggplot2 geoms, so I can't tell what you're trying to improve upon. In other words, what aspect of your working code is what you're trying to replace?

I'm looking to replace the mutate(class_med = median(metric)). I've managed to use stat_summary to plot the median line which would replace the geom_vline. Now I'm wondering how I can use stat_summary to replace the geom_text.

I see. The mutate() command isn't necessary, but I don't see how to get around using geom_text():

d %>%
  group_by(classification) %>% 
  ggplot(aes(metric))+
  geom_histogram()+
  geom_vline(aes(xintercept = median(metric)))+
  geom_text(aes(x = median(metric), y = 10, label = glue::glue('Median: {median(metric)}')), vjust = 'top', angle = 90)+
  facet_grid(~classification)

@DemetriPananos: This is what I meant, but am not sure it's in the direction you're looking for:

load data and packages
library(tidyverse)
library(rlang)
#> 
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#> 
#>     %@%, as_function, flatten, flatten_chr, flatten_dbl,
#>     flatten_int, flatten_lgl, flatten_raw, invoke, list_along,
#>     modify, prepend, splice
d = tibble::tribble(
  ~classification, ~metric,
  "Scientist",       7,
  "Member",       2,
  "Scientist",      19,
  "Scientist",      44,
  "Scientist",      76,
  "Researcher",      21,
  "Member",       2,
  "Researcher",      38,
  "Member",       1,
  "Member",       4,
  "Scientist",      40,
  "Scientist",      20,
  "Researcher",       5,
  "Member",      13,
  "Scientist",      84,
  "Member",       9,
  "Member",       5,
  "Member",       1,
  "Member",       9,
  "Researcher",       8,
  "Scientist",      39,
  "Member",       6,
  "Researcher",      39,
  "Member",       3,
  "Researcher",      10,
  "Researcher",       8,
  "Researcher",      32,
  "Scientist",      56,
  "Member",      12,
  "Member",      14,
  "Researcher",      39,
  "Member",       2,
  "Member",       4,
  "Member",       2,
  "Researcher",      17,
  "Member",       5,
  "Member",      10,
  "Scientist",      40,
  "Researcher",      30,
  "Researcher",       5,
  "Researcher",       5,
  "Member",       1,
  "Scientist",      23,
  "Researcher",       9,
  "Member",      22,
  "Scientist",      54,
  "Scientist",      12,
  "Researcher",      30,
  "Member",       3,
  "Member",       2,
  "Member",      18,
  "Member",       1,
  "Researcher",      35,
  "Member",       5,
  "Member",      11,
  "Researcher",      10,
  "Member",       1,
  "Researcher",       9,
  "Member",       3,
  "Researcher",      10,
  "Member",       6,
  "Scientist",      16,
  "Member",       8,
  "Researcher",      12,
  "Member",       2,
  "Member",      10,
  "Scientist",      40,
  "Researcher",      29,
  "Scientist",      34,
  "Scientist",      27,
  "Researcher",      13,
  "Researcher",      17,
  "Researcher",      12,
  "Scientist",      18,
  "Researcher",       7,
  "Member",       2,
  "Scientist",      19,
  "Researcher",       8,
  "Researcher",      14,
  "Researcher",      31,
  "Scientist",      50,
  "Researcher",       9,
  "Scientist",      14,
  "Member",       3,
  "Member",       8,
  "Scientist",      47,
  "Researcher",      20,
  "Scientist",      19,
  "Member",      20,
  "Member",       6,
  "Researcher",      36,
  "Member",       8,
  "Researcher",      10,
  "Member",       8,
  "Researcher",       5,
  "Member",       8,
  "Scientist",      42,
  "Researcher",      31,
  "Researcher",      22,
  "Researcher",      10,
  "Researcher",      57,
  "Scientist",      19,
  "Scientist",      76,
  "Member",      10,
  "Scientist",      79,
  "Member",      60,
  "Researcher",      22,
  "Researcher",      14,
  "Researcher",      32,
  "Member",       2,
  "Member",       5,
  "Member",       0,
  "Scientist",      59,
  "Researcher",       4,
  "Member",       5,
  "Scientist",      27,
  "Scientist",      32,
  "Member",       7,
  "Researcher",      18,
  "Member",       3,
  "Researcher",       5,
  "Scientist",      24,
  "Member",       5,
  "Researcher",      10,
  "Researcher",      28,
  "Member",       9,
  "Member",      12,
  "Scientist",      12,
  "Researcher",       2,
  "Researcher",       7,
  "Member",      13,
  "Member",       2,
  "Member",      17,
  "Scientist",      32,
  "Member",      18,
  "Researcher",      12,
  "Scientist",      43,
  "Member",       3,
  "Researcher",       6,
  "Researcher",      15,
  "Scientist",      21,
  "Member",       4,
  "Researcher",      23
)
f <- function(my_stat, col) {
  my_stat <- enexpr(my_stat)
  col <- enexpr(col)
  eval(
  d %>%
    group_by(classification) %>% 
    ggplot(aes(!!col))+
    geom_histogram()+
    geom_vline(aes(xintercept = (!!my_stat)(!!col)))+
    geom_text(aes(x = (!!my_stat)(!!col), y = 10, label = paste(as_string(my_stat), (!!my_stat)(!!col))), vjust = 'top', angle = 90)+
    facet_grid(~classification)
  )
}
undebug(f)
#> Warning in undebug(f): argument is not being debugged

f(median, metric)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2020-04-01 by the reprex package (v0.3.0)

It doesn't take care of stats by facet, but I was just wondering if this was the flavor of what you were looking for.

Now I understand better @DemetriPananos: The mutate() function allowed the faceting to work properly in your orignal code. Back to thinking...

There's nothing wrong with summarizing the data outside of ggplot. Either way, you can write a function that will create the plot and avoid the need for repeated code. However, in this case, the plot can be done completely within ggplot using stat_summaryh from the ggstance package. For example:

library(tidyverse)
theme_set(theme_classic() + 
            theme(panel.background = element_rect(colour = "grey40", fill = NA)))
library(ggstance) 

d %>%
  ggplot(aes(x=metric)) +
    geom_histogram() +
    stat_summaryh(fun.x=median, geom="vline", aes(xintercept=..x.., y=0)) +
    stat_summaryh(fun.x=median, geom="text", 
                  aes(label=paste0("median: ",..x..), y=15), 
                  angle=90, position=position_nudge(x=3)) +
    facet_grid(~classification) +
    scale_y_continuous(expand=expansion(c(0,0.02)), limits=c(0,NA))

A potential problem is that you don't necessarily know in advance where to place the text labels vertically or horizontally. The code above hard-codes those values. To set them dynamically, we can create a function.

The function below first creates a histogram plot without the text labels. Then we use the layer_scales function to get the x-range and y-range for the plot. Given these ranges, we can set the vertical and horizontal location of the text labels. I've chosen to plot the labels horizontally at a location guaranteed to be just above the highest histogram bar. You can of course plot them vertically, but it will be difficult to guarantee that the labels won't straddle the top of a bar.

pfnc = function(data, xvar, facet.var=NULL) {
  
  p = data %>%
    ggplot(aes(x={{xvar}})) +
    geom_histogram() +
    stat_summaryh(fun.x=median, geom="vline", aes(xintercept=..x.., y=0), colour="red") +
    facet_grid(cols=vars({{facet.var}})) +
    scale_y_continuous(expand=expansion(c(0,0.05)))
  
  # Set vertical and horizontal position of text by getting x-range and 
  #  y-range from the plot we just created
  ypos = 1.025*max(layer_scales(p)[["y"]][["range"]][["range"]])
  xnudge = 0.05*diff(range(layer_scales(p)[["y"]][["range"]][["range"]]))
    
  p + 
    stat_summaryh(fun.x=median, geom="text", 
                  aes(y=ypos, label=paste0("median: ", ..x..)),
                  colour="red", size=3.5, hjust=0,
                  position=position_nudge(x=xnudge))

}

pfnc(d, metric, classification)

Now that we have the function, we can use it on any data frame. For example:

pfnc(mtcars, mpg, cyl)

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.