Highlight the worst 2 boxplots (by 75th percentile) in ggplot2?

Hello, I'm using the nycflights13 data set and I would like to highlight the worst two carriers per origin airport by the 75th percentile of departure delays. I can't seem to find a way to do this. I was going to create a quantile column and then create a boolean value that marks whether that carrier is one of the bottom two for each origin airport. Any ideas on how I could do this?

```r
library(tidyverse)
library(nycflights13)

flights %>% 
  filter(sched_dep_time <= 1200) %>% 
  group_by(carrier, origin) %>% 
  mutate(q3 = quantile(dep_delay, probs = 0.75, na.rm = T), 
         rank = rank(desc(q3)),
         top_2 = ifelse(rank %in% c(1,2), TRUE, FALSE)) %>% 
  View()
```

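Is the problem that you don't always manage to identify the bottom two carriers per origin? One thing that's happening is that, because you group by both carrier and origin, every flight in a group gets the same q3, so rank() only ever sees ties. With ties, rank() returns averaged ranks that aren't always integers (e.g. 2.5), so top_2 ends up depending on how many flights are in each group rather than on how the carriers compare. The fix is to compute one q3 per carrier and origin, then rank the carriers within each origin. You can break any remaining ties by specifying the ties.method argument in rank(), or, the way I've done it below, order the dataset and use row_number().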
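For comparison, here is a rough sketch of the ties.method route. It assumes the same morning filter as above; the q3rank_alt and delay_rank names and the choice of ties.method = "first" are mine, not from the thread. It summarises to one row per carrier/origin pair first, which is what makes the ranking meaningful.

```r
library(dplyr)
library(nycflights13)

# Sketch: rank carriers within each origin using rank() with an explicit
# ties.method, instead of arrange() + row_number().
q3rank_alt <- flights %>%
  filter(sched_dep_time <= 1200) %>%
  group_by(carrier, origin) %>%
  # one row per carrier/origin pair, so q3 is no longer repeated per flight
  summarise(q3 = quantile(dep_delay, probs = 0.75, na.rm = TRUE),
            .groups = "drop") %>%
  group_by(origin) %>%
  # delay_rank 1 = highest 75th percentile delay; exact ties broken by row order
  mutate(delay_rank = rank(desc(q3), ties.method = "first"),
         top_2 = delay_rank <= 2) %>%
  ungroup()
```

ties.method = "first" always yields exactly two flagged carriers per origin; "min" would give tied carriers the same rank, so more than two could be flagged if several tie at the top.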


```r
# Find the two worst carriers (highest 75th percentile of dep_delay) for each origin
q3rank <- flights %>% 
  filter(sched_dep_time <= 1200) %>% 
  group_by(carrier, origin) %>% 
  summarise(q3 = quantile(dep_delay, probs = 0.75, na.rm = TRUE)) %>% 
  ungroup() %>% 
  arrange(origin, desc(q3)) %>%                # within each origin, highest q3 first
  group_by(origin) %>% 
  mutate(top_2 = row_number() %in% 1:2) %>%    # flag the first two rows per origin
  ungroup() 

# Join the top_2 indicator back to the flights dataset
flights %>% 
  filter(sched_dep_time <= 1200) %>% 
  inner_join(q3rank, by = c("carrier", "origin"))
```
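The question was ultimately about the boxplot, so here is a minimal sketch of the plotting step. It repeats the q3rank step so it runs on its own, and it assumes one facet per origin with the flagged carriers in a contrasting fill; the plot_data name, the colours and the facet layout are my own choices, not something from the thread.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Flag the two carriers with the highest 75th percentile delay per origin
q3rank <- flights %>%
  filter(sched_dep_time <= 1200) %>%
  group_by(carrier, origin) %>%
  summarise(q3 = quantile(dep_delay, probs = 0.75, na.rm = TRUE),
            .groups = "drop") %>%
  arrange(origin, desc(q3)) %>%
  group_by(origin) %>%
  mutate(top_2 = row_number() %in% 1:2) %>%
  ungroup()

# Attach the flag to every morning flight
plot_data <- flights %>%
  filter(sched_dep_time <= 1200) %>%
  inner_join(q3rank, by = c("carrier", "origin"))

# One panel per origin; boxes filled according to the logical top_2 flag
p <- ggplot(plot_data, aes(x = carrier, y = dep_delay, fill = top_2)) +
  geom_boxplot(na.rm = TRUE) +
  facet_wrap(~ origin, scales = "free_x") +
  scale_fill_manual(values = c("FALSE" = "grey80", "TRUE" = "firebrick"),
                    name = "Worst two") +
  labs(x = "Carrier", y = "Departure delay (minutes)")
```

Because departure delays have a long right tail, you may want to zoom with coord_cartesian(ylim = ...) so the boxes themselves are readable.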

This works perfectly! That's what I was really trying to get at. Thank you so much!
