Slicing top 5 for each year in facet_wrap plot and ordering them.

Hi there,

I'm having a little trouble figuring out how to use the slice_max() or top_n() to get the top 5 customers for each year and plot them independently in each facet_wrap by year.

I've made up the below data set to illustrate what I am trying to achieve.

library("tidyverse","tidytext")

# Sample data
df_sample <- tribble(
  ~custname,               ~Act.Inv.Year, ~Tot.Billed.Amount,
  "COMPANY BOSTON",             2019,                   425128,
  "COMPANY BOSTON",             2020,                    26121,
  "COMPANY CHARLESTON",         2019,                 16280498,
  "COMPANY CHARLESTON",         2020,                 15564767,
  "COMPANY CHARLESTON",         2018,                   405844,
  "COMPANY CHARLOTTE",          2019,                  7838489,
  "COMPANY CHARLOTTE",          2020,                  3246329,
  "COMPANY CHARLOTTE",          2018,                  2011286,
  "COMPANY CHICAGO",            2019,                 39558510,
  "COMPANY CHICAGO",            2020,                 23725972,
  "COMPANY CHICAGO",            2018,                 18698035,
  "COMPANY CLEVELAND",          2019,                  6657149, 
  "COMPANY CLEVELAND",          2020,                  1536365,
  "COMPANY CLEVELAND",          2018,                   379691,
  "COMPANY DALLAS",             2019,                 15143584,
  "COMPANY DALLAS",             2020,                  8559333,
  "COMPANY DETROIT",            2019,                  1408708,
  "COMPANY DETROIT",            2020,                  2197626,
  "COMPANY HOUSTON",            2019,                  3519787, 
  "COMPANY HOUSTON",            2020,                  8169534,
  "COMPANY HOUSTON",            2018,                   241019,
  "COMPANY HOUSTON INDUSTRIAL", 2019,                    73425,
  "COMPANY INDIANAPOLIS",       2019,                   282315, 
  "COMPANY INDIANAPOLIS",       2020,                  4944882,
  "COMPANY LOS ANGELES",        2019,                 15708881,
  "COMPANY LOS ANGELES",        2020,                 18918531,
  "COMPANY LOS ANGELES",        2018,                   686162,
  "COMPANY MIAMI",              2019,                   147892,
  "COMPANY MIAMI",              2020,                   352303,
  "COMPANY MIAMI",              2018,                     5410,
  "COMPANY MINNEAPOLIS",         2019,                 15748282,
  "COMPANY MINNEAPOLIS",        2020,                 12156385,
  "COMPANY MINNEAPOLIS",        2018,                  1227781,
  "COMPANY NEWARK",             2019,                 39410764,
  "COMPANY NEWARK",             2020,                 21114220,
  "COMPANY NEWARK",             2018,                 15915941,
  "COMPANY PEORIA",             2019,                  4681765,
  "COMPANY PEORIA",             2020,                  5687418,
  "COMPANY PEORIA",             2018,                    49056,
  "COMPANY PORTLAND",           2019,                    36384,
  "COMPANY PORTLAND",           2020,                    72999,
  "COMPANY ROCHESTER",          2019,                   678562,
  "COMPANY ROCHESTER",          2020,                   896877,
  "COMPANY SAN FRANCISCO",      2019,                   705590,
  "COMPANY SAN FRANCISCO",      2020,                   507767,
  "COMPANY SEATTLE",            2019,                  2307699,
  "COMPANY SEATTLE",            2020,                  2020438,
  "COMPANY SEATTLE",            2018,                    39603,
  "MISSING NAME",           2019,                   253207,
  "MISSING NAME",           2020,                   215389,
  "MISSING NAME",           2018,                   143926,
  "NULL",                   2018,                   490972,
)

Below is the desired general form of the plot I was going for, but I want to restrict to a sub-set of top N (5 in this case) profit customers. In other words, top 5 profitable customers per year.

# Desired Plot:
df_sample %>% 
  mutate(
    `Act.Inv.Year` = as.factor(`Act.Inv.Year`),
    `custname` = reorder_within(`custname`,`Tot.Billed.Amount`,`Act.Inv.Year`)
  ) %>% ggplot(aes(`custname`,`Tot.Billed.Amount`, fill = `Act.Inv.Year`)) +
  geom_col() +
  scale_x_continuous(labels = dollar) +
  theme(legend.position="none") +
  facet_wrap(~`Act.Inv.Year`, scales = "free_y") +
  coord_flip()  +
  scale_x_reordered() +
  scale_y_continuous(expand = c(0,0)) +
  labs(y = "Total Profit $",
       x = "Customer Name")

I then tried to incorporate slice_max() with no success:"

# Slice 5 - Getting top 5 overall not per year as desired.
df_sample %>% 
  mutate(
    `Act.Inv.Year` = as.factor(`Act.Inv.Year`),
    `custname` = reorder_within(`custname`,`Tot.Billed.Amount`,`Act.Inv.Year`)
  ) %>% 
  slice_max(`Tot.Billed.Amount`, n = 5 ) %>% 
  ggplot(aes(`custname`,`Tot.Billed.Amount`, fill = `Act.Inv.Year`)) +
  geom_col() +
  scale_x_continuous(labels = dollar) +
  theme(legend.position="none") +
  facet_wrap(~`Act.Inv.Year`, scales = "free_y") +
  coord_flip()  +
  scale_x_reordered() +
  scale_y_continuous(expand = c(0,0)) +
  labs(y = "Total Profit $",
       x = "Customer Name")

Then I tried increasing the n value to see what happens:

# Slice 10 - Getting top 10 overall not per year as desired.
df_sample %>% 
  mutate(
    `Act.Inv.Year` = as.factor(`Act.Inv.Year`),
    `custname` = reorder_within(`custname`,`Tot.Billed.Amount`,`Act.Inv.Year`)
  ) %>% 
  slice_max(`Tot.Billed.Amount`, n =10 ) %>% 
  ggplot(aes(`custname`,`Tot.Billed.Amount`, fill = `Act.Inv.Year`)) +
  geom_col() +
  scale_x_continuous(labels = dollar) +
  theme(legend.position="none") +
  facet_wrap(~`Act.Inv.Year`, scales = "free_y") +
  coord_flip()  +
  scale_x_reordered() +
  scale_y_continuous(expand = c(0,0)) +
  labs(y = "Total Profit $",
       x = "Customer Name")

That is when I noticed that I was getting the total output plots equal to the size of my slice, while what I was going for was the top 5 in this case per year.

Any thoughts?

Ty

Best practice in my opinion is separating data transformations from plotting instructions.
Use dplyr group by and slice to make a dataframe which more closely conforms to what you want to eventually plot. Then pass to ggplot2 explicitly. You are just making it harder for yourself by an implicit pipe, you can't easily see what your dplyr transformations have done and what ggplot2 will be receiving.

Also use group_by

1 Like

Finally was able to get the result I was going for.

df_sample %>%  group_by(`Act.Inv.Year`) %>% 
  mutate(
    `Act.Inv.Year` = as.factor(`Act.Inv.Year`),
    `custname` = reorder_within(`custname`,`Tot.Billed.Amount`,`Act.Inv.Year`)
  ) %>% 
  slice_max(`Tot.Billed.Amount`, n =5) %>% 
  ggplot(aes(`custname`,`Tot.Billed.Amount`, fill = `Act.Inv.Year`)) +
  geom_col() +
  # scale_x_continuous(labels = dollar) +
  theme(legend.position="none") +
  facet_wrap(~`Act.Inv.Year`, scales = "free_y") +
  coord_flip()  +
  scale_x_reordered() +
  scale_y_continuous(expand = c(0,0), labels = dollar) +
  labs(y = "Total Profit $",
       x = "Customer Name")

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.