Tidyverse ggplot() help: How to segment each column in the geom_bar()

It just occurred to me that you might be describing a grouped, rather than a stacked, barplot. You can see examples of both with code here:
https://www.r-graph-gallery.com/211-basic-grouped-or-stacked-barplot/
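
In case it helps to see the difference side by side, here is a minimal sketch. The data frame `toy` and its column `program` are made up for illustration (the counts are a handful of rows from the data in this thread), so treat it as a sketch rather than a finished plot.

```r
library(ggplot2)

# Hypothetical toy data: six rows only, not the full data set
toy <- data.frame(
  category = rep(c("0%", "1%-10%"), each = 3),
  program  = rep(c("Gold", "Silver", "Club"), times = 2),
  count    = c(843, 1695, 172, 1675, 1351, 383)
)

# Stacked: one bar per category, segments piled on top of each other
p_stacked <- ggplot(toy, aes(x = category, y = count, fill = program)) +
  geom_col(position = "stack")

# Grouped: one bar per program, side by side within each category
p_grouped <- ggplot(toy, aes(x = category, y = count, fill = program)) +
  geom_col(position = "dodge")
```

Printing `p_stacked` or `p_grouped` draws the plot. The only thing that differs between the two is the `position` argument.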

Mara:
Thank you for the info.

I still have not gathered enough input to plot the segment in each "category". After you and Curtis asked me to use reprex() and many other requests, I hope everyone can give me some helpful advice.
Everyone is busy and I understand that but I did learn how to use reprex() and other members seem not to get back to me about this issue.
Why did you ask me to use the reprex() result in the first place?

I am curious.

Thank you.

Because it allows someone else to cut and paste your code and then help you fix it. For example, when I cut and paste the reprex from above, I automatically get the output below. (N.B. I removed the problematic labelling line. reprex automatically uploads the image to imgur, so it all appears just by pasting here in Markdown.)

library(tidyverse) 
df3a <- structure(list(category = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
                                             1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
                                             2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
                                             4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 
                                             5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 
                                             6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
                                             7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 
                                             9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), .Label = c("0%", 
                                                                                                     "1%-10%", "11%-20%", "21%-30%", "31%-40%", "41%-50%", "51%-60%", 
                                                                                                     "61%-70%", ">= 71%"), class = "factor"), PROGRAM_LEVEL_DESCR = structure(c(1L, 
                                                                                                                                                                                2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 
                                                                                                                                                                                2L, 4L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 14L, 1L, 2L, 3L, 4L, 6L, 
                                                                                                                                                                                7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 2L, 3L, 4L, 6L, 7L, 
                                                                                                                                                                                9L, 10L, 11L, 12L, 13L, 14L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
                                                                                                                                                                                9L, 10L, 11L, 12L, 13L, 14L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 
                                                                                                                                                                                11L, 12L, 14L, 1L, 2L, 3L, 4L, 6L, 7L, 9L, 10L, 11L, 12L, 13L, 
                                                                                                                                                                                14L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 14L, 
                                                                                                                                                                                1L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L), .Label = c("Branch Refusal", 
                                                                                                                                                                                                                                                     "Club", "Corporate Refusal", "Credit Hold", "Customer Refusal", 
                                                                                                                                                                                                                                                     "Diamond", "Enrollment", "Failed 2X in Calendar Year", "Gold", 
                                                                                                                                                                                                                                                     "Institutional", "No Program", "Platinum", "RSVP", "Silver"), class = "factor"), 
                      count = c(133L, 172L, 5L, 215L, 1L, 104L, 389L, 13L, 843L, 
                                193L, 10743L, 482L, 10L, 1695L, 3L, 383L, 59L, 471L, 98L, 
                                2L, 1675L, 87L, 1284L, 1719L, 1351L, 6L, 290L, 3L, 39L, 262L, 
                                85L, 3L, 1123L, 76L, 1255L, 1003L, 1L, 1000L, 3L, 208L, 5L, 
                                31L, 189L, 69L, 731L, 79L, 979L, 670L, 1L, 732L, 1L, 156L, 
                                8L, 33L, 1L, 127L, 70L, 1L, 547L, 55L, 967L, 480L, 1L, 568L, 
                                150L, 5L, 31L, 85L, 65L, 2L, 416L, 38L, 907L, 319L, 531L, 
                                1L, 102L, 14L, 18L, 63L, 35L, 307L, 25L, 533L, 236L, 2L, 
                                317L, 3L, 90L, 18L, 22L, 1L, 33L, 38L, 1L, 254L, 25L, 640L, 
                                180L, 275L, 8L, 179L, 48L, 76L, 100L, 150L, 5L, 503L, 95L, 
                                4032L, 339L, 2L, 812L)), class = c("grouped_df", "tbl_df", 
                                                                   "tbl", "data.frame"), row.names = c(NA, -113L), vars = "category", drop = TRUE)
df3a %>%
  ggplot(aes(x = category, y = count)) +
  geom_bar(aes(fill = PROGRAM_LEVEL_DESCR), stat = 'identity') +
  labs(y = 'Number of Distinct Customers', x = ' # of PL Orders in the PL Cart')


Created on 2018-06-30 by the reprex package (v0.2.0).

I asked about the grouped bar chart, because, going through your question, I wanted to make sure that wasn't a piece that the rest of us were somehow missing. I wanted to ask that before moving forward to help steer you in the right direction.

I am trying to figure out exactly what you mean by the above.

Looking at it, given the number of PROGRAM_LEVEL_DESCR, I'm guessing not.

From your code above, this section will not work because you're referencing a variable, "percent", which you have not defined.

geom_text(aes(label=sprintf("%1.1f%%", percent)), 
            position=position_stack(vjust=0.5), size=3, colour="white")

Where Joel referenced this above, it was a continuation of the chunk above where you had created the variable percent:

df2a1 <- df2a %>% 
  group_by(category, PROGRAM_LEVEL_DESCR) %>% 
  summarise(count=n()) %>% 
  mutate(percent= paste0(round(count/sum(count)*100,1),'%'))

I don't think you have that variable in your reprex. I'm not totally clear on what the method above is intended to do, so I didn't add a labeller.
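
For what it's worth, here is a rough sketch of how a percent column could be built and then used to label the segments of a stacked bar. The data frame `toy` and its column `program` are made up for illustration (the counts are a few rows from the reprex data), so the percentages are relative to these rows only. One assumption worth flagging: `percent` is kept numeric here, because `sprintf()` with a `%f` format expects a number, not a string that already has "%" pasted onto it.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: six rows only, not the full data set
toy <- data.frame(
  category = rep(c("0%", "1%-10%"), each = 3),
  program  = rep(c("Gold", "Silver", "Club"), times = 2),
  count    = c(843, 1695, 172, 1675, 1351, 383)
)

# Share of each program within its category, kept as a number
toy_pct <- toy %>%
  group_by(category) %>%
  mutate(percent = count / sum(count) * 100) %>%
  ungroup()

# Stacked bars with a label centred in each segment
p_labelled <- ggplot(toy_pct, aes(x = category, y = count, fill = program)) +
  geom_col() +
  geom_text(aes(label = sprintf("%1.1f%%", percent)),
            position = position_stack(vjust = 0.5), size = 3, colour = "white")
```

With all 14 program levels, many segments will be too thin for a label to be readable, so you may want to label only the larger ones.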


Mara:
In terms of segment, this is perfect. That is what I meant.
In the 0% category, there are 12 groups.
How do I plot these groups within category 0%? This is perfect. Thank you.

In terms of illustration, is there a way to show percentage in each column in, for example, 0% category?
I agree that when you look at 0% category, "No Program" has the highest height, but I want to show to my audience the percentage number.
What do you think? Mara.
Do I even need number? Is the plot self-explanatory?
If I present this to you, would you like my visualization?

This is a grouped bar plot. This is what I was describing above, and why I asked you to take a look at that link.

Modifying examples is a great way to learn. Moreover, it's a great way to communicate about what it is you're trying to achieve. The R Graph Gallery (see link in post above) is a really nice resource for doing that.

I'm not sure what you mean by the above. Do you only want the 0% category?

If so, you can filter your dataset (pseudocode below, as I'm not sure which data frame you're working off of — the one from the reprex doesn't have percent, so I'm assuming not that one):

df_zero <- data %>%
  filter(category == "0%")

Below I'm going through each step of creating the label string more explicitly than you would in practice, but I want to make sure it's clear what I'm doing: computing the percentage represented by each program level within the subset of customers who are in the 0% category.

library(tidyverse)
df3 <- data.frame(stringsAsFactors=FALSE,
                  category = c("0%", "0%", "0%", "0%", "0%", "0%", "0%", "0%",
                               "0%", "0%", "0%", "0%", "0%", "0%", "1%-10%",
                               "1%-10%", "1%-10%", "1%-10%", "1%-10%", "1%-10%", "1%-10%",
                               "1%-10%", "1%-10%", "1%-10%", "1%-10%", "11%-20%",
                               "11%-20%", "11%-20%", "11%-20%", "11%-20%", "11%-20%",
                               "11%-20%", "11%-20%", "11%-20%", "11%-20%", "11%-20%",
                               "11%-20%", "11%-20%", "21%-30%", "21%-30%", "21%-30%",
                               "21%-30%", "21%-30%", "21%-30%", "21%-30%", "21%-30%",
                               "21%-30%", "21%-30%", "21%-30%", "21%-30%",
                               "31%-40%", "31%-40%", "31%-40%", "31%-40%", "31%-40%",
                               "31%-40%", "31%-40%", "31%-40%", "31%-40%", "31%-40%",
                               "31%-40%", "31%-40%", "31%-40%", "31%-40%", "41%-50%",
                               "41%-50%", "41%-50%", "41%-50%", "41%-50%", "41%-50%",
                               "41%-50%", "41%-50%", "41%-50%", "41%-50%", "41%-50%",
                               "51%-60%", "51%-60%", "51%-60%", "51%-60%", "51%-60%",
                               "51%-60%", "51%-60%", "51%-60%", "51%-60%", "51%-60%",
                               "51%-60%", "51%-60%", "61%-70%", "61%-70%", "61%-70%",
                               "61%-70%", "61%-70%", "61%-70%", "61%-70%", "61%-70%",
                               "61%-70%", "61%-70%", "61%-70%", "61%-70%", "61%-70%",
                               ">= 71%", ">= 71%", ">= 71%", ">= 71%", ">= 71%",
                               ">= 71%", ">= 71%", ">= 71%", ">= 71%", ">= 71%",
                               ">= 71%", ">= 71%", ">= 71%"),
                  PROGRAM_LEVEL_DESCR = c("Branch Refusal", "Club", "Corporate Refusal",
                                          "Credit Hold", "Customer Refusal", "Diamond",
                                          "Enrollment", "Failed 2X in Calendar Year", "Gold",
                                          "Institutional", "No Program", "Platinum", "RSVP", "Silver",
                                          "Branch Refusal", "Club", "Credit Hold", "Diamond",
                                          "Enrollment", "Failed 2X in Calendar Year", "Gold",
                                          "Institutional", "No Program", "Platinum", "Silver",
                                          "Branch Refusal", "Club", "Corporate Refusal",
                                          "Credit Hold", "Diamond", "Enrollment",
                                          "Failed 2X in Calendar Year", "Gold", "Institutional", "No Program", "Platinum",
                                          "RSVP", "Silver", "Branch Refusal", "Club",
                                          "Corporate Refusal", "Credit Hold", "Diamond", "Enrollment",
                                          "Gold", "Institutional", "No Program", "Platinum",
                                          "RSVP", "Silver", "Branch Refusal", "Club",
                                          "Corporate Refusal", "Credit Hold", "Customer Refusal", "Diamond",
                                          "Enrollment", "Failed 2X in Calendar Year", "Gold",
                                          "Institutional", "No Program", "Platinum", "RSVP",
                                          "Silver", "Club", "Corporate Refusal", "Credit Hold",
                                          "Diamond", "Enrollment", "Failed 2X in Calendar Year", "Gold",
                                          "Institutional", "No Program", "Platinum", "Silver",
                                          "Branch Refusal", "Club", "Corporate Refusal",
                                          "Credit Hold", "Diamond", "Enrollment", "Gold",
                                          "Institutional", "No Program", "Platinum", "RSVP", "Silver",
                                          "Branch Refusal", "Club", "Corporate Refusal",
                                          "Credit Hold", "Customer Refusal", "Diamond", "Enrollment",
                                          "Failed 2X in Calendar Year", "Gold", "Institutional",
                                          "No Program", "Platinum", "Silver", "Branch Refusal",
                                          "Club", "Corporate Refusal", "Credit Hold", "Diamond",
                                          "Enrollment", "Failed 2X in Calendar Year", "Gold",
                                          "Institutional", "No Program", "Platinum", "RSVP", "Silver"),
                  count = c(133, 172, 5, 215, 1, 104, 389, 13, 843, 193, 10743,
                            482, 10, 1695, 3, 383, 59, 471, 98, 2, 1675, 87,
                            1284, 1719, 1351, 6, 290, 3, 39, 262, 85, 3, 1123, 76,
                            1255, 1003, 1, 1000, 3, 208, 5, 31, 189, 69, 731, 79,
                            979, 670, 1, 732, 1, 156, 8, 33, 1, 127, 70, 1, 547, 55,
                            967, 480, 1, 568, 150, 5, 31, 85, 65, 2, 416, 38,
                            907, 319, 531, 1, 102, 14, 18, 63, 35, 307, 25, 533, 236,
                            2, 317, 3, 90, 18, 22, 1, 33, 38, 1, 254, 25, 640,
                            180, 275, 8, 179, 48, 76, 100, 150, 5, 503, 95, 4032,
                            339, 2, 812)
)

df_zero <- df3 %>%
  filter(category == "0%")

total <- sum(df_zero$count)

df_zero <- df_zero %>%
  mutate(pct_within_cat = (count / total) * 100) %>%
  mutate(pct_rounded = round(pct_within_cat, digits = 2)) %>%
  mutate(pct_string = str_glue("{pct_rounded}%"))

head(df_zero)
#>   category PROGRAM_LEVEL_DESCR count pct_within_cat pct_rounded pct_string
#> 1       0%      Branch Refusal   133    0.886784905        0.89      0.89%
#> 2       0%                Club   172    1.146819576        1.15      1.15%
#> 3       0%   Corporate Refusal     5    0.033337778        0.03      0.03%
#> 4       0%         Credit Hold   215    1.433524470        1.43      1.43%
#> 5       0%    Customer Refusal     1    0.006667556        0.01      0.01%
#> 6       0%             Diamond   104    0.693425790        0.69      0.69%

df_zero %>%
  ggplot(aes(x = PROGRAM_LEVEL_DESCR, y = count, fill = PROGRAM_LEVEL_DESCR)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = pct_string))

Created on 2018-06-30 by the reprex package (v0.2.0).

Obviously you'd want to play around with aesthetics if this is what you're aiming to do (rotating x-axis labels, resizing the percentage labels, positioning them, etc.).

Certainly you'd need to change the aspect ratio if you want horizontal labels. Another idea is to use coord_flip(), which you can look up in the ggplot2 docs by running ?coord_flip() when you have the ggplot2 package loaded in R.
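
Here is one way the flipped version might look, as a sketch. The data frame `zero` and the column name `program` are my own stand-ins, with the 0% category counts typed in by hand from the data above. The `hjust` and `expand_limits()` values are just guesses at something readable.

```r
library(ggplot2)

# The 14 program levels in the 0% category, copied from the data above
zero <- data.frame(
  program = c("Branch Refusal", "Club", "Corporate Refusal", "Credit Hold",
              "Customer Refusal", "Diamond", "Enrollment",
              "Failed 2X in Calendar Year", "Gold", "Institutional",
              "No Program", "Platinum", "RSVP", "Silver"),
  count = c(133, 172, 5, 215, 1, 104, 389, 13, 843, 193, 10743, 482, 10, 1695)
)

# Percentage of the 0% category, rounded and turned into a label
zero$pct_string <- paste0(round(zero$count / sum(zero$count) * 100, 2), "%")

p_flipped <- ggplot(zero, aes(x = program, y = count, fill = program)) +
  geom_col(show.legend = FALSE) +
  # hjust < 0 pushes each label just past the end of its bar
  geom_text(aes(label = pct_string), hjust = -0.1, size = 3) +
  # leave some room on the count axis so the longest label is not clipped
  expand_limits(y = max(zero$count) * 1.15) +
  coord_flip()
```

Flipping puts the long program names on the vertical axis, where they can stay horizontal without any rotation.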


If you want something like this for each customer percentage category, you'd probably want to add faceting.
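
A faceted version could look roughly like this. Again, `toy` and `program` are made-up names and the data is only six rows from the reprex, so the percentages are within this small subset, not the real ones.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: six rows only, not the full data set
toy <- data.frame(
  category = rep(c("0%", "1%-10%"), each = 3),
  program  = rep(c("Gold", "Silver", "Club"), times = 2),
  count    = c(843, 1695, 172, 1675, 1351, 383)
)

# Percentage of each program within its own category, as a label string
toy_pct <- toy %>%
  group_by(category) %>%
  mutate(pct_string = paste0(round(count / sum(count) * 100, 1), "%")) %>%
  ungroup()

# One panel per category, one bar per program, label above each bar
p_facets <- ggplot(toy_pct, aes(x = program, y = count, fill = program)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = pct_string), vjust = -0.3, size = 3) +
  facet_wrap(~ category)
```

By default all panels share one y-axis. If the small categories get squashed, `facet_wrap(~ category, scales = "free_y")` is an option, at the cost of making heights harder to compare across panels.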


Mara:
Thank you for everything.
I appreciate your input!

I hope there were no hurt feelings about my comments above.
It was all about question and answer.
No frustration, no anger, no hard feelings from my end.
I just wanted to clarify that.

Thank you, once again!