@joels
Joel:
I got the plot, but my thought on the 0.0% error is as follows:


Is there a way for me to give the percentage of each PROGRAM_LEVEL_DESCR in the 0% category?
Then we can fix the 0.0% error.
Am I clear enough?
For example, the "0%" category has 14 different levels. I want to see the percentage for each of them within the "0%" category, so the first one would be 133/total * 100.

Thanks!

It will be much easier to troubleshoot this if you can put it in a reproducible example, which you can do with reprex. That way your input, output, and charts will all be self-contained so anyone can just copy and paste to replicate exactly what you're doing.

You can install reprex, as shown below.

install.packages("reprex")
What to do if you run into clipboard problems

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

For pointers specific to the community site, as well as several helpful resources for learning more about reprex, check out the reprex FAQ, linked to below.

If I understand you correctly, I think you can do this with a group_by(category) and mutate(prct_within_cat = count / sum(count))
And then in your plot geom_text, refer to prct_within_cat instead of percent*100
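A minimal sketch of that suggestion, using a small hypothetical data frame rather than the real data. Once the data is grouped by category, `sum(count)` is each category's own total, so the first row below works out to 133 / (133 + 172 + 1695), i.e. 133 / 2000:

```r
library(dplyr)

# Hypothetical toy counts shaped like the summarised data in this thread
counts <- data.frame(
  category = c("0%", "0%", "0%", "1%-10%", "1%-10%"),
  PROGRAM_LEVEL_DESCR = c("Branch Refusal", "Club", "Silver", "Club", "Silver"),
  count = c(133, 172, 1695, 383, 1351),
  stringsAsFactors = FALSE
)

counts <- counts %>%
  group_by(category) %>%
  # Inside the grouped mutate(), sum(count) is the total of the current
  # category only, so prct_within_cat adds up to 1 within each category
  mutate(prct_within_cat = count / sum(count)) %>%
  ungroup()

counts
```

In the plot, the label could then be built from this column, e.g. `sprintf("%.1f%%", 100 * prct_within_cat)`.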

To clarify my post:


Recall that each customer has a unique Rewards/Advantage level.
So for the first category, "0%", there are 14,998 customers who do not buy PL, and they make up about 32.5% of the customer pool.

Next, I want to dig deeper into this number. If you want to run a marketing campaign that targets the RIGHT audience, you need to see the segmentation within each column. Long story short, I want to see how many of this 32.5% are "Silver" customers, how many are "No Program" customers, and so on.

Why? Because I need to go out and talk to them: "Hey, why don't you buy PL? You are Diamond or Silver customers. You have stayed with us for a long time. Did PL products disappoint you? If so, tell me more about it..."

To go back to your question, the answer is similar to the above. For the "1 PL order" column, I want to see the segments in it:
"Hey, why don't you buy more than one PL product? You are Diamond or Silver customers. You have stayed with us for a long time. Did PL products disappoint you? If so, tell me more about it... Is the price too high? Would you be interested in buying more than one in the future..."

Am I making sense?

df2a2 <- df2a %>% group_by(category) %>% summarise(count=n()) %>% 
  mutate(percent = round(count/sum(count)*100))

plot_df2a2 = ggplot(df2a2) + geom_bar(aes(x=category,y=count, fill = PROGRAM_LEVEL_DESCR),stat='identity') + 
  labs(y='Number of Distinct Customers',x=' # of PL Orders in the PL Cart')+ 
  geom_text(aes(x=category,y=count,label=percent),vjust=-0.5)
plot_df2a2

#> Error: Aesthetics must be either length 1 or the same as the data (9): x, y, fill
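The error appears to come from `summarise()`: after `group_by(category)` it keeps only `category` and `count`, so `PROGRAM_LEVEL_DESCR` is no longer a column of `df2a2`, and ggplot2 ends up finding some other object of that name with a different length. A sketch of one fix, assuming `df2a` has one row per customer with `category` and `PROGRAM_LEVEL_DESCR` columns (the toy `df2a` below is hypothetical), is to group by both columns before summarising:

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for df2a: one row per customer
df2a <- data.frame(
  category = c("0%", "0%", "0%", "1%-10%", "1%-10%", "1%-10%"),
  PROGRAM_LEVEL_DESCR = c("Silver", "Silver", "No Program", "Silver", "Gold", "Gold"),
  stringsAsFactors = FALSE
)

df2a2 <- df2a %>%
  # Group by BOTH columns so PROGRAM_LEVEL_DESCR survives summarise()
  group_by(category, PROGRAM_LEVEL_DESCR) %>%
  summarise(count = n()) %>%
  # summarise() drops the last grouping level, so the data is still grouped
  # by category here: percent is each level's share within its category
  mutate(percent = round(count / sum(count) * 100, 1)) %>%
  ungroup()

plot_df2a2 <- ggplot(df2a2, aes(x = category, y = count, fill = PROGRAM_LEVEL_DESCR)) +
  geom_col() +
  labs(y = "Number of Distinct Customers", x = "# of PL Orders in the PL Cart") +
  geom_text(aes(label = paste0(percent, "%")), position = position_stack(vjust = 0.5))
```

`geom_col()` is equivalent to `geom_bar(stat = "identity")`.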

The point of a reprex is that it's self-contained so that rather than describing what's in the data, others can see a sample of the data and help you modify your code, etc.

Jenny describes it really well in this video (starts ~10:40).


Reprex output:

library(tidyverse)

df3 = data %>% filter(QtySold > 0L, Sales > 0L) %>%
  group_by(CUSTOMER_NUMBER,PRODUCT_SUB_LINE_DESCR,PROGRAM_LEVEL_DESCR ) %>% 
  summarise(Quants = sum(QtySold )) %>%
  ungroup() %>%
  spread(PRODUCT_SUB_LINE_DESCR,Quants,fill=0) %>%
  mutate(Total_Orders = `PRIVATE LABEL` + SUNDRY + Handpieces,
         PL_Order_Percentage= round((`PRIVATE LABEL` / Total_Orders) * 100),
         category = cut(PL_Order_Percentage,breaks = c(0,1,11,21,31,41,51,61,71,Inf), 
                        labels = c('0%','1%-10%','11%-20%',
                                   '21%-30%','31%-40%','41%-50%',
                                   '51%-60%','61%-70%','>= 71%'),include.lowest = T,right = F)
  ) 
#> Error in UseMethod("filter_"): no applicable method for 'filter_' applied to an object of class "function"

df3a <- df3 %>% 
  group_by(category, PROGRAM_LEVEL_DESCR) %>% 
  summarise(count=n(),
            percent = count/sum(count))
#> Error in eval(lhs, parent, parent): object 'df3' not found

df3a %>% 
  ggplot(aes(x=category, y=count)) + 
  geom_bar(aes(fill = PROGRAM_LEVEL_DESCR),stat='identity') +
  labs(y='Number of Distinct Customers', x=' # of PL Orders in the PL Cart') +
  geom_text(aes(label=sprintf("%1.1f%%", percent)), 
            position=position_stack(vjust=0.5), size=3, colour="white")
#> Error in eval(lhs, parent, parent): object 'df3a' not found

Created on 2018-06-29 by the reprex package (v0.2.0).


Thanks for giving reprex() a try. You’re almost there.

This error:
#> Error in UseMethod("filter_"): no applicable method for 'filter_' applied to an object of class "function"

is appearing (and causing the cascade of other errors) because you haven’t included code that creates your data in the chunk of code that you applied reprex() to, so R thinks data refers to a built-in R function by that name. reprex() runs in its own separate R session, so your code really does need to be self-contained — it can’t access objects in the session and environment that you called reprex() from.

The quickest path to a functional self-contained example here is to choose one of the methods in this thread to use to include code that creates a sample data object named data in your reprex()ed chunk.
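A small sketch of both points: in a fresh session the name `data` finds the `utils::data()` function, and defining a sample data frame called `data` inside the reprexed chunk fixes that. The rows below are made up for illustration; in practice you might paste the output of `dput(head(your_data, 20))` instead.

```r
library(dplyr)

# In a fresh R session with no object named `data`, the name
# resolves to utils::data(), a function, not a data frame
is.function(data)

# Hypothetical sample rows, created inside the reprex itself
data <- data.frame(
  CUSTOMER_NUMBER = c(101, 101, 102, 103),
  PRODUCT_SUB_LINE_DESCR = c("PRIVATE LABEL", "SUNDRY", "Handpieces", "SUNDRY"),
  PROGRAM_LEVEL_DESCR = c("Silver", "Silver", "Gold", "No Program"),
  QtySold = c(2, 5, 0, 3),
  Sales = c(20, 40, 0, 15),
  stringsAsFactors = FALSE
)

# `data` is now a data frame, so filter() works inside the reprex
sold <- data %>% filter(QtySold > 0L, Sales > 0L)
sold
```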


@jcblum
Reprex trial #2

Body is limited to 32000 characters; you entered 72041.

I am super confused. Everyone posts comments, guidelines, etc., but that doesn't mean readers understand them completely.
This is just my opinion!

If your dataset is that big, the next steps are:

  1. Consider whether your question can be answered with a subset or random sample of your data (the answer is almost always yes...). If so, provide this instead of your whole data set.

  2. If your question really relies on having the whole dataset, you’ll need to post that part of the code separately, as a GitHub gist or similar (see here: How to upload or share data files here).
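For option 1, a sketch of taking a reproducible random sample and turning it into pasteable code with `dput()`. The large data frame here is a hypothetical stand-in, and 30 rows is an arbitrary choice:

```r
# Hypothetical stand-in for a dataset too large to post in full
set.seed(42)
big <- data.frame(
  category = sample(c("0%", "1%-10%", "11%-20%"), 10000, replace = TRUE),
  PROGRAM_LEVEL_DESCR = sample(c("Gold", "No Program", "Silver"), 10000, replace = TRUE),
  stringsAsFactors = FALSE
)

# A reproducible random sample of rows is usually enough to show the problem
set.seed(1)
small <- big[sample(nrow(big), 30), ]

# dput() prints R code that recreates `small`; capture it to check its size
dput_text <- paste(capture.output(dput(small)), collapse = "\n")
nchar(dput_text) < 32000   # should fit under the forum's 32,000-character limit

# Copy this output into the data-creation step of your reprex
cat(dput_text)
```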

I know that figuring out how to pose your questions this way has a learning curve and that can be frustrating when you just want to get to the solving-your-problem part. But once you’ve wrapped your head around it, there are major benefits. You get a clearer picture of your problem by structuring your question in a self-contained, minimally complex way. You spend less time going back and forth with your helpers trying to explain what you mean. More people want to help with your questions because they’re easier and more fun to dig into.

And like with everything else, it’s totally ok to be confused and make mistakes as you go along! No judgement from me as long as you’re making the effort.


@jcblum

library(tidyverse) 
library(scales) 
#> 
#> Attaching package: 'scales'
#> The following object is masked from 'package:purrr':
#> 
#>     discard
#> The following object is masked from 'package:readr':
#> 
#>     col_factor
library(cowplot) 
#> 
#> Attaching package: 'cowplot'
#> The following object is masked from 'package:ggplot2':
#> 
#>     ggsave
library(dplyr)
library(ggplot2)
library(reprex)
library("datapasta")
df3a = structure(list(category = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
                                             1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
                                             2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
                                             4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 
                                             5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 
                                             6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
                                             7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 
                                             9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), .Label = c("0%", 
                                                                                                     "1%-10%", "11%-20%", "21%-30%", "31%-40%", "41%-50%", "51%-60%", 
                                                                                                     "61%-70%", ">= 71%"), class = "factor"), PROGRAM_LEVEL_DESCR = structure(c(1L, 
                                                                                                                                                                                2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 
                                                                                                                                                                                2L, 4L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 14L, 1L, 2L, 3L, 4L, 6L, 
                                                                                                                                                                                7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 2L, 3L, 4L, 6L, 7L, 
                                                                                                                                                                                9L, 10L, 11L, 12L, 13L, 14L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
                                                                                                                                                                                9L, 10L, 11L, 12L, 13L, 14L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 
                                                                                                                                                                                11L, 12L, 14L, 1L, 2L, 3L, 4L, 6L, 7L, 9L, 10L, 11L, 12L, 13L, 
                                                                                                                                                                                14L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 14L, 
                                                                                                                                                                                1L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L), .Label = c("Branch Refusal", 
                                                                                                                                                                                                                                                     "Club", "Corporate Refusal", "Credit Hold", "Customer Refusal", 
                                                                                                                                                                                                                                                     "Diamond", "Enrollment", "Failed 2X in Calendar Year", "Gold", 
                                                                                                                                                                                                                                                     "Institutional", "No Program", "Platinum", "RSVP", "Silver"), class = "factor"), 
                      count = c(133L, 172L, 5L, 215L, 1L, 104L, 389L, 13L, 843L, 
                                193L, 10743L, 482L, 10L, 1695L, 3L, 383L, 59L, 471L, 98L, 
                                2L, 1675L, 87L, 1284L, 1719L, 1351L, 6L, 290L, 3L, 39L, 262L, 
                                85L, 3L, 1123L, 76L, 1255L, 1003L, 1L, 1000L, 3L, 208L, 5L, 
                                31L, 189L, 69L, 731L, 79L, 979L, 670L, 1L, 732L, 1L, 156L, 
                                8L, 33L, 1L, 127L, 70L, 1L, 547L, 55L, 967L, 480L, 1L, 568L, 
                                150L, 5L, 31L, 85L, 65L, 2L, 416L, 38L, 907L, 319L, 531L, 
                                1L, 102L, 14L, 18L, 63L, 35L, 307L, 25L, 533L, 236L, 2L, 
                                317L, 3L, 90L, 18L, 22L, 1L, 33L, 38L, 1L, 254L, 25L, 640L, 
                                180L, 275L, 8L, 179L, 48L, 76L, 100L, 150L, 5L, 503L, 95L, 
                                4032L, 339L, 2L, 812L)), class = c("grouped_df", "tbl_df", 
                                                                   "tbl", "data.frame"), row.names = c(NA, -113L), vars = "category", drop = TRUE)
df3a %>% 
  ggplot(aes(x=category, y=count)) + 
  geom_bar(aes(fill = PROGRAM_LEVEL_DESCR),stat='identity') +
  labs(y='Number of Distinct Customers', x=' # of PL Orders in the PL Cart') +
  geom_text(aes(label=sprintf("%1.1f%%", percent)), 
            position=position_stack(vjust=0.5), size=3, colour="white")
#> Error in as.double(function (x) : cannot coerce type 'closure' to vector of type 'double'

Created on 2018-06-29 by the reprex package (v0.2.0).

@mara
To clarify your comment, my post above was meant to describe my goal, i.e., what I wanted to do with the data. I was by no means trying to describe what the data is.

Right, my point was really that it is much easier to help you (or anyone) when we have a self-contained, reproducible example. Screenshots of what's in the data are not nearly so helpful as what you put:


since, with the latter, anyone can literally copy and paste and run your code to see if their suggested changes make the difference you've described or help you achieve your aim.


Mara:
Any input on how to solve this problem?

Thanks!

It just occurred to me that you might be describing a grouped, rather than a stacked, barplot. You can see examples of both with code here:
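As a quick sketch of the difference in ggplot2, using a few counts copied from the reprex in this thread (only two levels, for brevity), the only change is the `position` argument:

```r
library(ggplot2)

# A few counts taken from the df3a reprex in this thread
counts <- data.frame(
  category = c("0%", "0%", "1%-10%", "1%-10%"),
  PROGRAM_LEVEL_DESCR = c("No Program", "Silver", "No Program", "Silver"),
  count = c(10743, 1695, 1284, 1351),
  stringsAsFactors = FALSE
)

base <- ggplot(counts, aes(x = category, y = count, fill = PROGRAM_LEVEL_DESCR))

# Stacked: one bar per category, segments piled on top of each other
stacked <- base + geom_col(position = "stack")

# Grouped: one bar per level, placed side by side within each category
grouped <- base + geom_col(position = "dodge")
```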

Mara:
Thank you for the info.

I still have not gathered enough input to plot the segments in each "category". After you and Curtis asked me to use reprex(), among many other requests, I hope everyone can give me some helpful advice.
Everyone is busy and I understand that, but I did learn how to use reprex(), and other members haven't gotten back to me about this issue.
Why did you ask me to use reprex() in the first place?

I am curious.

Thank you.

Because it allows someone else to copy and paste your code and then help you fix it. For example, when I copy and paste the reprex from above, I automatically get the output below. (N.B. I removed the problematic labelling line. This also illustrates that reprex automatically uploads the rendered image to imgur, so everything comes through just by pasting the Markdown here.)

library(tidyverse) 
df3a <- structure(list(category = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
                                             1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
                                             2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
                                             4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 
                                             5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 
                                             6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
                                             7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 
                                             9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), .Label = c("0%", 
                                                                                                     "1%-10%", "11%-20%", "21%-30%", "31%-40%", "41%-50%", "51%-60%", 
                                                                                                     "61%-70%", ">= 71%"), class = "factor"), PROGRAM_LEVEL_DESCR = structure(c(1L, 
                                                                                                                                                                                2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 
                                                                                                                                                                                2L, 4L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 14L, 1L, 2L, 3L, 4L, 6L, 
                                                                                                                                                                                7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 2L, 3L, 4L, 6L, 7L, 
                                                                                                                                                                                9L, 10L, 11L, 12L, 13L, 14L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
                                                                                                                                                                                9L, 10L, 11L, 12L, 13L, 14L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 
                                                                                                                                                                                11L, 12L, 14L, 1L, 2L, 3L, 4L, 6L, 7L, 9L, 10L, 11L, 12L, 13L, 
                                                                                                                                                                                14L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 14L, 
                                                                                                                                                                                1L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L), .Label = c("Branch Refusal", 
                                                                                                                                                                                                                                                     "Club", "Corporate Refusal", "Credit Hold", "Customer Refusal", 
                                                                                                                                                                                                                                                     "Diamond", "Enrollment", "Failed 2X in Calendar Year", "Gold", 
                                                                                                                                                                                                                                                     "Institutional", "No Program", "Platinum", "RSVP", "Silver"), class = "factor"), 
                      count = c(133L, 172L, 5L, 215L, 1L, 104L, 389L, 13L, 843L, 
                                193L, 10743L, 482L, 10L, 1695L, 3L, 383L, 59L, 471L, 98L, 
                                2L, 1675L, 87L, 1284L, 1719L, 1351L, 6L, 290L, 3L, 39L, 262L, 
                                85L, 3L, 1123L, 76L, 1255L, 1003L, 1L, 1000L, 3L, 208L, 5L, 
                                31L, 189L, 69L, 731L, 79L, 979L, 670L, 1L, 732L, 1L, 156L, 
                                8L, 33L, 1L, 127L, 70L, 1L, 547L, 55L, 967L, 480L, 1L, 568L, 
                                150L, 5L, 31L, 85L, 65L, 2L, 416L, 38L, 907L, 319L, 531L, 
                                1L, 102L, 14L, 18L, 63L, 35L, 307L, 25L, 533L, 236L, 2L, 
                                317L, 3L, 90L, 18L, 22L, 1L, 33L, 38L, 1L, 254L, 25L, 640L, 
                                180L, 275L, 8L, 179L, 48L, 76L, 100L, 150L, 5L, 503L, 95L, 
                                4032L, 339L, 2L, 812L)), class = c("grouped_df", "tbl_df", 
                                                                   "tbl", "data.frame"), row.names = c(NA, -113L), vars = "category", drop = TRUE)
df3a %>% 
  ggplot(aes(x=category, y=count)) + 
  geom_bar(aes(fill = PROGRAM_LEVEL_DESCR),stat='identity') +
  labs(y='Number of Distinct Customers', x=' # of PL Orders in the PL Cart')


Created on 2018-06-30 by the reprex package (v0.2.0).

I asked about the grouped bar chart because, going through your question, I wanted to make sure that wasn't a piece the rest of us were somehow missing. I wanted to ask that before moving forward, to help steer you in the right direction.

I am trying to figure out exactly what you mean by the above.

Looking at it, given the number of PROGRAM_LEVEL_DESCR levels, I'm guessing not.

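From your code above, this section will not work because you're referencing a variable, "percent", which you have not defined. (Because you attached the scales package, R instead finds its percent() function under that name, which is why the error complains about coercing a 'closure'.)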

geom_text(aes(label=sprintf("%1.1f%%", percent)), 
            position=position_stack(vjust=0.5), size=3, colour="white")

Where Joel referenced this above, it was a continuation of the chunk above where you had created the variable percent:

df2a1 <- df2a %>% 
  group_by(category, PROGRAM_LEVEL_DESCR) %>% 
  summarise(count=n()) %>% 
  mutate(percent= paste0(round(count/sum(count)*100,1),'%'))

I don't think you have that variable in your reprex. I'm not totally clear on what the method above is intended to do, so I didn't add a labeller.
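A sketch of one way to add the missing column, using a small hypothetical subset of `df3a` (counts copied from the reprex). The column is named `pct_within_cat` so it can't be confused with `scales::percent()`:

```r
library(dplyr)
library(ggplot2)

# Hypothetical subset in the same shape as df3a (counts from the reprex)
df3a <- data.frame(
  category = rep(c("0%", "1%-10%"), each = 3),
  PROGRAM_LEVEL_DESCR = rep(c("Gold", "No Program", "Silver"), times = 2),
  count = c(843, 10743, 1695, 1675, 1284, 1351),
  stringsAsFactors = FALSE
)

df3a <- df3a %>%
  group_by(category) %>%
  # A real column, so the label no longer picks up a function by accident
  mutate(pct_within_cat = count / sum(count) * 100) %>%
  ungroup()

p <- df3a %>%
  # fill is set here (not only in the bar layer) so the text layer is grouped
  # the same way and its labels stack in the same order as the bar segments
  ggplot(aes(x = category, y = count, fill = PROGRAM_LEVEL_DESCR)) +
  geom_col() +
  labs(y = "Number of Distinct Customers", x = "# of PL Orders in the PL Cart") +
  geom_text(aes(label = sprintf("%1.1f%%", pct_within_cat)),
            position = position_stack(vjust = 0.5), size = 3, colour = "white")
```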


Mara:
In terms of segment, this is perfect. That is what I meant.
In the 0% category, there are 12 groups.
How do I plot these groups within category 0%? This is perfect. Thank you.

In terms of illustration, is there a way to show the percentage on each bar in, for example, the 0% category?
I agree that when you look at the 0% category, "No Program" has the tallest bar, but I want to show my audience the percentage.
What do you think, Mara?
Do I even need the numbers? Is the plot self-explanatory?
If I present this to you, would you like my visualization?

This is a grouped bar plot. This is what I was describing above, and why I asked you to take a look at that link.

Modifying examples is a great way to learn. Moreover, it's a great way to communicate about what it is you're trying to achieve. The R Graph Gallery (see link in post above) is a really nice resource for doing that.

I'm not sure what you mean by the above. Do you only want the 0% category?

If so, you can filter your dataset (pseudocode below, as I'm not sure which data frame you're working off of — the one from the reprex doesn't have percent, so I'm assuming not that one):

df_zero <- data %>%
  filter(category == "0%")

Below, I go through each step of creating the label string more explicitly than you would in practice, because I want to make sure it's clear what I'm doing: looking at the percentage represented by each PROGRAM_LEVEL_DESCR within the subset of customers who are in the 0% category.

library(tidyverse)
df3 <- data.frame(stringsAsFactors=FALSE,
                  category = c("0%", "0%", "0%", "0%", "0%", "0%", "0%", "0%",
                               "0%", "0%", "0%", "0%", "0%", "0%", "1%-10%",
                               "1%-10%", "1%-10%", "1%-10%", "1%-10%", "1%-10%", "1%-10%",
                               "1%-10%", "1%-10%", "1%-10%", "1%-10%", "11%-20%",
                               "11%-20%", "11%-20%", "11%-20%", "11%-20%", "11%-20%",
                               "11%-20%", "11%-20%", "11%-20%", "11%-20%", "11%-20%",
                               "11%-20%", "11%-20%", "21%-30%", "21%-30%", "21%-30%",
                               "21%-30%", "21%-30%", "21%-30%", "21%-30%", "21%-30%",
                               "21%-30%", "21%-30%", "21%-30%", "21%-30%",
                               "31%-40%", "31%-40%", "31%-40%", "31%-40%", "31%-40%",
                               "31%-40%", "31%-40%", "31%-40%", "31%-40%", "31%-40%",
                               "31%-40%", "31%-40%", "31%-40%", "31%-40%", "41%-50%",
                               "41%-50%", "41%-50%", "41%-50%", "41%-50%", "41%-50%",
                               "41%-50%", "41%-50%", "41%-50%", "41%-50%", "41%-50%",
                               "51%-60%", "51%-60%", "51%-60%", "51%-60%", "51%-60%",
                               "51%-60%", "51%-60%", "51%-60%", "51%-60%", "51%-60%",
                               "51%-60%", "51%-60%", "61%-70%", "61%-70%", "61%-70%",
                               "61%-70%", "61%-70%", "61%-70%", "61%-70%", "61%-70%",
                               "61%-70%", "61%-70%", "61%-70%", "61%-70%", "61%-70%",
                               ">= 71%", ">= 71%", ">= 71%", ">= 71%", ">= 71%",
                               ">= 71%", ">= 71%", ">= 71%", ">= 71%", ">= 71%",
                               ">= 71%", ">= 71%", ">= 71%"),
                  PROGRAM_LEVEL_DESCR = c("Branch Refusal", "Club", "Corporate Refusal",
                                          "Credit Hold", "Customer Refusal", "Diamond",
                                          "Enrollment", "Failed 2X in Calendar Year", "Gold",
                                          "Institutional", "No Program", "Platinum", "RSVP", "Silver",
                                          "Branch Refusal", "Club", "Credit Hold", "Diamond",
                                          "Enrollment", "Failed 2X in Calendar Year", "Gold",
                                          "Institutional", "No Program", "Platinum", "Silver",
                                          "Branch Refusal", "Club", "Corporate Refusal",
                                          "Credit Hold", "Diamond", "Enrollment",
                                          "Failed 2X in Calendar Year", "Gold", "Institutional", "No Program", "Platinum",
                                          "RSVP", "Silver", "Branch Refusal", "Club",
                                          "Corporate Refusal", "Credit Hold", "Diamond", "Enrollment",
                                          "Gold", "Institutional", "No Program", "Platinum",
                                          "RSVP", "Silver", "Branch Refusal", "Club",
                                          "Corporate Refusal", "Credit Hold", "Customer Refusal", "Diamond",
                                          "Enrollment", "Failed 2X in Calendar Year", "Gold",
                                          "Institutional", "No Program", "Platinum", "RSVP",
                                          "Silver", "Club", "Corporate Refusal", "Credit Hold",
                                          "Diamond", "Enrollment", "Failed 2X in Calendar Year", "Gold",
                                          "Institutional", "No Program", "Platinum", "Silver",
                                          "Branch Refusal", "Club", "Corporate Refusal",
                                          "Credit Hold", "Diamond", "Enrollment", "Gold",
                                          "Institutional", "No Program", "Platinum", "RSVP", "Silver",
                                          "Branch Refusal", "Club", "Corporate Refusal",
                                          "Credit Hold", "Customer Refusal", "Diamond", "Enrollment",
                                          "Failed 2X in Calendar Year", "Gold", "Institutional",
                                          "No Program", "Platinum", "Silver", "Branch Refusal",
                                          "Club", "Corporate Refusal", "Credit Hold", "Diamond",
                                          "Enrollment", "Failed 2X in Calendar Year", "Gold",
                                          "Institutional", "No Program", "Platinum", "RSVP", "Silver"),
                  count = c(133, 172, 5, 215, 1, 104, 389, 13, 843, 193, 10743,
                            482, 10, 1695, 3, 383, 59, 471, 98, 2, 1675, 87,
                            1284, 1719, 1351, 6, 290, 3, 39, 262, 85, 3, 1123, 76,
                            1255, 1003, 1, 1000, 3, 208, 5, 31, 189, 69, 731, 79,
                            979, 670, 1, 732, 1, 156, 8, 33, 1, 127, 70, 1, 547, 55,
                            967, 480, 1, 568, 150, 5, 31, 85, 65, 2, 416, 38,
                            907, 319, 531, 1, 102, 14, 18, 63, 35, 307, 25, 533, 236,
                            2, 317, 3, 90, 18, 22, 1, 33, 38, 1, 254, 25, 640,
                            180, 275, 8, 179, 48, 76, 100, 150, 5, 503, 95, 4032,
                            339, 2, 812)
)

df_zero <- df3 %>%
  filter(category == "0%")

total <- sum(df_zero$count)

df_zero <- df_zero %>%
  mutate(pct_within_cat = (count / total) * 100) %>%
  mutate(pct_rounded = round(pct_within_cat, digits = 2)) %>%
  mutate(pct_string = str_glue("{pct_rounded}%"))

head(df_zero)
#>   category PROGRAM_LEVEL_DESCR count pct_within_cat pct_rounded pct_string
#> 1       0%      Branch Refusal   133    0.886784905        0.89      0.89%
#> 2       0%                Club   172    1.146819576        1.15      1.15%
#> 3       0%   Corporate Refusal     5    0.033337778        0.03      0.03%
#> 4       0%         Credit Hold   215    1.433524470        1.43      1.43%
#> 5       0%    Customer Refusal     1    0.006667556        0.01      0.01%
#> 6       0%             Diamond   104    0.693425790        0.69      0.69%

df_zero %>%
  ggplot(aes(x = PROGRAM_LEVEL_DESCR, y = count, fill = PROGRAM_LEVEL_DESCR)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = pct_string))

Created on 2018-06-30 by the reprex package (v0.2.0).

Obviously you'd want to play around with aesthetics if this is what you're aiming to do (rotating x-axis labels, resizing the percentage labels, positioning them, etc.).

Certainly, you'd need to change the aspect ratio if you want horizontal labels. Another option is coord_flip(), which you can look up in the ggplot2 docs by running ?coord_flip when you have ggplot2 loaded in R.
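A sketch of the `coord_flip()` idea for the 0% category, with the 14 counts copied from the reprex. The `hjust` value and the 15% of extra room from `expand_limits()` are arbitrary choices to keep the labels from being clipped:

```r
library(dplyr)
library(ggplot2)

# The 0% category counts from the reprex in this thread
df_zero <- data.frame(
  PROGRAM_LEVEL_DESCR = c("Branch Refusal", "Club", "Corporate Refusal", "Credit Hold",
                          "Customer Refusal", "Diamond", "Enrollment",
                          "Failed 2X in Calendar Year", "Gold", "Institutional",
                          "No Program", "Platinum", "RSVP", "Silver"),
  count = c(133, 172, 5, 215, 1, 104, 389, 13, 843, 193, 10743, 482, 10, 1695),
  stringsAsFactors = FALSE
) %>%
  mutate(pct_string = sprintf("%.2f%%", count / sum(count) * 100))

p <- ggplot(df_zero, aes(x = PROGRAM_LEVEL_DESCR, y = count, fill = PROGRAM_LEVEL_DESCR)) +
  geom_col(show.legend = FALSE) +
  # after flipping, a negative hjust nudges each label just past its bar's end
  geom_text(aes(label = pct_string), hjust = -0.1, size = 3) +
  # leave some room beyond the longest bar for its label
  expand_limits(y = max(df_zero$count) * 1.15) +
  # swap the axes so the long level names read horizontally
  coord_flip()
```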


If you want something like this for each customer percentage category, you'd probably want to add faceting.
http://www.cookbook-r.com/Graphs/Facets_(ggplot2)/
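A sketch of the faceted version, assuming the counts have already been reduced to one row per category and level (two categories and three levels from the reprex are used here for brevity). The free y scales are an assumption that makes sense because the 0% category is much larger than the rest:

```r
library(dplyr)
library(ggplot2)

# Hypothetical subset of two categories (counts from the reprex)
counts <- data.frame(
  category = rep(c("0%", "1%-10%"), each = 3),
  PROGRAM_LEVEL_DESCR = rep(c("Gold", "No Program", "Silver"), times = 2),
  count = c(843, 10743, 1695, 1675, 1284, 1351),
  stringsAsFactors = FALSE
) %>%
  group_by(category) %>%
  # percentages are relative to each category's own total
  mutate(pct_within_cat = count / sum(count) * 100) %>%
  ungroup()

p <- ggplot(counts, aes(x = PROGRAM_LEVEL_DESCR, y = count, fill = PROGRAM_LEVEL_DESCR)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = sprintf("%.1f%%", pct_within_cat)), vjust = -0.5, size = 3) +
  # one panel per customer category, each with its own count axis
  facet_wrap(~ category, scales = "free_y") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```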


Mara:
Thank you for everything.
I appreciate your input!

I hope there were no hurt feelings about my comments above.
It was all about questions and answers.
No frustration, no anger, no hard feelings on my end.
I just wanted to clarify that.

Thank you, once again!