Custom bar plot function with group and sum totals

Hello,

I'm trying to write a function to generate custom grouped bar plots which include group totals (achieved) and bar totals (not achieved).

Here is an example of what I’m hoping for:

How can I add the totals on the right edges of the bars?

I used the following code to produce a function giving me the group totals.

library(tidyverse)
library(rlang)
#> 
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#> 
#>     %@%, as_function, flatten, flatten_chr, flatten_dbl,
#>     flatten_int, flatten_lgl, flatten_raw, invoke, list_along,
#>     modify, prepend, splice
my.dt <- data.frame(
        X = sample(c("Black","Blue","Red","Green","Violet"), 1000, replace = TRUE),
        Y = sample(c("Yes", "No","Maybe"), 1000, replace = TRUE)
)


bar_totals <- function(.data, vallue, group_var = Y) {
        vallue <- rlang::enquo(vallue)
        group_var <- rlang::enquo(group_var)
        .data %>%
                mutate(vallue = !!vallue %>% fct_infreq() %>% fct_rev()) %>%
                ggplot(aes(x=vallue,..count..,fill=!! group_var))+
                geom_bar()+
                geom_text(stat = "count", aes(label = ..count.., y = ..count..),position=position_stack(0.5))+
                theme(legend.position="top",
                      axis.title.y = element_blank(),
                      axis.title.x = element_blank(),
                      axis.ticks.y = element_blank())+
                labs(fill = "")+
                coord_flip()
} 

bar_totals(my.dt, X)


# The totals I would like to add to the grouped bar graph 

table(my.dt$X)
#> 
#>  Black   Blue  Green    Red Violet 
#>    178    216    209    204    193

Created on 2019-07-19 by the reprex package (v0.3.0)

I would really appreciate the help.

André

1 Like

Hi,

That's some serious funky code you have :slight_smile: I'm not an expert on all those fancy functions you're using, but I did find a way to solve the problem (I'm sure you can clean up the code even more as you seem to be an experienced coder)

OPTION 1 - Adding a label

bar_totals <- function(.data, vallue, group_var = Y) {
  vallue <- rlang::enquo(vallue)
  group_var <- rlang::enquo(group_var)
  .data %>%
    mutate(vallue = !!vallue %>% fct_infreq() %>% fct_rev()) %>%
    ggplot(aes(x=vallue,..count..,fill=!! group_var))+
    geom_bar()+
    geom_text(stat = "count", aes(label = ..count.., y = ..count..),position=position_stack(0.5))+
    geom_label(data = my.dt %>% group_by(X) %>% summarise(n()), 
              aes(x = X, label =  `n()`, y = `n()`), fill = "white", size = 10)+
    theme(legend.position="top",
          axis.title.y = element_blank(),
          axis.title.x = element_blank(),
          axis.ticks.y = element_blank())+
    labs(fill = "")+
    coord_flip()
} 

OPTION 2 - Adding text

bar_totals <- function(.data, vallue, group_var = Y) {
  vallue <- rlang::enquo(vallue)
  group_var <- rlang::enquo(group_var)
  .data %>%
    mutate(vallue = !!vallue %>% fct_infreq() %>% fct_rev()) %>%
    ggplot(aes(x=vallue,..count..,fill=!! group_var))+
    geom_bar()+
    geom_text(stat = "count", aes(label = ..count.., y = ..count..),position=position_stack(0.5))+
    geom_text(data = my.dt %>% group_by(X) %>% summarise(n()), 
              aes(x = X, label =  `n()`, y = `n()`, fill = NULL), size = 10)+
    theme(legend.position="top",
          axis.title.y = element_blank(),
          axis.title.x = element_blank(),
          axis.ticks.y = element_blank())+
    labs(fill = "")+
    coord_flip()
} 

** I had to fiddle with the fill variable in the geom_text because for some reason it got inherited from above I think. Also, setting it to NA produced and error, setting it to NULL did not ...

Let me know what you think...
PJ

1 Like

I would probably make a new data.frame with the labels within your function.

You can get a simple frequency data.frame by using data.frame() on a table.

data.frame( table(my.dt$X) )
    Var1 Freq
1  Black  200
2   Blue  190
3  Green  202
4    Red  199
5 Violet  209

Since you are passing bare variable names to the function, we need to convert the vallue variable to a string to extract the correct column from the dataset within the function. I do this with as_label() after using enquo().

The resulting data.frame can be used in a new geom_text() layer.

For my own edification I used curly-curly, {{, for the fill variable instead of enquo()/!!, which is only available in the newest version of rlang. I also used the relatively new stat() function instead of the ..count.. coding

bar_totals = function(.data, vallue, group_var = Y) {
    vallue = enquo(vallue)
    val = as_label(vallue)
    bar_labs = data.frame( table(.data[[val]] ) )
    .data %>%
        mutate(vallue = !! vallue %>% fct_infreq() %>% fct_rev() ) %>%
        ggplot(aes(x = vallue, 
                   y = stat(count), 
                   fill= {{ group_var }})) +
        geom_bar() +
        geom_text(stat = "count",
                  aes(label = stat(count),
                      y = stat(count) ),
                  position = position_stack(0.5) ) +
        geom_text(data = bar_labs, 
                  aes(x = Var1, y = Freq, label = Freq), 
                  inherit.aes = FALSE) +
        theme(legend.position = "top",
              axis.title.y = element_blank(),
              axis.title.x = element_blank(),
              axis.ticks.y = element_blank())+
        labs(fill = "")+
        coord_flip()
} 

bar_totals(my.dt, X)

(My numbers are slightly different than yours due since we have different starting points for the random sample.)

2 Likes

Thank you for the help, aosmith.

I changed the code at bit more in an attempt to address issues I found while applying the function to some of my problematic variables.

library(tidyverse)
library(rlang)
#> 
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#> 
#>     %@%, as_function, flatten, flatten_chr, flatten_dbl,
#>     flatten_int, flatten_lgl, flatten_raw, invoke, list_along,
#>     modify, prepend, splice
bar_totals = function(.data, value, group_var = Y) {
        value = enquo(value)
        val = as_label(value)
        bar_labs = data.frame( table(.data[[val]] ) ) %>% arrange(desc(Freq))
        print(bar_labs)
        .data %>%
                mutate(value = {{ value }} %>% fct_infreq() %>% fct_rev()) %>%  
                ggplot(aes(x = value, 
                           y = stat(count), 
                           fill= {{ group_var }})) +
                geom_bar() +
                theme_minimal()+
                geom_text(stat = "count",
                          aes(label = stat(count),
                              y = stat(count) ),
                          position = position_stack(0.5),
                          vjust =0,
                          size = 4) +
                geom_text(data = bar_labs, 
                          aes(x = Var1, y = Freq, label = Freq), 
                          inherit.aes = FALSE,
                          size = 6, 
                          vjust = 1.5, 
                          hjust = .5,
                          fontface = "bold") +
                theme(legend.position = "top",
                      axis.title.y = element_blank(),
                      axis.title.x = element_blank(),
                      axis.ticks.y = element_blank(),
                      axis.ticks.x = element_blank(),
                      axis.text.x = element_blank())+
                labs(fill = "")+
                coord_flip()
} 

When I apply the function to a complete variable, then I get plot ordered according the frequency for each category.

perfect.dt <- data.frame(
        X = sample(c("Black","Blue","Red","Green","Violet"), 1000, replace = TRUE),
        Y = sample(c("Yes", "No","Maybe"), 1000, replace = TRUE)
)
bar_totals(perfect.dt, X)
#>     Var1 Freq
#> 1   Blue  215
#> 2  Black  200
#> 3    Red  200
#> 4  Green  196
#> 5 Violet  189

However, when applied to a variable containing factor levels with 0 observations (the case for many of my variables), I get an unordered plot. Arranging the bar_labs doesn’t seem to fix the problem.


my.dt = data.frame(
        X = sample(c("Black","Blue","Red","Green","Violet"), 1000, prob=c(0.10, 0.20, 0.30, 0.39, 0.01), replace = TRUE),
        Y = sample(c("Yes", "No","Maybe", NA), 1000, prob=c(0.08, 0.50, 0.20, 0.12), replace = TRUE)
)
my.dt$X = factor(my.dt$X, c("Black","Blue","Red","Green","Violet","Pink","Yellow"))

bar_totals(my.dt, X)
#>     Var1 Freq
#> 1  Green  385
#> 2    Red  303
#> 3   Blue  198
#> 4  Black   99
#> 5 Violet   15
#> 6   Pink    0
#> 7 Yellow    0

Do you have any suggestions?

Hi PJ,

Thank you very much for your help.

Unfortunately, I don't have much coding experience...

Like aosmith's solution, yours also worked well. Except, on my problematic variables, it dropped factor levels with 0 observations. I've fixed this by adding the .drop=FALSE argument in the group_by function in your code. I've also made some appearance modifications to ensure the bar total labels don't cover the group totals when the number of observations is very small.

library(tidyverse)
library(rlang)

bar_totals =  function(.data, value, group_var = Y) {
        value <- rlang::enquo(value)
        group_var <- rlang::enquo(group_var)
        .data %>%
                mutate(value =  {{ value }} %>% fct_infreq() %>% fct_rev()) %>% # Attempt to arrange the graph according to observation frequency
                ggplot(aes(
                        x=value,
                        y = stat(count),
                        fill={{ group_var }}))+
                geom_bar()+
                theme_minimal()+      
                geom_text(stat = "count", 
                          aes(label = ..count.., 
                              y = ..count..),
                          position=position_stack(0.5),
                          vjust =0, 
                          size = 4)+
                geom_label(data = .data %>% 
                                   group_by({{ value }},.drop=FALSE) %>% # Added .drop false to ensure that all factor levels remain in plot 
                                   summarise(n()) %>%
                                   arrange(`n()`), # Another attempt to arrange plot 
                           aes(x = {{ value }}, 
                               label =  `n()`, 
                               y = `n()`), 
                           fill = "white", 
                           # Change size and position to prevent label covering group values
                           size = 4, 
                           vjust = 1.5, 
                           hjust = .5)+
                theme(legend.position="top",
                      axis.title.y = element_blank(),
                      axis.title.x = element_blank(),
                      axis.text.x = element_blank(),
                      axis.ticks.y = element_blank(),
                      axis.ticks.x = element_blank())+
                labs(fill = "")+
                coord_flip()
} 

This is what the plot looks like with an ideal data set.


perfect.dt <- data.frame(
        X = sample(c("Black","Blue","Red","Green","Violet"), 1000, replace = TRUE),
        Y = sample(c("Yes", "No","Maybe"), 1000, replace = TRUE)
)
bar_totals(perfect.dt, X)

Unfortunately, as in aosmith's solution, in cases where variable levels have 0 observations, the resulting plot is unordered.


my.dt = data.frame(
        X = sample(c("Black","Blue","Red","Green","Violet"), 1000, prob=c(0.10, 0.20, 0.30, 0.39, 0.01), replace = TRUE),
        Y = sample(c("Yes", "No","Maybe", NA), 1000, prob=c(0.08, 0.50, 0.20, 0.12), replace = TRUE)
)
my.dt$X = factor(my.dt$X, c("Black","Blue","Red","Green","Violet","Pink","Yellow"))

bar_totals(my.dt, X)

Do you have any suggestions for ensuring the plot remains ordered?

Unused factor levels automatically get dropped from discrete scales. This leads to complications in your plot since you then have one layer (overall labels) with more levels than the another (the bars), so ggplot2 re-factors everything and you lose your order.

You can change this by using drop = FALSE in scale_x_discrete() (see this question on Stack Overflow for other examples). If I add this to the end of your plotting code in your bar_totals() function

... +
scale_x_discrete(drop = FALSE)

I get a plot that I believe looks how you want it to with bar_totals(my.dt, X):

4 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.