Help with bar graph in ggplot

Can you show a drawing, like you did before, of what you want to get?

The main thing was to have two axes, with percent and frequency on the primary and secondary axes. But because that does not seem possible, I am just playing with my graph. Here is a drawing where I had everything in percentages.

The x and y values in legend.position set where the legend sits on the graph, as fractions of the plot area from 0 to 1. Just play around with the numbers.

For example:

ggplot(dat, aes(variable, value, fill=interaction(modality))) +
  geom_bar(stat='identity', position='dodge') +
  theme_bw() + 
  scale_fill_brewer('Variables', palette='Spectral') +
  geom_text(aes(label=value), position=position_dodge(width=0.9), vjust=-0.25) +
  theme(legend.title=element_blank()) +
  labs(x = NULL, y = "Frequency") +
  theme(legend.position = c(.3, .95),legend.direction = "horizontal")

Hi @andresrcs, I have read your solution in this post:
https://forum.posit.co/t/adding-percentages-to-a-bargraph-in-ggplot/35860

but I am still confused about how to convert counts to percentages in this plot.
Can you advise, please?

Thanks for the extended effort. I shall go through it later today.
Otherwise, if you divide each response column by the Frequency column, you should get the percentage. For example, the first response has a count of 201 for Appearance; dividing it by the Frequency value of 749 gives about 27%. The same applies to the other columns.
Thanks a lot!
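
To make the arithmetic concrete, here is a small base R sketch. It assumes the Frequency values are the per-response totals (in the data posted in this thread they do match the column sums of the counts, e.g. 749 and 722), and it uses only the first two response columns for brevity. The object names are just illustrative.

```r
# Counts per modality for the first two responses (values from the thread)
counts <- cbind(
  First_response  = c(201, 8, 107, 151, 282),
  Second_response = c(72, 17, 148, 225, 260)
)
rownames(counts) <- c("Appearance", "Aroma", "Flavor", "Texture", "Hedonic")

# Column totals: 749 for First_response, 722 for Second_response
totals <- colSums(counts)

# Divide every column by its own total to get proportions
prop <- sweep(counts, 2, totals, "/")

# Appearance, first response: 201 / 749, i.e. about 27%
round(100 * prop["Appearance", "First_response"])
```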

Making the conversion on the data is trivial. The challenge is to get a primary y-axis with frequency and a secondary y-axis with percentages. I don't know how to do it, or whether it's possible with ggplot2. I think you can do it with base R and lattice, but I don't remember how.

library(tidyverse)

df <- data.frame(
   First_response = c(201L, 8L, 107L, 151L, 282L),
  Second_response = c(72L, 17L, 148L, 225L, 260L),
   Third_response = c(54L, 17L, 177L, 220L, 360L),
  Fourth_response = c(46L, 24L, 168L, 198L, 356L),
   Fifth_response = c(39L, 13L, 122L, 150L, 402L),
        Frequency = c(749L, 722L, 828L, 792L, 726L),
         modality = as.factor(c("Appearance",
                                "Aroma","Flavor","Texture","Hedonic"))
)

df %>% 
    mutate_at(vars(-modality, - Frequency), ~ . / Frequency) %>% 
    gather(Response, Percentage, First_response:Fifth_response) %>%
    mutate(Response = factor(Response,
                      levels = c("First_response", "Second_response",
                                 "Third_response", "Fourth_response",
                                 "Fifth_response"))) %>% 
    ggplot(aes(x = Response, y = Percentage, fill = modality)) +
    geom_col(position = "dodge") + 
    geom_text(aes(label = scales::percent(Percentage,
                                          accuracy = 0.1)),
              position = position_dodge(width=0.9),
              vjust = -0.25) +
    labs(x = NULL, y = "Percentage") +
    scale_y_continuous(labels = scales::label_percent()) +
    scale_fill_brewer('Variables', palette='Spectral') +
    theme_bw() + 
    theme(legend.title = element_blank(),
          legend.position = c(.3, .95),
          legend.direction = "horizontal")
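
For the base R route mentioned above, a rough sketch might look like the following. It plots only the first-response counts, draws frequency on the left axis, and adds a percentage axis on the right with axis(side = 4). The tick positions, margins, and y-limit are arbitrary choices for illustration, not something from the original answer.

```r
# First-response counts per modality (values from the data above)
counts <- c(Appearance = 201, Aroma = 8, Flavor = 107,
            Texture = 151, Hedonic = 282)
total <- sum(counts)  # 749

# Widen the right margin so the second axis and its title fit
op <- par(mar = c(5, 4, 2, 5) + 0.1)

# Primary axis: raw frequencies
mids <- barplot(counts, ylab = "Frequency", ylim = c(0, 300))

# Secondary axis: same scale, relabelled as a share of the total
pct_ticks <- c(0, 10, 20, 30, 40)
axis(side = 4, at = pct_ticks / 100 * total,
     labels = paste0(pct_ticks, "%"))
mtext("Percentage", side = 4, line = 3)

# Restore the previous graphics settings
par(op)
```

Because both axes share one denominator here (the total of a single response), the right-hand labels are an exact relabelling of the left-hand scale.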

Thank you very much indeed, @andresrcs, for your kind reply and help.
I learned a lot, and now I will try to figure out how to have both y-axes.

With data.table

library(data.table)
library(ggplot2)
dt<- structure(list(modality = structure(c(1L, 2L, 3L, 5L, 4L), .Label = c("Appearance", 
                              "Aroma", "Flavor", "Hedonic", "Texture"), class = "factor"), 
                     First_response = c(201L, 8L, 107L, 151L, 282L), 
                     Second_response = c(72L,  17L, 148L, 225L, 260L),
                     Third_response = c(54L, 17L, 177L,  220L, 360L),
                     Fourth_response = c(46L, 24L, 168L, 198L, 356L ),
                     Fifth_response = c(39L, 13L, 122L, 150L, 402L), 
                     Frequency = c(749L, 722L, 828L, 792L, 726L)), class = "data.frame", 
                row.names = c(NA,-5L))
dt1<-data.table(dt)[,.SD/Frequency,by=c("Frequency","modality"),.SDcols=2:6]
dat <- melt(dt1,id.vars = c("Frequency","modality"),variable.name = "Response", value.name = "Percentage",variable.factor=TRUE)
dat[,ggplot(.SD,aes(x = Response, y = Percentage, fill = modality)) +
  geom_col(position = "dodge") + 
  geom_text(aes(label = scales::percent(Percentage, accuracy = 0.1)),
            position = position_dodge(width=0.9), vjust = -0.25) +
  labs(x = NULL, y = "Percentage") +
  scale_y_continuous(labels = scales::label_percent()) +
  scale_fill_brewer('Variables', palette='Spectral') +
  theme_bw() + 
  theme(legend.title = element_blank(),
        legend.position = c(.3, .95),
        legend.direction = "horizontal"),]

Thanks @Hermes, I tried to run the code you pasted above, but it shows an error: "Error: 'label_percent' is not an exported object from 'namespace:scales'".

dt<- structure(list(modality = structure(c(1L, 2L, 3L, 5L, 4L), .Label = c("Appearance", 
                                                                           "Aroma", "Flavor", "Hedonic", "Texture"), class = "factor"), 
                    First_response = c(201L, 8L, 107L, 151L, 282L), 
                    Second_response = c(72L,  17L, 148L, 225L, 260L),
                    Third_response = c(54L, 17L, 177L,  220L, 360L),
                    Fourth_response = c(46L, 24L, 168L, 198L, 356L ),
                    Fifth_response = c(39L, 13L, 122L, 150L, 402L), 
                    Frequency = c(749L, 722L, 828L, 792L, 726L)), class = "data.frame", 
               row.names = c(NA,-5L))
dt1<-data.table(dt)[,.SD/Frequency,by=c("Frequency","modality"),.SDcols=2:6]
#> Error in data.table(dt): could not find function "data.table"
dat <- melt(dt1,id.vars = c("Frequency","modality"),variable.name = "Response", value.name = "Percentage",variable.factor=TRUE)
#> Error in melt(dt1, id.vars = c("Frequency", "modality"), variable.name = "Response", : could not find function "melt"
dat[,ggplot(.SD,aes(x = Response, y = Percentage, fill = modality)) +
      geom_col(position = "dodge") + 
      geom_text(aes(label = scales::percent(Percentage, accuracy = 0.1)),
                position = position_dodge(width=0.9), vjust = -0.25) +
      labs(x = NULL, y = "Percentage") +
      scale_y_continuous(labels = scales::label_percent()) +
      scale_fill_brewer('Variables', palette='Spectral') +
      theme_bw() + 
      theme(legend.title = element_blank(),
            legend.position = c(.3, .95),
            legend.direction = "horizontal"),]
#> Error in eval(expr, envir, enclos): object 'dat' not found

Created on 2020-04-11 by the reprex package (v0.3.0)

library(data.table)

It's something else. I already have function "data.table". Here you go,

library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 3.5.3
library(reprex)
#> Warning: package 'reprex' was built under R version 3.5.3
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.5.3
#> Warning: package 'tibble' was built under R version 3.5.3
#> Warning: package 'tidyr' was built under R version 3.5.3
#> Warning: package 'purrr' was built under R version 3.5.3
#> Warning: package 'dplyr' was built under R version 3.5.3
#> Warning: package 'stringr' was built under R version 3.5.3
library(reshape2)
#> Warning: package 'reshape2' was built under R version 3.5.3
#> 
#> Attaching package: 'reshape2'
#> The following object is masked from 'package:tidyr':
#> 
#>     smiths
library(data.table)
#> 
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:reshape2':
#> 
#>     dcast, melt
#> The following objects are masked from 'package:dplyr':
#> 
#>     between, first, last
#> The following object is masked from 'package:purrr':
#> 
#>     transpose
#Help from R-community

dt<- structure(list(modality = structure(c(1L, 2L, 3L, 5L, 4L), .Label = c("Appearance", 
                                                                           "Aroma", "Flavor", "Hedonic", "Texture"), class = "factor"), 
                    First_response = c(201L, 8L, 107L, 151L, 282L), 
                    Second_response = c(72L,  17L, 148L, 225L, 260L),
                    Third_response = c(54L, 17L, 177L,  220L, 360L),
                    Fourth_response = c(46L, 24L, 168L, 198L, 356L ),
                    Fifth_response = c(39L, 13L, 122L, 150L, 402L), 
                    Frequency = c(749L, 722L, 828L, 792L, 726L)), class = "data.frame", 
               row.names = c(NA,-5L))
dt1<-data.table(dt)[,.SD/Frequency,by=c("Frequency","modality"),.SDcols=2:6]
dat <- melt(dt1,id.vars = c("Frequency","modality"),variable.name = "Response", value.name = "Percentage",variable.factor=TRUE)
dat[,ggplot(.SD,aes(x = Response, y = Percentage, fill = modality)) +
      geom_col(position = "dodge") + 
      geom_text(aes(label = scales::percent(Percentage, accuracy = 0.1)),
                position = position_dodge(width=0.9), vjust = -0.25) +
      labs(x = NULL, y = "Percentage") +
      scale_y_continuous(labels = scales::label_percent()) +
      scale_fill_brewer('Variables', palette='Spectral') +
      theme_bw() + 
      theme(legend.title = element_blank(),
            legend.position = c(.3, .95),
            legend.direction = "horizontal"),]
#> Error: 'label_percent' is not an exported object from 'namespace:scales'

Created on 2020-04-11 by the reprex package (v0.3.0)

Update the scales package; that function is rather new.
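
If updating is not an option, one workaround is to fall back to the older scales::percent_format(). The sketch below assumes the scales package is installed; pick_percent_labeller is a made-up helper name, not part of scales.

```r
# Hypothetical helper: use label_percent() when the installed scales
# exports it, otherwise fall back to the older percent_format()
pick_percent_labeller <- function(...) {
  if ("label_percent" %in% getNamespaceExports("scales")) {
    scales::label_percent(...)
  } else {
    scales::percent_format(...)
  }
}

# Installed version, useful when reporting the error
packageVersion("scales")

# Usage inside a plot: scale_y_continuous(labels = pick_percent_labeller())
labeller <- pick_percent_labeller(accuracy = 1)
labeller(0.27)
```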

Thanks, I got it. But it is still missing the two y-axes.

With these versions of data.table and ggplot2, it runs correctly:

>lapply(c('ggplot2','data.table'),packageVersion)
[[1]]
[1] ‘3.3.0’

[[2]]
[1] ‘1.12.9’
print(version)
               _                           
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          6.3                         
year           2020                        
month          02                          
day            29                          
svn rev        77875                       
language       R                           
version.string R version 3.6.3 (2020-02-29)
nickname       Holding the Windsock        

If I understand you correctly, sharmachetan, is this what you want, meaning two y-axes?

Yes, it is. I want both percentage and frequency. The plot you attached looks good, I think.

But unfortunately, I think this is virtually impossible to do.
Maybe it could work with some rearrangement of the data layout?

Okay, but how did you produce the graph above? It had both frequency and %.

It was done in Windows Paint because I just wanted to make sure that I understood you correctly.

I used Andrzej's post as a starting point.
Here is a second axis with percentage values. This only makes sense if the percentage is of the total frequency of the entire data set (i.e. it ignores groupings), so that is what is shown:

df <- structure(list(
  modality = structure(c(1L, 2L, 3L, 5L, 4L), .Label = c(
    "Appearance",
    "Aroma", "Flavor", "Hedonic", "Texture"
  ), class = "factor"),
  First_response = c(201L, 8L, 107L, 151L, 282L), Second_response = c(
    72L,
    17L, 148L, 225L, 260L
  ), Third_response = c(
    54L, 17L, 177L,
    220L, 360L
  ), Fourth_response = c(46L, 24L, 168L, 198L, 356L), Fifth_response = c(39L, 13L, 122L, 150L, 402L), Frequency = c(
    749L,
    722L, 828L, 792L, 726L
  )
), class = "data.frame", row.names = c(
  NA,
  -5L
))

library(reshape2)
library(ggplot2)
library(tidyverse)
library(scales)

df2 <- select(df, -Frequency)

dat <- melt(df2)
#> Using modality as id variables

# Grand total of all counts, used to rescale frequencies linearly into % of the global total
data_total <- sum(dat$value)

ggplot(dat, aes(modality, value, fill = interaction(variable))) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  scale_fill_brewer("Variables", palette = "Spectral") +
  scale_y_continuous(sec.axis = sec_axis(~ . / data_total, labels = percent))
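
To see what the sec_axis() transformation does to the numbers, here is the same rescaling in plain R. It uses the grand total of all counts (which, for this data, equals the five Frequency values summed), so a bar's height on the secondary axis is its share of everything, not of its own response. The names to_share, share_global, and share_within are illustrative only.

```r
# Per-response totals from the Frequency column
response_totals <- c(749, 722, 828, 792, 726)
data_total <- sum(response_totals)  # same value as sum(dat$value)

# The transformation passed to sec_axis(): count -> share of grand total
to_share <- function(count) count / data_total

# Appearance / First_response has a count of 201
share_global <- to_share(201)  # share of all responses
share_within <- 201 / 749      # share within First_response

round(100 * c(global = share_global, within = share_within), 1)
```

The two percentages differ (roughly 5% versus 27%). A secondary axis is a one-to-one rescaling of the primary axis, so it can only represent one denominator at a time.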
