# Creating stacked bar charts with percentages in R

Hello everyone, new to R here.
I'm working on a project and want to make some stacked bar charts in R.
My data is in a CSV with three columns (column 1 = Country, column 2 = Variable 1 (%), column 3 = Variable 2 (%)). The total count is 60.

What packages do I need?
How do I set the scale on the y-axis?
How do I create the stacked bars?

If anyone has some code that might help, that would be fantastic. I've tried YouTube University but keep getting a lot of errors.

Below is a reprex

Sorry, but that is not a reprex. That is a screenshot.

Here is how to create a reproducible example (reprex).

You do not say what package you are using to create the barplot, but here is a link to a post that discusses how to approach the problem using ggplot2: Barplot with percentages.

I wonder if a stacked barplot is the best choice here with 60 elements in a bar? I think it will be close to uninterpretable. If you explain what you are doing in substantive terms someone here may be able to suggest some alternatives.

I'm doing a study on the inclusion of local researchers in research conducted in African countries.
I want a graphical representation of the percentages "included" and "excluded".
Any graphical representation other than maps will help. I've already used so many maps that I want to use a different kind of chart.

Okay, this makes sense. I personally detest stacked bar charts, so I may not be the best source of ideas, but if you can give us some sample data we may be able to come up with some possible alternatives. Something like a Cleveland dot plot comes to mind.

A handy way to supply sample data is to use the dput() function. See ?dput. If you have a very large data set, then something like dput(head(myfile, 100)) will likely supply enough data for us to work with.
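
As a minimal sketch of that workflow (`myfile` here is a made-up stand-in for your own data frame, not anything from this thread):

```r
# Hypothetical stand-in for your own data; replace with your data frame
myfile <- data.frame(
  Country = c("Algeria", "Angola", "Benin"),
  PA = c(84.8, 5.9, 54.0)
)

# Take the first rows, then print them as R code that others can paste and run
dput(head(myfile, 2))

# Round trip: the text that dput() prints can be parsed back into a data frame
txt <- capture.output(dput(head(myfile, 2)))
restored <- eval(parse(text = txt))
all.equal(restored, head(myfile, 2))
```

Pasting the printed `structure(...)` text into a post lets others rebuild the same data without needing your CSV.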

structure(list(Country = c("Algeria", "Angola", "Benin", "Botswana",
"Burkina Faso", "Burundi", "Cameroon", "Cape Verde", "Central African Republic",
"Chad"), Authorship = c(273, 1, 34, 70, 35, 7, 88, 11, 0, 6),
UnAuthorship = c(49, 16, 29, 75, 18, 2, 31, 1, 4, 35), PA = c(84.7826087,
5.882352941, 53.96825397, 48.27586207, 66.03773585, 77.77777778,
73.94957983, 91.66666667, 0, 14.63414634), PUA = c(15.2173913,
94.11764706, 46.03174603, 51.72413793, 33.96226415, 22.22222222,
26.05042017, 8.333333333, 100, 85.36585366)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))

Above is the sample data

Is this what you are after?

``````
# Provided sample data
df <- structure(list(Country = c("Algeria", "Angola", "Benin", "Botswana",
"Burkina Faso", "Burundi", "Cameroon", "Cape Verde", "Central African Republic",
"Chad"), Authorship = c(273, 1, 34, 70, 35, 7, 88, 11, 0, 6),
UnAuthorship = c(49, 16, 29, 75, 18, 2, 31, 1, 4, 35), PA = c(84.7826087,
5.882352941, 53.96825397, 48.27586207, 66.03773585, 77.77777778,
73.94957983, 91.66666667, 0, 14.63414634), PUA = c(15.2173913,
94.11764706, 46.03174603, 51.72413793, 33.96226415, 22.22222222,
26.05042017, 8.333333333, 100, 85.36585366)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))

# Relevant code
library(tidyverse)

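# Reshape to long format, then draw one stacked horizontal bar per country
df %>%
  pivot_longer(c("PA", "PUA"), names_to = "type") %>%
  ggplot(aes(y = Country, x = value, fill = type)) +
  geom_col() +
  # Values are already on a 0-100 scale, so only append the % sign
  scale_x_continuous(labels = scales::percent_format(scale = 1))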
``````

Created on 2021-12-05 by the reprex package (v2.0.1)
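
In case it helps with the axis scale question: if you only had the raw counts (`Authorship`, `UnAuthorship`) rather than the precomputed `PA`/`PUA` columns, a sketch like the following should let ggplot2 do the normalising through `position = "fill"`. The three-row `counts` data frame is just a cut-down copy of the sample data, and the object names and axis label are my own assumptions.

```r
library(tidyr)
library(ggplot2)

# Cut-down copy of the sample data (counts only)
counts <- data.frame(
  Country = c("Algeria", "Angola", "Benin"),
  Authorship = c(273, 1, 34),
  UnAuthorship = c(49, 16, 29)
)

# One row per country/type pair, which is the shape ggplot2 works with best
counts_long <- pivot_longer(
  counts,
  cols = c("Authorship", "UnAuthorship"),
  names_to = "type",
  values_to = "n"
)

# position = "fill" rescales every bar to sum to 1, so the axis runs from 0 to 1
# and the default percent_format() (scale = 100) labels it as 0% to 100%
p <- ggplot(counts_long, aes(x = n, y = Country, fill = type)) +
  geom_col(position = "fill") +
  scale_x_continuous(labels = scales::percent_format()) +
  labs(x = "Share of total", fill = NULL)

p
```

The difference from the code above is only where the percentages come from: there they are read from the data, here they are computed at plot time from the counts.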

Wow, thanks very much. That was what I was after initially, until someone told me that 60 countries on one bar chart would be hard to interpret. Can you help with a Cleveland plot?
Thanks

I don't see a substantial difference between the two with regard to overplotting. Anyway, we like to keep things tidy around here, so if you have a different question (i.e. Cleveland plots instead of stacked bar charts), please ask it in a new topic, providing a relevant REPRoducible EXample (reprex) illustrating your issue.

I was thinking of the old Cleveland plots from Bill Cleveland's The Elements of Graphing Data, which hopefully would provide a line and a dot for each value of PA and PUA, not the current ggplot implementation. I agree that otherwise there is unlikely to be any substantial difference between the two with regard to overplotting.

In fact, I was wondering if it might make sense to split the two variables into two plots, depending on what @Augustine94 needs.

Unfortunately I won't be able to work on it until tomorrow.

I agree

You do not need to answer the question in my new thread, but I figured I would still alert you to it since I mention your username in it. (I am new to the RStudio community and am unsure whether it is impolite to discuss someone else's code without inviting them to the conversation.)

@andresrcs Thanks for the reply, and thanks for the help.
I am a new R user and a bit curious. I'll be sure to keep things tidy here.

@jrkrideau Thanks for the support and help.
If I could get some sample code, that would be great. As always, whenever is convenient for you.

This is not what I had in mind originally, but does it look at all useful?

``````
library(tidyverse)

dat1 <- structure(list(Country = c("Algeria", "Angola", "Benin", "Botswana",
"Burkina Faso", "Burundi", "Cameroon", "Cape Verde", "Central African Republic",
"Chad"), Authorship = c(273, 1, 34, 70, 35, 7, 88, 11, 0, 6),
UnAuthorship = c(49, 16, 29, 75, 18, 2, 31, 1, 4, 35), PA = c(84.7826087,
5.882352941, 53.96825397, 48.27586207, 66.03773585, 77.77777778,
73.94957983, 91.66666667, 0, 14.63414634), PUA = c(15.2173913,
94.11764706, 46.03174603, 51.72413793, 33.96226415, 22.22222222,
26.05042017, 8.333333333, 100, 85.36585366)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))

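# Reshape to long format: one row per country and percentage type
dat2 <- dat1 %>%
  pivot_longer(c("PA", "PUA"), names_to = "type")

# One dot per country, with PA and PUA in separate side-by-side panels
ggplot(data = dat2, aes(x = value, y = Country, colour = type)) +
  geom_point() +
  theme(legend.position = "none") +
  facet_grid(. ~ type)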

``````
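
For something closer to the classic Cleveland dot plot mentioned earlier (a line and a dot for each value), a rough sketch along these lines might be a starting point. It uses a four-row copy of the sample data; the `pct` and `pct_long` names, the grey line colour, and the choice to sort countries by `PA` are my own assumptions, not anything settled in this thread.

```r
library(tidyr)
library(ggplot2)

# Cut-down copy of the sample data (percentages only)
pct <- data.frame(
  Country = c("Algeria", "Angola", "Benin", "Botswana"),
  PA = c(84.7826087, 5.882352941, 53.96825397, 48.27586207),
  PUA = c(15.2173913, 94.11764706, 46.03174603, 51.72413793)
)

# Sort countries by PA so the dots form a readable trend instead of A-Z order
pct$Country <- reorder(pct$Country, pct$PA)

pct_long <- pivot_longer(pct, c("PA", "PUA"), names_to = "type")

# A thin line from zero to each value, with a dot at the end of the line
p <- ggplot(pct_long, aes(x = value, y = Country)) +
  geom_segment(aes(x = 0, xend = value, yend = Country), colour = "grey60") +
  geom_point(aes(colour = type)) +
  facet_grid(. ~ type) +
  scale_x_continuous(labels = scales::percent_format(scale = 1)) +
  theme(legend.position = "none")

p
```

With 60 countries, sorting the axis by value tends to matter more than the choice between bars and dots, since an alphabetical axis makes it hard to compare neighbours.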

@jrkrideau Thanks very much. I'll work with this
