How to create a grouped bar plot?

I need to create a plot of the total number of observations in each month of the study period. The df is shown in a screenshot (WX20200209-141353).

I think a grouped bar plot would work better, ideally like the ones at https://www.statmethods.net/graphs/bar.html
That way, 2012-01 and 2013-01 can be combined into one group of bars.

I referred to the website above and used this code:

barplot(df.1a$n, main = "Total Number of Observations", xlab = "Each Month of the Study Period",beside=TRUE, col=c("darkblue","red"))

I think the problem is due to my df.

Also, if you have any suggestions about this graph, please tell me. Thank you so much!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

head(df.1a)
# A tibble: 6 x 3
# Groups:   month [3]
  month year      n
1 1     2012  39966
2 1     2013  39637
3 2     2012  40000
4 2     2013  39344
5 3     2012  39997
6 3     2013  39009

That is neither reproducible nor copy/paste friendly. Please read the guide more carefully and try again.

I still can't understand how to do this.

> head(df.1a)
# A tibble: 6 x 3
# Groups:   month [3]
  month year      n
  <fct> <fct> <int>
1 1     2012  39966
2 1     2013  39637
3 2     2012  40000
4 2     2013  39344
5 3     2012  39997
6 3     2013  39009

Well, you make it look as if you are not even trying. One of the most important things to learn when you are starting with R is how to ask for help properly and effectively.

This is what I can do with the information you are providing (notice how I'm answering with a minimal reproducible example):

# Library calls
library(tidyverse)

# Sample data on a copy/paste friendly format
df <- data.frame(
       month = as.factor(c(1, 1, 2, 2, 3, 3)),
        year = as.factor(c(2012, 2013, 2012, 2013, 2012, 2013)),
           n = c(39966, 39637, 40000, 39344, 39997, 39009)
)

# Relevant code for the issue
df %>% 
    ggplot(aes(x = month, y = n, fill = year)) +
    geom_col()

Created on 2020-02-09 by the reprex package (v0.3.0.9001)

I really did try, even if you don't believe me, but I failed. I have been learning R for less than a week. Learning how to share reproducible data felt like a separate post, and I just wanted to figure this question out first. But now that you have said that, I will sort out the reproducible data problem first.

I have posted my data frame in the picture. It has three columns: month, year, n. It has 24 rows, covering Jan 2012 to Dec 2012 and Jan 2013 to Dec 2013. In other words, these are the 24 months in 2012 and 2013. So overall, my data frame has 24 rows and 3 columns.

I have now opened the link you gave me, so let me follow the tutorial.

I think "iris" is the name of the data frame in the guide. My data frame is named df.1a.

head(df.1a)
# A tibble: 6 x 3
# Groups:   month [3]
  month year      n
  <fct> <fct> <int>
1 1     2012  39966
2 1     2013  39637
3 2     2012  40000
4 2     2013  39344
5 3     2012  39997
6 3     2013  39009

Your next step is "head(iris, 5)[, c('Sepal.Length', 'Sepal.Width')]". I only have 3 columns, so I skip this step.

Then you said "Now you just need to put this into a copy/paste friendly format for been posted in the forum, and you can easily do it with the datapasta package." See? You said I can put this into a copy/paste friendly format for posting in the forum, and I was just following your instructions.

But I am really confused here, because you said I should use the datapasta package. Okay, I installed it.

> datapasta::df_paste(head(df.1a)
+ 

I don't know what I should do now. R just shows a + prompt here.

This is because your command is incomplete: you are missing a closing parenthesis, and R is waiting for you to complete the command.
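As a rough illustration, here is the completed call alongside a base R alternative. dput() is used here only as a stand-in that needs no extra package, and the six-row df.1a below is rebuilt from the values posted earlier, so treat it as an assumption about the real data:

```r
# Stand-in for the first six rows shown above (values copied from the thread)
df.1a <- data.frame(
  month = factor(c("1", "1", "2", "2", "3", "3")),
  year  = factor(c("2012", "2013", "2012", "2013", "2012", "2013")),
  n     = c(39966L, 39637L, 40000L, 39344L, 39997L, 39009L)
)

# Complete command: every "(" has a matching ")", so R runs it
# instead of showing the "+" continuation prompt.
# With datapasta installed this would be: datapasta::df_paste(head(df.1a))
dput(head(df.1a))

# The printed text can be pasted back into R to rebuild the data
pasted  <- paste(capture.output(dput(head(df.1a))), collapse = "\n")
rebuilt <- eval(parse(text = pasted))
```

If you ever see the + prompt by accident, pressing Esc (or Ctrl+C in a terminal) cancels the unfinished command.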

Nice! Thanks!

datapasta::df_paste(head(df.1a, 24))
data.frame(
           n = c(39966L,39637L,40000L,39344L,39997L,
                 39009L,40000L,38460L,40000L,38049L,40000L,37676L,40000L,
                 37269L,40000L,37021L,39999L,36758L,40000L,36515L,39897L,
                 36268L,39766L,36093L),
       month = as.factor(c("1","1","2","2","3",
                           "3","4","4","5","5","6","6","7","7","8",
                           "8","9","9","10","10","11","11","12","12")),
        year = as.factor(c("2012","2013","2012",
                           "2013","2012","2013","2012","2013","2012",
                           "2013","2012","2013","2012","2013","2012","2013",
                           "2012","2013","2012","2013","2012","2013","2012",
                           "2013"))
)

OK, if I apply the code I already gave you to your new sample data, this is what I get. Is this not what you are looking for?

# Library calls
library(tidyverse)

# Sample data on a copy/paste friendly format
df <- data.frame(
    n = c(39966L,39637L,40000L,39344L,39997L,
          39009L,40000L,38460L,40000L,38049L,40000L,37676L,40000L,
          37269L,40000L,37021L,39999L,36758L,40000L,36515L,39897L,
          36268L,39766L,36093L),
    month = as.factor(c("1","1","2","2","3",
                        "3","4","4","5","5","6","6","7","7","8",
                        "8","9","9","10","10","11","11","12","12")),
    year = as.factor(c("2012","2013","2012",
                       "2013","2012","2013","2012","2013","2012",
                       "2013","2012","2013","2012","2013","2012","2013",
                       "2012","2013","2012","2013","2012","2013","2012",
                       "2013"))
)

# Relevant code for the issue
df %>% 
    ggplot(aes(x = factor(month, levels = 1:12), y = n, fill = year)) +
    geom_col()
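A side note on factor(month, levels = 1:12) in the code above: as.factor() on text sorts the levels as strings, so month 10 would end up before month 2 on the x axis. A small sketch of the difference (the variable names are only for illustration):

```r
# Month numbers stored as text, as in the sample data
month_chr <- as.character(1:12)

# Default: levels are sorted as strings, so "10" comes before "2"
default_levels <- levels(as.factor(month_chr))

# Explicit levels: calendar order is kept
fixed_levels <- levels(factor(month_chr, levels = 1:12))

print(default_levels)
print(fixed_levels)
```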

Unfortunately not. I want a figure like this:

So 2012-01 and 2013-01 can be combined. Then 2012-02 and 2013-02 be combined; ...

If you look more closely, you will notice that in practice it is the same thing. The only difference is the position of the bars, which can be fixed pretty easily.

# Library calls
library(tidyverse)

# Sample data on a copy/paste friendly format
df <- data.frame(
    n = c(39966L,39637L,40000L,39344L,39997L,
          39009L,40000L,38460L,40000L,38049L,40000L,37676L,40000L,
          37269L,40000L,37021L,39999L,36758L,40000L,36515L,39897L,
          36268L,39766L,36093L),
    month = as.factor(c("1","1","2","2","3",
                        "3","4","4","5","5","6","6","7","7","8",
                        "8","9","9","10","10","11","11","12","12")),
    year = as.factor(c("2012","2013","2012",
                       "2013","2012","2013","2012","2013","2012",
                       "2013","2012","2013","2012","2013","2012","2013",
                       "2012","2013","2012","2013","2012","2013","2012",
                       "2013"))
)

# Relevant code for the issue
df %>% 
    ggplot(aes(x = factor(month, levels = 1:12), y = n, fill = year)) +
    geom_col(position = "dodge")
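For comparison, the base R barplot() call from the original question can also draw grouped bars, but beside = TRUE only has an effect when the heights are a matrix rather than a single vector. A minimal sketch, assuming the same 24 values as above (counts is just an illustrative name):

```r
# Same 24 values as in the thread, built compactly:
# months 1..12, each with a 2012 row followed by a 2013 row
df <- data.frame(
  n = c(39966L, 39637L, 40000L, 39344L, 39997L, 39009L, 40000L, 38460L,
        40000L, 38049L, 40000L, 37676L, 40000L, 37269L, 40000L, 37021L,
        39999L, 36758L, 40000L, 36515L, 39897L, 36268L, 39766L, 36093L),
  month = factor(rep(1:12, each = 2), levels = 1:12),
  year  = factor(rep(c(2012, 2013), times = 12))
)

# Reshape to a 2 x 12 matrix: one row per year, one column per month
counts <- xtabs(n ~ year + month, data = df)

# With a matrix, beside = TRUE draws the two years side by side per month
barplot(counts, beside = TRUE, col = c("darkblue", "red"),
        legend.text = TRUE,
        main = "Total Number of Observations",
        xlab = "Each Month of the Study Period")
```

Passing df$n directly gives barplot() a plain vector, so it has no groups to place beside each other.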

Thank you so much!!! I will study your code.

Once you post a reprex, lots of people can help.

# Library calls
library(tidyverse)

# Sample data on a copy/paste friendly format
df <- data.frame(
  n = c(39966L,39637L,40000L,39344L,39997L,
        39009L,40000L,38460L,40000L,38049L,40000L,37676L,40000L,
        37269L,40000L,37021L,39999L,36758L,40000L,36515L,39897L,
        36268L,39766L,36093L),
  month = factor(c("1","1","2","2","3",
                      "3","4","4","5","5","6","6","7","7","8",
                      "8","9","9","10","10","11","11","12","12"), 
                 levels = 1:12,
                  ordered = TRUE),
  year = as.factor(c("2012","2013","2012",
                     "2013","2012","2013","2012","2013","2012",
                     "2013","2012","2013","2012","2013","2012","2013",
                     "2012","2013","2012","2013","2012","2013","2012",
                     "2013")))

df %>% 
  group_by(month) %>% 
  ggplot(aes(x = month, y =n, fill = year)) +
  geom_col(position = 'dodge')

Created on 2020-02-09 by the reprex package (v0.3.0)

Hello, I copied and pasted your code on my computer, but there is a problem.

> df %>% 
+   group_by(month) %>% 
+   ggplot(aes(x = month, y =n, fill = year)) +
+   geom_col(position = 'dodge')
Error: Aesthetics must be valid data columns. Problematic aesthetic(s): y = n. 
Did you mistype the name of a data column or forget to add stat()?

Hello, thank you for your help. But I ran into a problem running your code. Please see my post above.

This error message suggests that the actual column name is different in your real data frame. This usually happens when you use non-syntactic variable names.
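One way to check this before plotting is to compare the columns the plot needs against names(). This is only a sketch: the data frames below are made up for illustration, and "n obs" is a hypothetical example of a non-syntactic name, not something taken from your data.

```r
# Small stand-in data frame with the expected columns
df <- data.frame(
  month = factor(c("1", "1", "2", "2")),
  year  = factor(c("2012", "2013", "2012", "2013")),
  n     = c(39966L, 39637L, 40000L, 39344L)
)

# Columns the plotting code refers to
required <- c("month", "year", "n")

# Empty result means every aesthetic maps to a real column
missing_cols <- setdiff(required, names(df))
print(missing_cols)

# A hypothetical data frame whose count column has a non-syntactic name
odd <- data.frame(`n obs` = c(1L, 2L), check.names = FALSE)

# make.names() changes any name that is not syntactic, so a mismatch flags it
is_syntactic <- make.names(names(odd)) == names(odd)
print(is_syntactic)

# Non-syntactic names need backticks when referenced
print(odd$`n obs`)
```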

You are right. The problem happens because the name of my data frame isn't df. Thanks!
