grouped barplot with error bars

Hi,
I'm new to R and I'm trying to plot a grouped bar plot with standard error (se) bars, but so far without success. I'd appreciate any words of wisdom. Thanks!

Dataset:
date year month site sample chla
2013-07-18 2013 July A1 1 0.001082
2013-08-14 2013 August A1 2 0.010676
2013-09-19 2013 September A1 3 0.00651
2013-07-18 2013 July A2 1 0.000772
2013-08-14 2013 August A2 2 0.002106
2013-09-18 2013 September A2 3 0.009325
2013-07-18 2013 July A3 1 0.000227
2013-08-13 2013 August A3 2 0.011545
2013-09-18 2013 September A3 3 0.015313
2013-07-19 2013 July A4 1 0.000297
2013-08-13 2013 August A4 2 0.014848
2013-09-18 2013 September A4 3 0.028509

Script:

#grouped bar plot with se bars (by site)

library(ggplot2)

ggplot(StandardMethodChla,aes(fill=month,y=chla,x=site))+
  geom_bar(position="dodge",stat="identity")

se=sd(chla)/sqrt(length(chla))

ggplot(StandardMethodChla)+
  geom_bar(aes(x=site,y=chla),stat="identity",fill=month,alpha=0.5)+
  geom_errorbar(aes(x=site,ymin=mean-se,ymax=mean+se),width=0.4,
                colour="orange",alpha=0.9,size=1.5)+
  ggtitle("using standard error") 

Hi Christiane, welcome!

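I think this is close to what you are trying to do. It looks odd because there is only one observation per site/month combination, which is not enough to calculate a standard error for each bar, so the se in the code is computed per site instead. You can use this as a starting point.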

Also, notice the way I'm posting the data: this is the correct way to share sample data. You can use the datapasta package to do the same.

library(tidyverse, quietly = TRUE)
StandardMethodChla <- data.frame(stringsAsFactors=FALSE,
                                 date = as.Date(c("2013-07-18", "2013-08-14", "2013-09-19", "2013-07-18",
                                                  "2013-08-14", "2013-09-18", "2013-07-18", "2013-08-13",
                                                  "2013-09-18", "2013-07-19", "2013-08-13", "2013-09-18")),
                                 year = c(2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013,
                                          2013, 2013),
                                 month = c("July", "August", "September", "July", "August", "September",
                                           "July", "August", "September", "July", "August", "September"),
                                 site = c("A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3", "A4",
                                          "A4", "A4"),
                                 sample = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
                                 chla = c(0.001082, 0.010676, 0.00651, 0.000772, 0.002106, 0.009325,
                                          0.000227, 0.011545, 0.015313, 0.000297, 0.014848, 0.028509)
)

StandardMethodChla %>% 
    group_by(site) %>% 
    mutate(se = sd(chla)/sqrt(length(chla))) %>% 
    ggplot(aes(x = site, y = chla, fill = month)) + 
    geom_bar(stat="identity", alpha=0.5, 
             position=position_dodge()) +
    geom_errorbar(aes(ymin=chla-se, ymax=chla+se), width=.2, colour="orange", 
                  position=position_dodge(.9))

Created on 2019-01-25 by the reprex package (v0.2.1)
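
To see why the standard error is computed per site rather than per site/month here, a small base-R check may help. This is only a sketch: it uses two of the four sites for brevity, and the helper name `se` is made up for illustration.

```r
# Same chla values as above, sites A1 and A2 only
d <- data.frame(
  site  = rep(c("A1", "A2"), each = 3),
  month = rep(c("July", "August", "September"), times = 2),
  chla  = c(0.001082, 0.010676, 0.00651, 0.000772, 0.002106, 0.009325)
)

# Hypothetical helper: standard error of the mean
se <- function(x) sd(x) / sqrt(length(x))

# One observation per site/month cell, so sd() (and the SE) is NA in every cell
se_by_cell <- tapply(d$chla, list(d$site, d$month), se)

# Three observations per site, so the SE is defined
se_by_site <- tapply(d$chla, d$site, se)

print(se_by_cell)
print(se_by_site)
```

With replicate measurements per site and month, the per-cell version would be the one to use for dodged bars.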


Thank you, Andres!
Would be possible to reorder the x-axis? And the months as well?

Yes, you can convert the columns to factors and set the order with levels, e.g. factor(month, levels = c('July', 'August', 'September'))
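
As a minimal sketch of that suggestion applied to the data in this thread (the reversed site order is just an arbitrary example, not something requested above):

```r
library(ggplot2)

d <- data.frame(
  site  = rep(c("A1", "A2", "A3", "A4"), each = 3),
  month = rep(c("July", "August", "September"), times = 4),
  chla  = c(0.001082, 0.010676, 0.00651, 0.000772, 0.002106, 0.009325,
            0.000227, 0.011545, 0.015313, 0.000297, 0.014848, 0.028509)
)

# Character columns are plotted in alphabetical order;
# explicit factor levels set the order of the fill groups ...
d$month <- factor(d$month, levels = c("July", "August", "September"))

# ... and of the x-axis (illustrative order only)
d$site <- factor(d$site, levels = c("A4", "A3", "A2", "A1"))

p <- ggplot(d, aes(x = site, y = chla, fill = month)) +
  geom_col(position = "dodge")
print(p)
```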


Just a few suggestions: note the use of lubridate, which among other things fixes the month ordering, and of the n() function to count the observations in each group. I have to add, however, that visualising distributions using a bar plot and error bars is not a good idea.

# Load libraries
library('tidyverse')
library('lubridate')

# Create data
d = tibble(date = ymd(c("2013-07-18", "2013-08-14", "2013-09-19", "2013-07-18",
                        "2013-08-14", "2013-09-18", "2013-07-18", "2013-08-13",
                        "2013-09-18", "2013-07-19", "2013-08-13", "2013-09-18")),
           year = year(date),
           month = month(date, label = TRUE),
           site = c("A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3", "A4",
                    "A4", "A4"),
           sample = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
           chla = c(0.001082, 0.010676, 0.00651, 0.000772, 0.002106, 0.009325,
                    0.000227, 0.011545, 0.015313, 0.000297, 0.014848, 0.028509)
) %>% group_by(site) %>% mutate(sem = sd(chla) / sqrt(n())) %>% ungroup()

# Plot
d %>% 
  ggplot(aes(x = site, y = chla, fill = month)) +
  geom_col(alpha = 0.5, position = 'dodge') +
  geom_errorbar(aes(ymin = chla - sem, ymax = chla + sem),
                width = 0.2, colour = "orange", 
                position = position_dodge(.9)) +
  theme_bw()
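
If the goal is instead one bar per site showing the mean with ±1 SE around it (which is what the `mean-se` / `mean+se` in the original script seems to be aiming for), one possible sketch is to summarise first and plot the summary. The names `site_summary` and `mean_chla` are made up here, and this is an assumption about what was wanted, not a correction of the code above.

```r
library(dplyr)
library(ggplot2)

d <- data.frame(
  site = rep(c("A1", "A2", "A3", "A4"), each = 3),
  chla = c(0.001082, 0.010676, 0.00651, 0.000772, 0.002106, 0.009325,
           0.000227, 0.011545, 0.015313, 0.000297, 0.014848, 0.028509)
)

# One row per site: mean and standard error of the mean
site_summary <- d %>%
  group_by(site) %>%
  summarise(mean_chla = mean(chla),
            sem = sd(chla) / sqrt(n()))

# Bars show the site means; error bars span mean - SE to mean + SE
p <- ggplot(site_summary, aes(x = site, y = mean_chla)) +
  geom_col(alpha = 0.5) +
  geom_errorbar(aes(ymin = mean_chla - sem, ymax = mean_chla + sem),
                width = 0.2, colour = "orange") +
  theme_bw()
print(p)
```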


Thank you very much, Andres and Leon. I really appreciate your help; this is really awesome. However, given the small sample size, I decided to use box plots instead and to compare parametric vs. non-parametric analyses.
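
For anyone landing here later, a rough sketch of what that could look like with the data from this thread. It assumes the comparison is across sites and uses a one-way ANOVA (`aov`) and a Kruskal-Wallis test (`kruskal.test`) as the parametric and non-parametric examples; the thread doesn't say which analyses were actually run.

```r
library(ggplot2)

d <- data.frame(
  site = rep(c("A1", "A2", "A3", "A4"), each = 3),
  chla = c(0.001082, 0.010676, 0.00651, 0.000772, 0.002106, 0.009325,
           0.000227, 0.011545, 0.015313, 0.000297, 0.014848, 0.028509)
)

# Box plot per site, with the raw observations drawn on top
p <- ggplot(d, aes(x = site, y = chla)) +
  geom_boxplot() +
  geom_point(alpha = 0.6)
print(p)

# Parametric: one-way ANOVA of chla by site
fit_aov <- aov(chla ~ site, data = d)
print(summary(fit_aov))

# Non-parametric: Kruskal-Wallis rank sum test of chla by site
fit_kw <- kruskal.test(chla ~ site, data = d)
print(fit_kw)
```

With only three observations per site, neither test has much power, so the plot of the raw points is probably the more informative part.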

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems.
