Ggplot problems

Hi,

I have a problem with some RStudio code. I am totally new to R, and this is the first time I am trying to make a plot with ggplot2. I have the following code (the values in this dataset are fictitious):

library(tidyr)
library(ggplot2)
library(dplyr)


#plot with CI 
r_data_2 <- data.frame(time = c("January", "February", "March", "April", "May", "June"),
                       Cake = c(12,26,39,37,40,28),
                       NoCake = c(1,2,3,1,5,4))

r_datatall_2 <- r_data_2 %>% gather(key=test, value, Cake:NoCake)
r_datatall_2
r_datatall_2$n <- c(13,28,42,38,45,32,13,28,42,38,45,32)

#percent
r_datatall_2$prop <- r_datatall_2$value/r_datatall_2$n*100

#calculateCI intervals 
r_datatall_2$ci_low <- ((r_datatall_2$value/r_datatall_2$n)-1.96*sqrt(((r_datatall_2$value/r_datatall_2$n)*(1-(r_datatall_2$value/r_datatall_2$n)))/r_datatall_2$n))*100
r_datatall_2$ci_hi <- ((r_datatall_2$value/r_datatall_2$n)+1.96*sqrt(((r_datatall_2$value/r_datatall_2$n)*(1-(r_datatall_2$value/r_datatall_2$n)))/r_datatall_2$n))*100
r_datatall_2$ci <- r_datatall_2$prop-r_datatall_2$ci_low

#plot bars with CI-intervals
ggplot(r_datatall_2, aes(time, prop, fill = test)) + 
  geom_bar(position ='dodge', stat='identity') +
  geom_text(aes(label=value), position=position_dodge(width=0.9), vjust=-5.5)  +
  labs(y="Percent (%)", x="", fill="", legend=c("A","B")) +
  ylim(0,120)+
  scale_fill_brewer(palette="Paired", labels = c("Cake","No Cake"))+
  ggtitle("Cake according to month")+
  geom_errorbar(aes(ymin=ci_low, ymax=ci_hi, x=time), position=position_dodge(0.9), width = 0.3)

I have several issues:

  1. The months are not in the proper order.
  2. The error bars are incomplete at the bottom and often run into the value printed above the bar.
  3. Is it possible to overlay a line plot of a median across the months, so that both the line (in this case one line) and the bars are displayed?

Thanks a lot!

I changed just a few things. I set the levels of time so the months would be in calendar order, not alphabetical order. I removed the call to ylim() to prevent the error bars from being truncated. Of course, negative values do not really make sense for a proportion. I put a line at the median value of each test by using geom_hline.

library(tidyr)
library(ggplot2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union


#plot with CI 
r_data_2 <- data.frame(time = c("January", "February", "March", "April", "May", "June"),
                       Cake = c(12,26,39,37,40,28),
                       NoCake = c(1,2,3,1,5,4))

r_datatall_2 <- r_data_2 %>% gather(key=test, value, Cake:NoCake)
r_datatall_2
#>        time   test value
#> 1   January   Cake    12
#> 2  February   Cake    26
#> 3     March   Cake    39
#> 4     April   Cake    37
#> 5       May   Cake    40
#> 6      June   Cake    28
#> 7   January NoCake     1
#> 8  February NoCake     2
#> 9     March NoCake     3
#> 10    April NoCake     1
#> 11      May NoCake     5
#> 12     June NoCake     4
r_datatall_2$n <- c(13,28,42,38,45,32,13,28,42,38,45,32)

#percent
r_datatall_2$prop <- r_datatall_2$value/r_datatall_2$n*100

#calculateCI intervals 
r_datatall_2$ci_low <- ((r_datatall_2$value/r_datatall_2$n)-1.96*sqrt(((r_datatall_2$value/r_datatall_2$n)*(1-(r_datatall_2$value/r_datatall_2$n)))/r_datatall_2$n))*100
r_datatall_2$ci_hi <- ((r_datatall_2$value/r_datatall_2$n)+1.96*sqrt(((r_datatall_2$value/r_datatall_2$n)*(1-(r_datatall_2$value/r_datatall_2$n)))/r_datatall_2$n))*100
r_datatall_2$ci <- r_datatall_2$prop-r_datatall_2$ci_low
MEDIAN <- r_datatall_2 %>% group_by(test) %>% summarize(MED = median(prop))
#> `summarise()` ungrouping output (override with `.groups` argument)
r_datatall_2$time <- factor(r_datatall_2$time, levels = c("January", "February", 
                                                          "March", "April", "May", "June"))
#plot bars with CI-intervals
ggplot(r_datatall_2, aes(time, prop, fill = test)) + 
  geom_bar(position ='dodge', stat='identity') +
  geom_text(aes(label=value), position=position_dodge(width=0.9), vjust=-5.5)  +
  geom_hline(mapping = aes(yintercept = MED), data = MEDIAN, linetype = 2) + 
  labs(y="Percent (%)", x="", fill="", legend=c("A","B")) +
 # ylim(0,120)+
  scale_fill_brewer(palette="Paired", labels = c("Cake","No Cake"))+
  ggtitle("Cake according to month")+
  geom_errorbar(aes(ymin=ci_low, ymax=ci_hi, x=time), position=position_dodge(0.9), width = 0.3)

Created on 2021-02-08 by the reprex package (v0.3.0)
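
The interval formula used above can return a lower limit below 0 % or an upper limit above 100 % when the proportion is close to 0 or 1, which is what happens for several bars here. One possible way to handle this, shown as a minimal sketch and not as the only option, is to truncate the limits to the valid range. The helper name `wald_ci_pct` is made up for this illustration and is not part of any package.

```r
# Same interval as in the code above, on the percent scale,
# but truncated to the valid range [0, 100].
# `wald_ci_pct` is a hypothetical helper name used only for this sketch.
wald_ci_pct <- function(value, n, z = 1.96) {
  p  <- value / n
  se <- sqrt(p * (1 - p) / n)
  data.frame(
    prop   = 100 * p,
    ci_low = 100 * pmax(p - z * se, 0),  # never below 0 %
    ci_hi  = 100 * pmin(p + z * se, 1)   # never above 100 %
  )
}

# January values from the example data: 12 of 13 (Cake) and 1 of 13 (NoCake)
ci <- wald_ci_pct(value = c(12, 1), n = c(13, 13))
ci
```

If a fixed axis range is still wanted, coord_cartesian(ylim = c(0, 120)) zooms the plot without discarding anything, whereas ylim() drops values that fall outside the limits.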

Thank you for your quick reply, @FJCC! :blush:

What if I want to remove the error bars? As you say, negative values do not make sense. I can't just remove this line:

geom_errorbar(aes(ymin=ci_low, ymax=ci_hi, x=time), position=position_dodge(0.9), width = 0.3)

If I do, the program no longer runs. I'm sorry, I was not clear about the median part. Suppose I have a median (with interquartile range) for each month (for a variable other than 'cake'), say:

Month: median (IQR)
January: 5.5 (2)
February: 4 (3)
March: 6 (4)
April: 8 (1)
May: 5 (5)
June: 8 (4)

Can these medians be displayed as an overlaid line plot (with a separate y-axis on the right, as this median is a frequency and not a percent), with the IQR as an error bar on each median?

Thanks again - this is really helpful!

Secondary axes in ggplot have to be a transformation of the primary axis. Since your primary y axis runs from 0 to 100, I had the secondary axis cover one tenth of that. The following code results in a plot that needs to be cleaned up, but I will leave that to you.

library(tidyr)
library(ggplot2)
library(dplyr)


#plot with CI 
r_data_2 <- data.frame(time = c("January", "February", "March", "April", "May", "June"),
                       Cake = c(12,26,39,37,40,28),
                       NoCake = c(1,2,3,1,5,4))
MEDIAN <- data.frame(time = c("January", "February", "March", "April", "May", "June"),
                     Median = c(5.5,4,6,8,5,8),
                     IQR = c(2,3,4,1,5,4))
MEDIAN <- MEDIAN %>% mutate(time = factor(time, levels = c("January", "February", 
                                     "March", "April", "May", "June")),
                            IQRhigh = Median + IQR/2,
                            IQRlow = Median - IQR/2)

r_datatall_2 <- r_data_2 %>% gather(key=test, value, Cake:NoCake)
r_datatall_2
r_datatall_2$n <- c(13,28,42,38,45,32,13,28,42,38,45,32)

#percent
r_datatall_2$prop <- r_datatall_2$value/r_datatall_2$n*100

#calculateCI intervals 
r_datatall_2$ci_low <- ((r_datatall_2$value/r_datatall_2$n)-1.96*sqrt(((r_datatall_2$value/r_datatall_2$n)*(1-(r_datatall_2$value/r_datatall_2$n)))/r_datatall_2$n))*100
r_datatall_2$ci_hi <- ((r_datatall_2$value/r_datatall_2$n)+1.96*sqrt(((r_datatall_2$value/r_datatall_2$n)*(1-(r_datatall_2$value/r_datatall_2$n)))/r_datatall_2$n))*100
r_datatall_2$ci <- r_datatall_2$prop-r_datatall_2$ci_low
#MEDIAN <- r_datatall_2 %>% group_by(test) %>% summarize(MED = median(prop))
r_datatall_2$time <- factor(r_datatall_2$time, levels = c("January", "February", 
                                                          "March", "April", "May", "June"))
#plot bars with CI-intervals
ggplot(r_datatall_2, aes(time)) + 
  geom_bar(aes(y = prop, fill = test), position ='dodge', stat='identity') +
  geom_text(aes(y = prop, label=value, group = test), position=position_dodge(width=0.9), vjust=-5.5)  +  # group = test so the labels are dodged like the bars
  geom_line(aes(x = time, y = Median * 10, group = 1), data = MEDIAN) + 
  geom_errorbar(aes(x = time, ymin = IQRlow * 10, ymax = IQRhigh * 10), data = MEDIAN, width = 0.3) +
  labs(y="Percent (%)", x="", fill="", legend=c("A","B")) +
 # ylim(0,120)+
  scale_fill_brewer(palette="Paired", labels = c("Cake","No Cake"))+
  scale_y_continuous(sec.axis = sec_axis(~ . / 10, name = "Median")) +
  ggtitle("Cake according to month") #+
  #geom_errorbar(aes(ymin=ci_low, ymax=ci_hi, x=time), position=position_dodge(0.9), width = 0.3)
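
Two details in the code above are easy to trip over. First, deleting the last layer of a plot leaves the previous line ending in `+`, so R keeps waiting for more input; that is the usual reason a plot stops running after a line is removed. Second, the factor of 10 has to appear twice: multiplied into the data and divided out again in sec_axis(). The following is a minimal sketch of one way to guard against both, assuming the medians and IQRs posted above; the names `k` and `show_iqr` are made up for this illustration.

```r
library(ggplot2)

months <- c("January", "February", "March", "April", "May", "June")
MEDIAN <- data.frame(time = factor(months, levels = months),
                     Median = c(5.5, 4, 6, 8, 5, 8),
                     IQR = c(2, 3, 4, 1, 5, 4))

# One scaling factor: it stretches the medians onto the 0-100 axis and,
# inverted, labels the right-hand axis in the original units.
k <- 10

p <- ggplot(MEDIAN, aes(time, Median * k, group = 1)) +
  geom_line() +
  scale_y_continuous(name = "Percent (%)", limits = c(0, 100),
                     sec.axis = sec_axis(~ . / k, name = "Median"))

# Optional layers get their own statement, so switching one off
# never leaves a trailing `+` behind.
show_iqr <- TRUE
if (show_iqr) {
  # Bars drawn symmetrically around the median, as in the code above
  p <- p + geom_errorbar(aes(ymin = (Median - IQR / 2) * k,
                             ymax = (Median + IQR / 2) * k),
                         width = 0.3)
}
p
```

Setting show_iqr to FALSE drops the error bars without touching any other line.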

Thanks once again @FJCC!

I have changed quite a bit. The code is now:

#install.packages("readxl")
library(tidyr)
library(ggplot2)
library(dplyr)


#Figure
r_data_2 <- data.frame(time = c("January", "February", "March", "April", "May", "June"),
                       Cake = c(12,26,39,37,40,28),
                       NoCake = c(1,2,3,1,5,4))


r_datatall_2 <- r_data_2 %>% gather(key=test, value, NoCake:Cake)
r_datatall_2
r_datatall_2$n <- c(13,28,42,38,45,32,13,28,42,38,45,32)
r_datatall_2$median = c(6,5,5,7,7,5.5)


#month sequence 
r_datatall_2$time <- factor(r_datatall_2$time, levels = c("January", "February","March", "April", "May", "June"))
#cake/nocake sequence
r_datatall_2$test <- factor(r_datatall_2$test, levels = c("NoCake", "Cake"))

#Figure
ggplot(r_datatall_2,aes(fill=test,y=value,x=time))  +
  geom_bar(position="fill",stat="identity") +
  scale_fill_brewer(palette="Paired") +
  labs(y="Percent (%)", x="", fill="") +
  geom_line(aes(x = time, y = median * 0.10), group = 1,size=1.5,color="dark green")  +
  scale_y_continuous(sec.axis = sec_axis(~ . / 0.1, name = "Median open shops"))

Do you know if it is possible to do the following?

  1. Make a new column for each month with Candy and No Candy, similar to Cake and No Cake (so they are placed next to each other)?
  2. How do I change the left y-axis to show percent as 0-100 instead of 0-1?
  3. How do I add a text box for the median (so that there is a green square next to the other two, saying "median")?
  4. How can I label Cake and NoCake so that there is a space in "No Cake" (instead of "NoCake")?
  5. Is it possible to have error bars with the IQR for the green median line?

That was a lot! Sorry! :blush:

  1. I do not understand the difficulty with this. Your code already has two methods of adding data to a data frame, using the data.frame() function and using an assignment like r_datatall_2$median = c(6,5,5,7,7,5.5). Please explain more about this.
  2. The y axis runs from 0 to 1 because you set position = "fill". You can change the labels on the y axis by using the labels and breaks arguments in scale_y_continuous(). The breaks have to be in the range 0 to 1 but the labels can be any text.
  3. I do not know how to do this.
  4. You can either replace NoCake with No Cake in the data frame after you gather it or you can use the labels argument in the factor() function when you change test into a factor. In factor(), you can set the levels to NoCake and Cake but set the labels to No Cake and Cake.
  5. The last code I posted had error bars on the median line.
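
Points 2 and 4 can be sketched as follows. This is only an illustration on a cut-down version of the example data; the object names `d` and `brks` are made up here.

```r
library(ggplot2)

months <- c("January", "February", "March")
d <- data.frame(time = factor(rep(months, 2), levels = months),
                test = rep(c("NoCake", "Cake"), each = 3),
                value = c(1, 2, 3, 12, 26, 39))

# Point 4: the levels match what is stored in the data,
# the labels are what gets displayed in the legend.
d$test <- factor(d$test, levels = c("NoCake", "Cake"),
                 labels = c("No Cake", "Cake"))

# Point 2: the breaks stay on the 0-1 scale produced by position = "fill";
# only the printed labels are multiplied by 100.
brks <- seq(0, 1, by = 0.25)

p <- ggplot(d, aes(time, value, fill = test)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_fill_brewer(palette = "Paired") +
  scale_y_continuous(breaks = brks, labels = brks * 100) +
  labs(y = "Percent (%)", x = "", fill = "")
p
```

A plot can only have one scale_y_continuous(), so in your code the breaks and labels arguments would go into the call that already holds sec.axis.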
