ggplot2 - problem with code

```r
install.packages("gridExtra")
install.packages("grid")
install.packages("extrafont")
install.packages("showtext", dependencies = TRUE)
install.packages("RColorBrewer")
install.packages("scales")

#packages for graphs
library(ggplot2)
library(tidyverse)
library(grid)
library(gridExtra)
library(showtext)
library(extrafont)
library(RColorBrewer)
library(scales)

#######################
#Treatment x Male vs Female Graphs 
#######################
a1 <- read.csv('Data.csv')

Treatment.Sex = ddply(a1,~Treatment*Sex,summarize,meanQ1=100*mean(Q1,na.rm=T)+
  Treatment.Sex$Treatment = factor(Treatment.Sex$Treatment,levels=c("CONTROL","A","B","C","D","E"))+

###Plot for Question Q1
ggplot(data=Treatment.Sex,aes(x=Treatment,y=meanQ1,fill=Sex))+
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),panel.background = element_blank(), axis.line = element_line(colour = "black"))+
  geom_bar(stat="identity",position=position_dodge(),colour="black")+
  geom_smooth(method='lm',color="BLACK")+
  scale_fill_manual(values=c("coral","cornflowerblue"))+
  xlab("Treatments")+
  ylab("Proportion of Correct Responses (%)")+
  theme(legend.title=element_blank())+
  ggtitle("Q1: Correct Responses by Sex")+
  theme(text=element_text(size=16))
```

Hi there! Can anyone help with my buggy ggplot2 code? I'm looking to create two versions of these plots:
1) One graph showing male and female responses to Question 1 together.
2) Two separate graphs, one for males and one for females, showing their responses to Question 1.

[Data for GGPLOT2](https://drive.google.com/drive/folders/1bGFKI89dU1NpKIVM2HaXAmRznLHGz0yE?usp=sharing)

Can you post your data or a subset of it so we can run your ggplot code against the data you are working with? You can post the output of the dput() function. Use either

`dput(Treatment.Sex)`

or

`dput(head(Treatment.Sex, 20))`

Please put a line containing only three back ticks, ```, before and after the pasted output, like this:
```
Your output here
```
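To see what `dput()` produces, here is a small sketch with a made-up data frame (the name `toy` and its values are hypothetical, not the real data). `dput()` prints R code that rebuilds the object exactly, so whoever is helping can paste it straight into their own session. If the summary step is what fails, running `dput(head(a1, 20))` on the raw data read from Data.csv works just as well.

```r
# Hypothetical toy data standing in for the real Data.csv (values are made up)
toy <- data.frame(
  Treatment = c("CONTROL", "A", "A", "B"),
  Sex = c("Female", "Male", "Female", "Male"),
  Q1 = c(1, 0, 1, NA)
)

# Print R code that recreates (up to) the first 20 rows; this is the text to paste
dput(head(toy, 20))

# Round trip: evaluating the dput() text rebuilds an identical data frame
dput_text <- paste(capture.output(dput(toy)), collapse = "\n")
rebuilt <- eval(parse(text = dput_text))
identical(rebuilt, toy)
```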

Hey, thanks so much for that. I've just anonymized the data (I'll find out how to attach it here).

For the output, do you mean copying and pasting the output from RStudio? Cheers, L

Yes, run the dput() function and copy and paste the output into a response here. Remember to put back ticks before and after the pasted output.

Hi there, thanks for that. I've tried running the `dput()` function to get the output, but my code is so buggy that it won't let me. Would it work to just share the code and a link to the data file? Thanks so much!

If you can post the data somewhere, that will work too.

Oh great! I've now put the data on Google Drive as well, if that works? Thanks. :)
https://drive.google.com/drive/folders/1bGFKI89dU1NpKIVM2HaXAmRznLHGz0yE?usp=sharing

The link you provided is giving me an html page when I try to read it from R and if I go to the link with my browser, it says I do not have access.

Hi, sorry about that. It's a Google Drive link; I noticed people on other threads share data this way. I've now changed the access so that anyone with the link can view it. Would that work? https://drive.google.com/drive/folders/1bGFKI89dU1NpKIVM2HaXAmRznLHGz0yE?usp=sharing

Here is a partial solution. The colors of the fit lines are not right, but I think you know how to fix that. The main problem was your attempt to summarize the data with `ddply` and then chain that to other functions with `+`. You cannot generally chain functions together like that. It works in ggplot2 because ggplot2 defines `+` for adding layers to a plot, but that is a special case. To chain ordinary functions there is a pipe operator in magrittr, `%>%`, and I used that together with the functions from dplyr. (Note also that `ddply` comes from the plyr package, which `library(tidyverse)` does not load.) I have not used `ddply` in a long time and I didn't want to wrestle with it.
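A minimal sketch of what the pipe does, using a hypothetical `toy` data frame with made-up values: with `%>%`, the result of each step becomes the first argument of the next call, so the piped version and the nested version below compute the same summary. The `.groups = "drop"` argument is an addition here; it simply returns an ungrouped result and silences the regrouping message seen in the reprex.

```r
library(dplyr)

# Hypothetical toy data (made up); the columns mirror those used in the thread
toy <- data.frame(
  Treatment = c("CONTROL", "CONTROL", "A", "A", "B", "B"),
  Sex = c("Female", "Male", "Female", "Male", "Female", "Male"),
  Q1 = c(1, 0, 1, 1, 0, NA)
)

# Piped: each result is passed as the first argument of the next function
piped <- toy %>%
  group_by(Treatment, Sex) %>%
  summarise(meanQ1 = 100 * mean(Q1, na.rm = TRUE), .groups = "drop")

# The same computation written as nested calls, without the pipe
nested <- summarise(
  group_by(toy, Treatment, Sex),
  meanQ1 = 100 * mean(Q1, na.rm = TRUE),
  .groups = "drop"
)
```

`+` between these steps would not work, because `+` only has a special meaning for ggplot objects; for data-manipulation steps the pipe is the tool.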

```r
a1 <- openxlsx::read.xlsx("~/R/Play/Data.xlsx")

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
Treatment.Sex = a1 %>% group_by(Treatment, Sex) %>% 
  summarize(meanQ1 = 100*mean(Q1,na.rm=T))
#> `summarise()` regrouping output by 'Treatment' (override with `.groups` argument)

Treatment.Sex$Treatment = factor(Treatment.Sex$Treatment,
                                 levels=c("CONTROL","A","B","C","D","E"))

###Plot for Question Q1
library(ggplot2)
ggplot(data=Treatment.Sex,aes(x=Treatment,y=meanQ1,fill=Sex))+
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),panel.background = element_blank(), axis.line = element_line(colour = "black"))+
  geom_bar(stat="identity",position=position_dodge(),colour="black")+
  geom_smooth(aes(group=Sex, color = Sex), method='lm',se=F)+
  scale_fill_manual(values=c("coral","cornflowerblue"))+
  xlab("Treatments")+
  ylab("Proportion of Correct Responses (%)")+
  theme(legend.title=element_blank())+
  ggtitle("Q1: Correct Responses by Sex")+
  theme(text=element_text(size=16))
#> `geom_smooth()` using formula 'y ~ x'
```

Created on 2021-02-17 by the reprex package (v0.3.0)
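One way to handle the fit-line colors mentioned above, sketched under the assumption that each trend line should match its bars: give `scale_colour_manual()` the same values as `scale_fill_manual()`. The small `Treatment.Sex` table here is made-up data for illustration only, not the real survey results.

```r
library(ggplot2)

# Hypothetical summary table shaped like Treatment.Sex (values are made up)
Treatment.Sex <- data.frame(
  Treatment = factor(rep(c("CONTROL", "A", "B", "C", "D", "E"), each = 2),
                     levels = c("CONTROL", "A", "B", "C", "D", "E")),
  Sex = rep(c("Female", "Male"), times = 6),
  meanQ1 = c(40, 35, 50, 42, 55, 48, 60, 52, 62, 58, 70, 61)
)

sex_cols <- c("coral", "cornflowerblue")

p <- ggplot(Treatment.Sex, aes(x = Treatment, y = meanQ1, fill = Sex)) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_blank(), axis.line = element_line(colour = "black")) +
  geom_bar(stat = "identity", position = position_dodge(), colour = "black") +
  geom_smooth(aes(group = Sex, colour = Sex), method = "lm", formula = y ~ x, se = FALSE) +
  # Same values for fill and colour, so each line matches its bars
  scale_fill_manual(values = sex_cols) +
  scale_colour_manual(values = sex_cols) +
  xlab("Treatments") +
  ylab("Proportion of Correct Responses (%)") +
  theme(legend.title = element_blank()) +
  ggtitle("Q1: Correct Responses by Sex") +
  theme(text = element_text(size = 16))

print(p)
```

Passing `formula = y ~ x` explicitly only suppresses the "using formula" message; the fit itself is unchanged.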

That is fantastic. Thank you so much!

Is dplyr a more user-friendly, updated version of ddply? Good to know that functions can't be chained together the way I did it. How did you find out about the pipe operator in the first place? Did you search the help (`?`) for "chaining functions"? Thank you so much. I'm now working on getting one graph for females and one for males. Great solution, and the trend lines are an easy fix, as you say!
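For the second version (one graph for females and one for males), here is a minimal sketch with made-up data, not a definitive answer. `facet_wrap(~ Sex)` draws one panel per sex within a single figure, while `split()` plus `lapply()` builds two independent plots. The names `sex_cols`, `p_facets`, and `plots_by_sex` are hypothetical; with the real data you would use the `Treatment.Sex` table built with dplyr.

```r
library(ggplot2)

# Hypothetical summary table shaped like Treatment.Sex (values are made up)
Treatment.Sex <- data.frame(
  Treatment = factor(rep(c("CONTROL", "A", "B", "C", "D", "E"), each = 2),
                     levels = c("CONTROL", "A", "B", "C", "D", "E")),
  Sex = rep(c("Female", "Male"), times = 6),
  meanQ1 = c(40, 35, 50, 42, 55, 48, 60, 52, 62, 58, 70, 61)
)

sex_cols <- c(Female = "coral", Male = "cornflowerblue")

# Option 1: one figure with one panel per sex
p_facets <- ggplot(Treatment.Sex, aes(x = Treatment, y = meanQ1, fill = Sex)) +
  geom_bar(stat = "identity", colour = "black") +
  # group = Sex is needed because the discrete x would otherwise split the groups
  geom_smooth(aes(group = Sex), method = "lm", formula = y ~ x, se = FALSE, colour = "black") +
  scale_fill_manual(values = sex_cols) +
  facet_wrap(~ Sex) +
  xlab("Treatments") +
  ylab("Proportion of Correct Responses (%)") +
  ggtitle("Q1: Correct Responses by Sex") +
  theme(legend.position = "none", text = element_text(size = 16))

# Option 2: two stand-alone plots, one per sex (a named list: Female, Male)
plots_by_sex <- lapply(split(Treatment.Sex, Treatment.Sex$Sex), function(d) {
  sex <- as.character(d$Sex[1])
  ggplot(d, aes(x = Treatment, y = meanQ1)) +
    geom_bar(stat = "identity", fill = sex_cols[[sex]], colour = "black") +
    geom_smooth(aes(group = 1), method = "lm", formula = y ~ x, se = FALSE, colour = "black") +
    xlab("Treatments") +
    ylab("Proportion of Correct Responses (%)") +
    ggtitle(paste0("Q1: Correct Responses (", sex, ")")) +
    theme(text = element_text(size = 16))
})

print(p_facets)
```

To place the two stand-alone plots side by side, `gridExtra::grid.arrange(plots_by_sex$Female, plots_by_sex$Male, ncol = 2)` should work, and gridExtra is already loaded in the original script.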

I think dplyr is easier to work with but use whatever works for you.

I can't say how or when I learned about the %>% operator. An excellent place to learn about using dplyr and the rest of the tidyverse is the book R for Data Science. There is a lot of information available about R, but it is hard to find things if you don't know they exist. I frequently find new things and wish I had known about them earlier.