Create a plot with CI in ggplot2 for population proportion

I am writing to ask whether it is possible to visualize confidence intervals (CI) for a population proportion using ggplot2 directly from raw data, without first building a data frame of the summary values by hand.

I compared survival of two Walleye strains in southern Minnesota lakes stocked at a 1:1 ratio. Later, we captured fish and estimated the proportion of each strain for each lake, accounting for age. The graph and the code I created are attached below. I was wondering if it is possible to create similar graphs from the raw data without creating a new data frame for each year.

I know you must be very busy but I truly appreciate your feedback.

# Create a graph for LMS proportion with 95% confidence intervals
library(ggplot2)

Stocked2018LMSCI<-data.frame(Strain=rep(c("LMS"), each=1),
                             AgeClass=rep(c("Age0", "Age1", "Age2"), each=6),
                             Populations=rep(c("Bingham", "Kansas", "Okabena", "Round", "Cannon", "Tetonka"), each=1),
                             LowerCI=c(0.02, 0.86, 0.74, 0.03, 0.79, 0.73, 0.06, 0.00, 0.85, 0.09, 0.32, 0.66, 0.09, 0.70, 0.00, 0.00, 0.32, 0.55),
                             UpperCI=c(0.39, 0.95, 0.89, 0.56, 0.89, 0.97, 0.38, 0.00, 1.00, 0.91, 0.82, 1.00, 0.91, 1.00, 0.79, 0.79, 0.72, 0.95),
                             Proportion=c(0.13, 0.92, 0.82, 0.17, 0.85, 0.90, 0.16, 0.00, 1.00, 0.33, 0.62, 1.00, 0.5, 1.00, 0.00, 0.00, 0.52, 0.83))
Stocked2018LMSCI


ggplot(Stocked2018LMSCI, aes(x=Populations, y=Proportion))+
  xlab(NULL)+
  # geom_pointrange() has no width parameter, so width=0.2 is dropped
  geom_pointrange(aes(ymin=LowerCI, ymax=UpperCI, color=AgeClass),
                  position=position_dodge(width = 0.1), size=0.5)+
  geom_hline(aes(yintercept=0.5), linetype="dashed")+
  theme_classic()+
  scale_x_discrete(limits=c("Cannon", "Tetonka", "Bingham", "Kansas", "Okabena", "Round"))

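# Same plot, with AgeClass mapped to point shape instead of color
ggplot(Stocked2018LMSCI, aes(x=Populations, y=Proportion))+
  xlab(NULL)+
  # geom_pointrange() has no width parameter, so width=0.2 is dropped
  geom_pointrange(aes(ymin=LowerCI, ymax=UpperCI, shape=AgeClass),
                  position=position_dodge(width = 0.5), size=0.5)+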
  geom_hline(aes(yintercept=0.5), linetype="dashed")+
  scale_shape_manual(values = c(0:2)) + scale_fill_grey() + theme_classic()+
  scale_x_discrete(limits=c("Cannon", "Tetonka", "Bingham", "Kansas", "Okabena", "Round"))

Hi @Askhan_Shametov
Welcome to the RStudio Community Forum.

The answer to your question is almost certainly "Yes" but we cannot show you how without having access to a representative sample of your raw data. Please see the Posting Guide which shows you how to make a Reproducible Example.

Would something like this work?

# Load libraries ----------------------------------------------------------
library("tidyverse")


# Define example data -----------------------------------------------------
set.seed(282678)
my_data <- tibble(
  Populations = rep(
    c("Bingham", "Kansas", "Okabena", "Round", "Cannon", "Tetonka"),
    each = 20),
  AgeClass = sample(c("Age0", "Age1", "Age2"),
                    size = 120,
                    replace = TRUE),
  Proportion = rnorm(120,
                     mean = 0.5,
                     sd = 0.1)
)


# Wrangle data ------------------------------------------------------------
my_data_summary <- my_data %>% 
  group_by(Populations, AgeClass) %>% 
  # Mean and t-based 95% CI per lake and age class
  summarise(Proportion_mu = mean(Proportion),
            ci_lower = t.test(Proportion)$conf.int[1],
            ci_upper = t.test(Proportion)$conf.int[2],
            .groups = "drop")  # return an ungrouped tibble


# Visualise data ----------------------------------------------------------
my_data_summary %>% 
  ggplot(aes(x = Populations,
             y = Proportion_mu,
             ymin = ci_lower,
             ymax = ci_upper,
             shape = AgeClass)) +
  geom_hline(yintercept = 0.5,
             linetype = "dashed") +
  geom_pointrange(position = position_dodge(width = 0.5),
                  size = 0.5)

Hope it helps 🙂

Hi David,

Thank you for your reply.

I tried to create a representative sample of my raw data by following this link (FAQ: How to do a minimal reproducible example ( reprex ) for beginners). However, I was not able to do so.
My raw data has 2068 observations of 6 variables. Is there any way I could share it here?

Thank you!
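
One common way to share a slice of a large data set on a forum is to print a small random subset with dput(), which writes out R code that recreates the object. The sketch below is only an illustration: raw_data is a simulated stand-in for the real 2068-row data, and the column names (Lake, AgeClass, Strain) and the strain label "Other" are assumptions, not taken from the actual file.

```r
# Hypothetical stand-in for the real raw data (names and values are assumptions)
set.seed(1)
raw_data <- data.frame(
  Lake     = sample(c("Bingham", "Kansas", "Okabena"), 2068, replace = TRUE),
  AgeClass = sample(c("Age0", "Age1", "Age2"), 2068, replace = TRUE),
  Strain   = sample(c("LMS", "Other"), 2068, replace = TRUE)
)

# Keep 20 randomly chosen rows so the sample stays small enough to paste
raw_sample <- raw_data[sample(nrow(raw_data), 20), ]

# Print R code that rebuilds raw_sample; this output is what gets pasted
dput(raw_sample)
```

Pasting the dput() output into a post lets others rebuild the same 20 rows by assigning it to a name, without needing the original file.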

Dear Leon,

Thank you for the help. It worked, but the output was different from the original one. I do not understand why. Is there anything I might have missed or misunderstood?

Thanks,

Han
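
A difference from the original plot is expected. The example earlier in the thread simulates Proportion with rnorm() and takes a t-based interval around the mean, so its numbers do not come from the Walleye data. If the raw data has one row per captured fish, a binomial interval may be closer to the original graph. The following is a minimal sketch under that assumption: raw_data, its column names, and the strain label "Other" are hypothetical, and binom.test() gives an exact (Clopper-Pearson) interval, which may not be the method used for the original CIs.

```r
library(dplyr)
library(ggplot2)

# Hypothetical raw data: one row per captured fish (names are assumptions)
set.seed(42)
raw_data <- tibble(
  Populations = sample(c("Bingham", "Kansas", "Okabena",
                         "Round", "Cannon", "Tetonka"),
                       size = 600, replace = TRUE),
  AgeClass = sample(c("Age0", "Age1", "Age2"), size = 600, replace = TRUE),
  Strain   = sample(c("LMS", "Other"), size = 600, replace = TRUE)
)

# Count LMS fish per lake and age class, then add an exact binomial 95% CI
lms_summary <- raw_data %>%
  group_by(Populations, AgeClass) %>%
  summarise(n_lms   = sum(Strain == "LMS"),
            n_total = n(),
            .groups = "drop") %>%
  rowwise() %>%  # binom.test() takes one count pair at a time
  mutate(Proportion = n_lms / n_total,
         LowerCI    = binom.test(n_lms, n_total)$conf.int[1],
         UpperCI    = binom.test(n_lms, n_total)$conf.int[2]) %>%
  ungroup()

# Plot proportions with their intervals, dodged by age class
p <- ggplot(lms_summary,
            aes(x = Populations, y = Proportion,
                ymin = LowerCI, ymax = UpperCI,
                shape = AgeClass)) +
  geom_hline(yintercept = 0.5, linetype = "dashed") +
  geom_pointrange(position = position_dodge(width = 0.5), size = 0.5) +
  theme_classic()
p
```

With real data, only the summarise() step should need adapting to the actual column names; the summary and the plot are then produced in one pipeline from the raw rows rather than from hand-typed values.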