All the columns are getting plotted in one box plot on the x-axis!

Hello, I am new to coding in R. I have a dataset in which I need to compare three columns against one column.
This is what my data looks like:
A B C D E
Yes No No 6 No
Yes No No 2 Yes
No No No 13 Yes
Yes No No 1 Yes
No No No 5 Yes

I need to compare A, B, C on the x-axis with D on the y-axis and E on the z-axis.
This is my code:

trial <- read.csv("figs.csv",
header= TRUE,
sep= ",")

new_data <- factor(trial$A)
as.numeric(new_data)
pl<-ggplot()+
stat_boxplot(data = trial, geom = "errorbar",
mapping = aes(x= new_data ,y= D))+
geom_boxplot(data= trial, aes(x = new_data , y = D, fill = A))+
geom_jitter()

nd2 <- factor(trial$B)
as.numeric(nd2)

t1 <- pl +
stat_boxplot(data= trial, geom = "errorbar",
mapping = aes(x = nd2 , y = D))+
geom_boxplot(data= trial, aes(x = nd2 , y = D, fill = B))+
geom_jitter(position = position_jitter(0.2))

nd3 <- factor(trial$Social)
as.numeric(nd3)
t2 <- t1 +
stat_boxplot(data= trial, geom = "errorbar",
mapping = aes(x = nd3 , y = D))+
geom_boxplot(data= trial, aes(x = nd3 , y = D, fill = C))+
geom_jitter(position = position_jitter(0.2))

The problem I am facing is that the three columns A, B, and C are getting plotted one on top of the other and not as separate boxes.

Hi there,

I worked through your code and am a bit unsure what you're hoping for the final plot to look like, but I wonder if the following is at least a helpful starting point:

library(tidyverse)

# Reproducible df
ex_df <- tribble(
  ~A, ~B, ~C, ~D, ~E,
  "Yes", "No", "No", 6, "No",
  "Yes", "No", "No", 2, "Yes",
  "No", "No", "No", 13, "Yes",
  "Yes", "No", "No", 1, "Yes",
  "No", "No", "No", 5, "Yes"
)

# Pivot to long format to allow for paneling
long_df <- ex_df %>%
  pivot_longer(cols = c(A, B, C),
               names_to = "x_vars", values_to = "x_values")

# Plot with panels
long_df %>%
  ggplot() +
  stat_boxplot(geom = "errorbar",
               mapping = aes(x = x_values, y = D)) +
  geom_boxplot(aes(x = x_values, y = D, fill = x_values)) +
  facet_wrap(vars(x_vars)) +
  xlab("Yes/No") +
  scale_fill_discrete("Yes/No")

Created on 2021-07-26 by the reprex package (v2.0.0)

I'm not sure what the error bars should be showing. They don't seem right to me in this example, but perhaps you could clarify this and what you're hoping to do with a z-axis.
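For reference, stat_boxplot(geom = "errorbar") computes the same statistics as the boxplot and draws the whiskers with horizontal caps. If the caps are what looks off, one option is to make them narrower than the box. A minimal sketch, assuming narrower caps are what is wanted; the width of 0.3 is an arbitrary choice and not something from the original post:

```r
library(tibble)
library(tidyr)
library(ggplot2)

# Same example data as above
ex_df <- tribble(
  ~A, ~B, ~C, ~D, ~E,
  "Yes", "No", "No", 6, "No",
  "Yes", "No", "No", 2, "Yes",
  "No", "No", "No", 13, "Yes",
  "Yes", "No", "No", 1, "Yes",
  "No", "No", "No", 5, "Yes"
)

long_df <- pivot_longer(ex_df, cols = c(A, B, C),
                        names_to = "x_vars", values_to = "x_values")

# Shared x/y mapping goes in ggplot(); the errorbar layer is drawn first so
# the boxes sit on top of the whisker lines
p <- ggplot(long_df, aes(x = x_values, y = D)) +
  stat_boxplot(geom = "errorbar", width = 0.3) +  # 0.3 is an assumed cap width
  geom_boxplot(aes(fill = x_values)) +
  facet_wrap(vars(x_vars))
p
```

Putting the x and y mapping in ggplot() means both layers are guaranteed to use the same grouping, so each set of caps lines up with its box.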

I used a "long" format data frame here to keep the variable relationships intact while plotting.
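To make the reshaping concrete, here is a small sketch of what pivot_longer() does, using only the first two rows of the example data. Each original row becomes three rows, one per treatment column, and D and E are repeated alongside. The column names x_vars and x_values are the ones used above.

```r
library(tibble)
library(tidyr)

# First two rows of the example data
small_df <- tribble(
  ~A, ~B, ~C, ~D, ~E,
  "Yes", "No", "No", 6, "No",
  "Yes", "No", "No", 2, "Yes"
)

small_long <- pivot_longer(small_df, cols = c(A, B, C),
                           names_to = "x_vars", values_to = "x_values")

dim(small_df)    # 2 rows, 5 columns
dim(small_long)  # 6 rows, 4 columns: D, E, x_vars, x_values
small_long
```

Because the column name is now stored as data in x_vars, it can be mapped to a facet or to fill like any other variable.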

@cactusoxbird thank you so much. The actual data has 80 observations in 6 columns. I realise that adding a z-axis complicates the code further. I attempted to compare the three "Yes"/"No" columns (A-C) with the numeric data column (D).

  1. A-C represent whether a treatment was given or not ("Yes"/"No").
  2. D represents the number of times a behavioral output was performed in response to getting or not getting the treatment.

Unfortunately, when I use ggplot to create a boxplot, columns A-C get plotted one on top of the other (added as an attachment).

So my final result is a graph with yes and no on the x-axis, the behavioral frequency on the y-axis and with A,B,C box plots one on top of the other.

This is the most recent code I tried:

trial2 <- ggplot() +
stat_boxplot(data = trial, geom = "errorbar",
mapping = aes(x = A , y = D))

p1 <- trial2+
geom_boxplot(data= trial, aes(x = factor(A) , y = D, fill = A))+
geom_jitter()

p2<- p1+
stat_boxplot(data= trial, geom = "errorbar",
mapping = aes(x = B , y = D))
p3<- p2+
geom_boxplot(data= trial, aes(x = factor(B) , y = D, fill = B))+
geom_jitter()

p4<-p3+
stat_boxplot(data= trial, geom = "errorbar",
mapping = aes(x = C , y = D))
p5<- p4+
geom_boxplot(data= trial, aes(x = factor(C) , y = D, fill = C))+
geom_jitter()

Data:

NO A B C D E
42 Yes No No 6 No
12 Yes No No 2 Yes
43 No No No 13 Yes
4 Yes No No 1 Yes
44 No No No 5 Yes
42 No No No 1 Yes
12 No No No 6 Yes
43 Yes No No 3 Yes
4 No No No 0 Yes
44 Yes No No 9 Yes
30 Yes No No 3 Yes
38 No No No 5 Yes
58 Yes No No 1 Yes
46 Yes No No 3 No
31 No No No 12 Yes
30 No No No 4 Yes
38 Yes No No 8 Yes
58 No No No 7 Yes
46 No No No 5 Yes
31 Yes No No 2 Yes
30 No No Yes 4 No
38 Yes No Yes 6 Yes
58 Yes No Yes 4 No
46 Yes No Yes 2 Yes
31 No No Yes 6 Yes
30 Yes No Yes 3 Yes
38 No No Yes 3 Yes
58 No No Yes 4 Yes
46 No No Yes 5 Yes
31 Yes No Yes 7 No
42 No No Yes 3 No
12 Yes No Yes 10 Yes
43 No No Yes 9 Yes

I was hoping to create a final graph with "Yes" and "No" as the two categories on the x-axis, with the three treated groups above "Yes", the three untreated groups above "No", and the corresponding behavior frequency (column D) on the y-axis. I would be grateful for any suggestions. I also tried tidying the data using "gather", but I think I am not using it correctly.

I am attaching what I am presently seeing: Rplot1.pdf (16.8 KB)

Ok, I think I understand what you're going for. Is it something like this?

image

To do this I would approach it similarly to my last post, using pivot_longer() to keep the treatment groups as a single column:

library(tidyverse)

# Build a snapshot of the dataset
trials <- tribble(
    ~NO, ~A, ~B, ~C, ~D, ~E,
    42, "Yes", "No", "No",6, "No",
    12, "Yes", "No", "No",2, "Yes",
    43, "No", "No", "No",13, "Yes",
    4, "Yes", "No", "No",1, "Yes",
    44, "No", "No", "No",5, "Yes",
    42, "No", "No", "No",1, "Yes",
    12, "No", "No", "No",6, "Yes",
    43, "Yes", "No", "No",3, "Yes",
    4, "No", "No", "No",0, "Yes",
    44, "Yes", "No", "No",9, "Yes",
    30, "Yes", "No", "No",3, "Yes",
    38, "No", "No", "No",5, "Yes",
    58, "Yes", "No", "No",1, "Yes",
    46, "Yes", "No", "No",3, "No",
    31, "No", "No", "No",12, "Yes",
    30, "No", "No", "No",4, "Yes",
    38, "Yes", "No", "No",8, "Yes",
    58, "No", "No", "No",7, "Yes",
    46, "No", "No", "No",5, "Yes",
    31, "Yes", "No", "No",2, "Yes",
    30, "No", "No", "Yes",4, "No",
    38, "Yes", "No", "Yes",6, "Yes",
    58, "Yes", "No", "Yes",4, "No",
    46, "Yes", "No", "Yes",2, "Yes",
    31, "No", "No", "Yes",6, "Yes",
    30, "Yes", "No", "Yes",3, "Yes",
    38, "No", "No", "Yes",3, "Yes",
    58, "No", "No", "Yes",4, "Yes",
    46, "No", "No", "Yes",5, "Yes",
    31, "Yes", "No", "Yes",7, "No",
    42, "No", "No", "Yes",3, "No",
    12, "Yes", "No", "Yes",10, "Yes",
    43, "No", "No", "Yes",9, "Yes"
  )


# Pivot to long format so that we can use separate colors for treatment groups
# and have all "Yes/No" boxplots in the same place
long_df <-  trials %>%
  pivot_longer(names_to = "trt_group", values_to = "treated",
               c(A, B, C))

# Now plot. Fill = trt_group changes the boxplot fill color based on the
# treatment group it was assigned. Yes/No is still the x-axis
long_df %>%
  ggplot() +
  geom_boxplot(aes(x = treated, y = D, fill = trt_group))

# You can add one row of fake data so that space is still reserved for the
# missing B "Yes" group (B is "No" in every row of this snapshot). Only the
# missing group gets a fake row: C already has real "Yes" values, and a fake
# one would shift its box. The fake D value lies outside the real data range
# and coord_cartesian() zooms it out of view. Taken from here:
# https://stackoverflow.com/questions/15367762/include-space-for-missing-factor-level-used-in-fill-aesthetics-in-geom-boxplot/15368879#15368879
long_df %>%
  add_row(D = 50, E = "Yes", trt_group = "B", treated = "Yes") %>%
  ggplot() +
  geom_boxplot(aes(x = treated, y = D, fill = trt_group)) +
  coord_cartesian(ylim = range(long_df$D) + c(-.25, .25))

# And finally you can customize the plot
long_df %>%
  add_row(D = 50, E = "Yes", trt_group = "B", treated = "Yes") %>%
  ggplot() +
  geom_boxplot(aes(x = treated, y = D, fill = trt_group)) +
  coord_cartesian(ylim = range(long_df$D) + c(-.25, .25)) +
  xlab("Received treatment") +
  ylab("Count of behavior observations") +
  scale_fill_discrete(name = "Treatment Group") +
  theme_bw()

Created on 2021-07-28 by the reprex package (v2.0.0)
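Since gather() came up earlier in the thread: it can produce the same long layout, although tidyr now marks it as superseded in favor of pivot_longer(). A minimal sketch on three rows of the data above, assuming the same trt_group and treated column names; trials_small is just an illustrative name:

```r
library(tibble)
library(tidyr)

# Three rows from the snapshot above
trials_small <- tribble(
  ~NO, ~A, ~B, ~C, ~D, ~E,
  42, "Yes", "No", "No", 6, "No",
  12, "Yes", "No", "No", 2, "Yes",
  43, "No", "No", "No", 13, "Yes"
)

# gather(): key = new column holding the old column names,
# value = new column holding the Yes/No entries, then the columns to stack
long_gather <- gather(trials_small, key = "trt_group", value = "treated", A, B, C)

# pivot_longer() equivalent
long_pivot <- pivot_longer(trials_small, cols = c(A, B, C),
                           names_to = "trt_group", values_to = "treated")

# Same rows either way; gather() stacks column by column, pivot_longer()
# goes row by row, so only the row order differs
long_gather
long_pivot
```

Either result can be passed to the ggplot() calls above in place of long_df.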
