All the columns are getting plotted in one box plot on the x-axis!

Meenakshi_Vengarai · July 26, 2021, 3:36am

Hello, I am new to coding in R. I have a dataset in which I need to compare three columns against one column.
This is what my data looks like:
A B C D E
Yes No No 6 No
Yes No No 2 Yes
No No No 13 Yes
Yes No No 1 Yes
No No No 5 Yes

I need to compare A, B, and C on the x-axis with D on the y-axis and E on the z-axis.
This is my code:

trial <- read.csv("figs.csv",
header= TRUE,
sep= ",")

new_data <- factor(trial$A)
as.numeric(new_data)
pl<-ggplot()+
stat_boxplot(data = trial, geom = "errorbar",
mapping = aes(x= new_data ,y= D))+
geom_boxplot(data= trial, aes(x = new_data , y = D, fill = A))+
geom_jitter()

nd2 <- factor(trial$B)
as.numeric(nd2)

t1 <- pl +
stat_boxplot(data= trial, geom = "errorbar",
mapping = aes(x = nd2 , y = D))+
geom_boxplot(data= trial, aes(x = nd2 , y = D, fill = B))+
geom_jitter(position = position_jitter(0.2))

nd3 <- factor(trial$C)
as.numeric(nd3)
t2 <- t1 +
stat_boxplot(data= trial, geom = "errorbar",
mapping = aes(x = nd3 , y = D))+
geom_boxplot(data= trial, aes(x = nd3 , y = D, fill = C))+
geom_jitter(position = position_jitter(0.2))

The problem I am facing is that the three columns A, B, and C are plotted one on top of the other and not as separate boxes.

cactusoxbird · July 26, 2021, 5:14pm

Hi there,

I worked through your code and am a bit unsure what you're hoping for the final plot to look like, but I wonder if the following is at least a helpful starting point:

library(tidyverse)

# Reproducible df
ex_df <- tribble(
  ~A, ~B, ~C, ~D, ~E,
  "Yes", "No", "No", 6, "No",
  "Yes", "No", "No", 2, "Yes",
  "No", "No", "No", 13, "Yes",
  "Yes", "No", "No", 1, "Yes",
  "No", "No", "No", 5, "Yes"
)

# Pivot to long format to allow for paneling
long_df <- ex_df %>%
  pivot_longer(names_to = "x_vars", values_to = "x_values",
               c(A, B, C)) 

# Plot with panels
long_df %>%
  ggplot() +
  stat_boxplot(geom = "errorbar",
               mapping = aes(x = x_values, y = D)) +
  geom_boxplot(aes(x = x_values, y = D, fill = x_values)) +
  facet_wrap(vars(x_vars)) +
  xlab("Yes/No") +
  scale_fill_discrete("Yes/No")

Created on 2021-07-26 by the reprex package (v2.0.0)

I'm not sure what the error bars should be showing. They don't seem right to me in this example, but perhaps you could clarify this and what you're hoping to do with a z-axis.

I used a "long" format data frame here to keep the variable relationships intact while plotting.
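To make the "long" format a little more concrete, here is a minimal sketch of what the pivot does to two rows of the example data. The names `wide_df` and `long_small` are made up for this illustration; the column names `x_vars` and `x_values` match the code above.

```r
library(tibble)
library(tidyr)

# Two rows of the example data in the original "wide" layout
wide_df <- tribble(
  ~A,    ~B,   ~C,   ~D, ~E,
  "Yes", "No", "No", 6,  "No",
  "No",  "No", "No", 13, "Yes"
)

# Each wide row becomes three long rows, one per treatment column.
# D and E are repeated on every new row, so each Yes/No value stays
# attached to the observation it came from.
long_small <- pivot_longer(wide_df, cols = c(A, B, C),
                           names_to = "x_vars", values_to = "x_values")

print(long_small)
```

Because the treatment column name now lives in `x_vars`, it can be mapped to a facet or a fill color like any other variable, which is what the plotting code relies on.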

Meenakshi_Vengarai · July 27, 2021, 5:11am

@cactusoxbird thank you so much. The actual data has 80 observations in 6 columns. I realise that adding a z-axis complicates the code further. I attempted comparing the three "Yes"/"No" columns (A-C) to the numeric data column (D).

A-C represent whether a treatment was given or not ("Yes"/"No").
D represents the number of times a behavioral output was performed in response to getting or not getting the treatment.

Unfortunately, when I use ggplot to create a boxplot, columns A-C get plotted one on top of the other (added as an attachment).

So my final result is a graph with Yes and No on the x-axis, the behavioral frequency on the y-axis, and the A, B, and C box plots one on top of the other.

This is the most recent code I tried:

trial2 <- ggplot() +
stat_boxplot(data = trial, geom = "errorbar",
mapping = aes(x = A , y = D))

p1 <- trial2+
geom_boxplot(data= trial, aes(x = factor(A) , y = D, fill = A))+
geom_jitter()

p2<- p1+
stat_boxplot(data= trial, geom = "errorbar",
mapping = aes(x = B , y = D))
p3<- p2+
geom_boxplot(data= trial, aes(x = factor(B) , y = D, fill = B))+
geom_jitter()

p4<-p3+
stat_boxplot(data= trial, geom = "errorbar",
mapping = aes(x = C , y = D))
p5<- p4+
geom_boxplot(data= trial, aes(x = factor(C) , y = D, fill = C))+
geom_jitter()

Data:

NO	A	B	C	D	E
42	Yes	No	No	6	No
12	Yes	No	No	2	Yes
43	No	No	No	13	Yes
4	Yes	No	No	1	Yes
44	No	No	No	5	Yes
42	No	No	No	1	Yes
12	No	No	No	6	Yes
43	Yes	No	No	3	Yes
4	No	No	No	0	Yes
44	Yes	No	No	9	Yes
30	Yes	No	No	3	Yes
38	No	No	No	5	Yes
58	Yes	No	No	1	Yes
46	Yes	No	No	3	No
31	No	No	No	12	Yes
30	No	No	No	4	Yes
38	Yes	No	No	8	Yes
58	No	No	No	7	Yes
46	No	No	No	5	Yes
31	Yes	No	No	2	Yes
30	No	No	Yes	4	No
38	Yes	No	Yes	6	Yes
58	Yes	No	Yes	4	No
46	Yes	No	Yes	2	Yes
31	No	No	Yes	6	Yes
30	Yes	No	Yes	3	Yes
38	No	No	Yes	3	Yes
58	No	No	Yes	4	Yes
46	No	No	Yes	5	Yes
31	Yes	No	Yes	7	No
42	No	No	Yes	3	No
12	Yes	No	Yes	10	Yes
43	No	No	Yes	9	Yes

I was hoping to create a final graph with "Yes" and "No" as the two categories on the x-axis, the three treated groups above "Yes", the three untreated groups above "No", and the corresponding behavior frequency (column D) on the y-axis. I would be grateful for any suggestions. I also tried tidying the data using gather(), but I think I am not using it correctly.

Attaching what I am presently seeing here: Rplot1.pdf (16.8 KB)

cactusoxbird · July 28, 2021, 5:17pm

Ok, I think I understand what you're going for. Is it something like this?

To do this I would approach it similarly to my last post, using pivot_longer() to keep the treatment groups as a single column:

library(tidyverse)

# Build a snapshot of the dataset
trials <- tribble(
    ~NO, ~A, ~B, ~C, ~D, ~E,
    42, "Yes", "No", "No",6, "No",
    12, "Yes", "No", "No",2, "Yes",
    43, "No", "No", "No",13, "Yes",
    4, "Yes", "No", "No",1, "Yes",
    44, "No", "No", "No",5, "Yes",
    42, "No", "No", "No",1, "Yes",
    12, "No", "No", "No",6, "Yes",
    43, "Yes", "No", "No",3, "Yes",
    4, "No", "No", "No",0, "Yes",
    44, "Yes", "No", "No",9, "Yes",
    30, "Yes", "No", "No",3, "Yes",
    38, "No", "No", "No",5, "Yes",
    58, "Yes", "No", "No",1, "Yes",
    46, "Yes", "No", "No",3, "No",
    31, "No", "No", "No",12, "Yes",
    30, "No", "No", "No",4, "Yes",
    38, "Yes", "No", "No",8, "Yes",
    58, "No", "No", "No",7, "Yes",
    46, "No", "No", "No",5, "Yes",
    31, "Yes", "No", "No",2, "Yes",
    30, "No", "No", "Yes",4, "No",
    38, "Yes", "No", "Yes",6, "Yes",
    58, "Yes", "No", "Yes",4, "No",
    46, "Yes", "No", "Yes",2, "Yes",
    31, "No", "No", "Yes",6, "Yes",
    30, "Yes", "No", "Yes",3, "Yes",
    38, "No", "No", "Yes",3, "Yes",
    58, "No", "No", "Yes",4, "Yes",
    46, "No", "No", "Yes",5, "Yes",
    31, "Yes", "No", "Yes",7, "No",
    42, "No", "No", "Yes",3, "No",
    12, "Yes", "No", "Yes",10, "Yes",
    43, "No", "No", "Yes",9, "Yes"
  )


# Pivot to long format so that we can use separate colors for treatment groups
# and have all "Yes/No" boxplots in the same place
long_df <-  trials %>%
  pivot_longer(names_to = "trt_group", values_to = "treated",
               c(A, B, C))

# Now plot. Fill = trt_group changes the boxplot fill color based on the
# treatment group it was assigned. Yes/No is still the x-axis
long_df %>%
  ggplot() +
  geom_boxplot(aes(x = treated, y = D, fill = trt_group))

# You can use two lines of fake data to adjust the width of the boxplot so that
# the missing B group is still accounted for. Taken from here:
# https://stackoverflow.com/questions/15367762/include-space-for-missing-factor-level-used-in-fill-aesthetics-in-geom-boxplot/15368879#15368879
long_df %>%
  add_row(D = c(50, 60),  E = c("Yes", "Yes"), trt_group = c("B", "C"),
          treated = c("Yes", "Yes")) %>%
  ggplot() +
  geom_boxplot(aes(x = treated, y = D, fill = trt_group)) +
  coord_cartesian(ylim = range(long_df$D) + c(-.25, .25))

# And finally you can customize the plot
long_df %>%
  add_row(D = c(50, 60),  E = c("Yes", "Yes"), trt_group = c("B", "C"),
          treated = c("Yes", "Yes")) %>%
  ggplot() +
  geom_boxplot(aes(x = treated, y = D, fill = trt_group)) +
  coord_cartesian(ylim = range(long_df$D) + c(-.25, .25)) +
  xlab("Received treatment") +
  ylab("Count of behavior observations") +
  scale_fill_discrete(name = "Treatment Group") +
  theme_bw()

Created on 2021-07-28 by the reprex package (v2.0.0)
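As a possible alternative to the fake rows, here is a minimal sketch that first tabulates the group combinations (to show why one box is missing) and then uses `position_dodge2(preserve = "single")` to keep the box widths constant. This is a different technique from the Stack Overflow one above, and I have not compared its output against the figure in this thread. The placement of the remaining boxes may differ, since no empty slot is reserved for the missing group. `trials_small`, `long_small`, `combos`, and `p` are names made up for this illustration, and the data is a four-row subset of the data above.

```r
library(tibble)
library(tidyr)
library(dplyr)
library(ggplot2)

# Small subset of the data above: B is never "Yes", C is sometimes "Yes"
trials_small <- tribble(
  ~A,    ~B,   ~C,    ~D,
  "Yes", "No", "No",  6,
  "No",  "No", "No",  13,
  "Yes", "No", "Yes", 6,
  "No",  "No", "Yes", 4
)

long_small <- pivot_longer(trials_small, cols = c(A, B, C),
                           names_to = "trt_group", values_to = "treated")

# Tabulate the combinations: there is no B / "Yes" row, so ggplot has
# nothing to draw for that box and stretches its neighbours by default
combos <- count(long_small, trt_group, treated)
print(combos)

# preserve = "single" keeps every box at the width of a single element
# instead of letting the boxes expand into the space of the missing group
p <- ggplot(long_small) +
  geom_boxplot(aes(x = treated, y = D, fill = trt_group),
               position = position_dodge2(preserve = "single"))
```

Printing `p` draws the plot. The advantage of this route is that no artificial observations are added to the data, so the box statistics come only from real rows.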
