Multiple box plots

Hi,

I wish to create a multiple box plot for a large dataset: 11 separate boxplots in the same figure, all with the same variable for the y axis. The problem is that the variable to be used for the y axis is a character string, either "1" or "2", depending on whether the values relate to good or poor survival. I have managed to get separate boxplots, but each contains only a single box with the values for "1" and "2" combined. I would therefore appreciate advice on how to split on this variable so that each boxplot gets two separate boxes, one for good survival and one for poor survival.

Also, is it possible to put all 11 boxplots in the same plot eventually, e.g. lined up next to each other?

Thank you.

I do not understand how your data are structured. In particular, the y axis variable of a box plot should have continuous values, not two character values. Please make a Reproducible Example if the following code does not help you enough to solve your problem.

set.seed(10)
DF <- data.frame(Cat = sample(c("A", "B"), 50, replace = TRUE),
           Surv = sample(c("1", "2"), 50, replace = TRUE),
           Value = rnorm(50))
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 3.5.3
ggplot(DF, aes(x = Cat, y = Value, color = Surv)) + geom_boxplot()

Created on 2019-12-20 by the reprex package (v0.3.0.9000)


Hi,

Thank you for your response. Although the input got me in the right direction, the result is still not exactly how I imagined it. Therefore, I will try to create a reproducible example:
Let's say I have 100 cases in total, each with expression values for 5 different genes. Based on these values, each case is assigned to survival group "1" (poor) or "2" (good). There are 6 columns: "Survival_Group_Numb", "Gene1", "Gene2", "Gene3", "Gene4", and "Gene5". PS: The expression values I work with have already been log2 transformed and do not need any further manipulation.

I want to create a multiple boxplot organised like the one above, but I am struggling to adapt it to my particular dataset.

Hi, it seems like you need to tidy your data first. If you gather your gene columns into a single column "gene", you can use the method provided above by @FJCC.

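library(tidyverse)
library(readxl)

# Read the recreated table (columns: group, gene1, ..., gene5)
marte <- read_xlsx("marterstudio.xlsx")

# Wide to long: stack the five gene columns into a "gene"/"value" pair
marte <- gather(marte, "gene1", "gene2", "gene3", "gene4", "gene5", key = "gene", value = "value")

# Treat survival group and gene as categorical so each gets its own box
marte$group <- as.factor(marte$group)
marte$gene <- as.factor(marte$gene)

# One pair of boxes per gene, coloured by survival group
ggplot(marte, aes(x = gene, y = value, color = group)) +
  geom_boxplot()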

This is what I did based on a recreation of the table that you posted. I hope this provides some insight.
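
For anyone without the spreadsheet, here is a minimal sketch of the same idea on simulated data. The data frame `expr` and its random values are made up for illustration; only the column names and the "1" = poor / "2" = good coding come from the description above. It uses `pivot_longer()`, which in tidyr 1.0.0 and later is the recommended replacement for `gather()`, and it labels the survival groups so the legend is readable.

```r
library(tidyr)
library(ggplot2)

set.seed(1)
n <- 100

# Simulated stand-in for the real table (assumption): 100 cases, one
# survival group column ("1" = poor, "2" = good) and five expression columns.
expr <- data.frame(
  Survival_Group_Numb = sample(c("1", "2"), n, replace = TRUE),
  Gene1 = rnorm(n), Gene2 = rnorm(n), Gene3 = rnorm(n),
  Gene4 = rnorm(n), Gene5 = rnorm(n)
)

# Wide to long: one row per case/gene pair.
expr_long <- pivot_longer(expr, cols = starts_with("Gene"),
                          names_to = "Gene", values_to = "Value")

# Turn the "1"/"2" strings into a labelled factor.
expr_long$Survival <- factor(expr_long$Survival_Group_Numb,
                             levels = c("1", "2"),
                             labels = c("poor", "good"))

# All genes side by side in one plot, two boxes per gene.
p <- ggplot(expr_long, aes(x = Gene, y = Value, color = Survival)) +
  geom_boxplot()
p
```

The same pattern should carry over to 11 columns as long as they can be selected together, here via `starts_with("Gene")`.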

Where is the reprex? I think you forgot to post it after describing your example.

This worked! Thank you very much.