A line to connect the mean (or median) for boxplot

Dear Friends,
I have a boxplot showing multiple boxes. I want a line that connects the mean (or median) of each box, like this example:

[Screenshot: Schermata 2020-06-27 alle 13.09.11]

Is there a particular command for this? Is there a package that I need to complete my analysis? Thanks!

Here is an example of one way to do this with ggplot2 and dplyr.

library(dplyr)
library(ggplot2)
# Simulated data: four groups (X) of 20 values each (Y)
DF <- data.frame(X = rep(1:4, each = 20),
                 Y = c(rnorm(20, 78, 1), rnorm(20, 76, 1),
                       rnorm(20, 80, 0.5), rnorm(20, 74, 2)))
# Mean of Y within each group
Means <- DF %>% group_by(X) %>% 
  summarize(Avg = mean(Y))
# Boxplots from the raw data; points and connecting line from the group means
ggplot() + 
  geom_boxplot(data = DF, mapping = aes(x = X, y = Y, group = X)) +
  geom_point(data = Means, mapping = aes(x = X, y = Avg)) +
  geom_line(data = Means, mapping = aes(x = X, y = Avg))

Created on 2020-06-27 by the reprex package (v0.3.0)

Thanks for your message! But I don't understand how to apply the script to my own analysis.
In my hospital dataset, I have the heights of my 40 patients recorded over 20 years, with one column per time point, so I want to make twenty boxplots connected by a line through the means (or medians).
How should I write my script? Thanks for your help!

I am not sure how your data are arranged. Do you have one column for each boxplot that you want to make? The most efficient way for me to help you is if you post some data. You do not need to show all of your data; a few rows and columns are often enough. If your data are in a data frame called DF, you can show the content of the first 10 rows and the first 4 columns with

dput(DF[1:10, 1:4])

Post the output of a command like that and it will be easier to help you. You do not have to use the first four columns; use whatever makes sense with your data.
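To illustrate what dput() produces, here is a minimal sketch with a small made-up data frame. The name DF_small and its values are hypothetical and only serve the example. The output of dput() is R code that rebuilds the object, which is why it is convenient to paste into a post.

```r
# Hypothetical data frame, for illustration only
DF_small <- data.frame(Baseline = c(26L, 24L, 30L),
                       Year2 = c(26.5, 24.5, 30.5))

# dput() prints a structure(...) call that recreates the object
dput(DF_small)

# Capture that text and evaluate it to check that it rebuilds the same data
txt <- paste(capture.output(dput(DF_small)), collapse = "\n")
DF_copy <- eval(parse(text = txt))
all.equal(DF_small, DF_copy)
```

Anyone reading the post can assign the pasted structure(...) text to a name and work with the same rows and columns.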

Thanks for your help. Here are the data (the height of the subjects in our study), as you suggested:

> dput(DATASET_PAS[1:4, 1:11])
structure(list(Baseline = c(26L, 24L, 30L, 24L), X.2nd.Year. = c(26.5, 
24.5, 30.5, 24.5), X.4th.Year. = c(30L, 30L, 34L, 30L), X.6th.Year. = c(30L, 
30L, 34L, 30L), X.8th.Year. = c(30L, 31L, 34L, 31L), X.10nd.Year. = c(30L, 
31L, 34L, 31L), X.12th.Year. = c(30L, 32L, 34L, 32L), X.14th.Year. = c(30L, 
32L, 34L, 32L), X.16th.Year. = c(30L, 32L, 34L, 32L), X.18th.Year. = c(30L, 
32L, 34L, 32L), X.20th.Year. = c(30L, 32L, 34L, 32L)), row.names = c(NA, 
4L), class = "data.frame")

So, I have one column for each boxplot, and I want a line that connects the mean (or median) of each one.

Why do the column names change after importing the dataset? I found an X before each column name!

Thanks for your help!

The X is added to the column names because column names should not start with a number. They also should not contain spaces, which may account for the . characters.
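As a small illustration of that renaming: base R functions such as read.csv() and data.frame() pass the names through make.names() when check.names = TRUE, which is the default. The sketch below uses a made-up column name, "2nd Year". The original header of your file is not shown in this thread, so treat that name as an assumption.

```r
# A name that starts with a digit and contains a space (hypothetical example)
make.names("2nd Year")
# [1] "X2nd.Year"

# data.frame() applies the same check by default, as read.csv() does
DF_checked <- data.frame(`2nd Year` = c(26.5, 24.5))
names(DF_checked)

# check.names = FALSE keeps the names exactly as given
DF_raw <- data.frame(`2nd Year` = c(26.5, 24.5), check.names = FALSE)
names(DF_raw)

# Non-syntactic names must then be wrapped in backticks
DF_raw$`2nd Year`
```

Keeping the original names is possible, but every later reference needs backticks, so cleaning the names after import is often the more comfortable choice.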
I reshaped the data to have one column for the former column names and another for the values. I also adjusted the column names after the reshaping to drop the X and . characters.

library(dplyr)
library(ggplot2)
library(tidyr)
library(stringr)

DF <- structure(list(Baseline = c(26L, 24L, 30L, 24L), 
               X.2nd.Year. = c(26.5,24.5, 30.5, 24.5), 
               X.4th.Year. = c(30L, 30L, 34L, 30L), 
               X.6th.Year. = c(30L,30L, 34L, 30L), 
               X.8th.Year. = c(30L, 31L, 34L, 31L), 
               X.10nd.Year. = c(30L, 31L, 34L, 31L), 
               X.12th.Year. = c(30L, 32L, 34L, 32L), 
               X.14th.Year. = c(30L, 32L, 34L, 32L), 
               X.16th.Year. = c(30L, 32L, 34L, 32L), 
               X.18th.Year. = c(30L,32L, 34L, 32L), 
               X.20th.Year. = c(30L, 32L, 34L, 32L)), 
          row.names = c(NA,4L), class = "data.frame")
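# Reshape to long format: the former column names go into Year,
# the measurements into Value
DFlong <- pivot_longer(data = DF, cols = Baseline:X.20th.Year., 
                       names_to = "Year", values_to = "Value")
# Clean up the Year column: drop the X and . characters, restore the space
# before "Year", and set the level order used on the x axis.
# "10nd Year" is kept as spelled in the original column name.
DFlong <- mutate(DFlong, Year = str_remove_all(Year, "X|\\."),
                 Year = str_replace(Year, "Y", " Y"),
                 Year = factor(Year, levels = c("Baseline", "2nd Year", "4th Year",
                               "6th Year", "8th Year", "10nd Year", "12th Year",
                               "14th Year", "16th Year", "18th Year", "20th Year")))

# Mean of Value for each Year
Means <- DFlong %>% group_by(Year) %>% 
  summarize(Avg = mean(Value))

# Boxplots from the long data; points and connecting line from the means.
# group = 1 lets geom_line join the points across the factor levels.
ggplot(DFlong) + geom_boxplot(mapping = aes(Year, Value)) +
  geom_point(mapping = aes(Year, Avg), data = Means, color = "red", shape = 2) +
  geom_line(mapping = aes(Year, Avg, group = 1), data = Means)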

Created on 2020-06-30 by the reprex package (v0.3.0)
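Since the question also mentioned the median, here is a hedged sketch of an alternative that lets ggplot2 compute the summary itself with stat_summary(), so no separate summary data frame is needed. It assumes ggplot2 3.3.0 or later (for the fun argument) and uses a small made-up long-format data frame, DFdemo, with the same Year and Value column layout as the reshaped data.

```r
library(ggplot2)

# Hypothetical long-format data: three time points, four values each
DFdemo <- data.frame(
  Year = factor(rep(c("Baseline", "2nd Year", "4th Year"), each = 4),
                levels = c("Baseline", "2nd Year", "4th Year")),
  Value = c(26, 24, 30, 24, 26.5, 24.5, 30.5, 24.5, 30, 30, 34, 30)
)

p <- ggplot(DFdemo, aes(Year, Value)) +
  geom_boxplot() +
  # Median of Value at each Year, drawn as points
  stat_summary(fun = median, geom = "point", color = "red", shape = 2) +
  # group = 1 joins the medians across the discrete x axis
  stat_summary(fun = median, geom = "line", mapping = aes(group = 1))
p
```

Replacing median with mean in both stat_summary() calls draws the line through the means instead.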
