I am trying to plot a density chart for two lifts, a small one and a large one. The data hold the median weight of 5 people in the small lift and the median weight of 10 people in the large lift. However, I am confused about how to plot the density chart so that I can compare the median weights of the two lifts.
The dataset that I am using is here:
However, I deleted the height and the index columns, as they are irrelevant for my research.
I created two samples, which are here:
Large <- replicate(n=1000, mean(sample(Weight$Weight, size = 10)))
Small <- replicate(n=1000, mean(sample(Weight$Weight, size = 5)))
I then put them into one data frame called Lifts:
Lifts<-data.frame(Large, Small)
I have tried to come up with a solution myself by plotting the following density chart:
ggplot(Lifts, aes(Large)) + geom_density(fill = "blue") + labs(x = "Median of Weight", y = "Distribution of Data")
Any help would be appreciated.
Hello, it seems you are close to finding your own solution, since you have already plotted one of the distributions. To make it easier for others here to help you, consider sharing your data (or, better, a small sample of it) and the code you have so far, so that community members have something to start from and can answer with a little less effort.
dput() is a useful function for making a small R object copy-and-pasteable through the forum.
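As a minimal sketch of how that might look, assuming a data frame named Lifts as in the question (the numbers below are made up purely for illustration):

```r
# Hypothetical stand-in for the Lifts data frame from the question;
# the values are invented for this example only
Lifts <- data.frame(
  Large = c(101.2, 98.7, 110.4),
  Small = c(95.3, 107.9, 102.1)
)

# Print a text representation of the first rows that can be pasted
# into a forum post and evaluated by someone else to rebuild the object
dput(head(Lifts, 3))
```

Anyone reading the post can then assign the pasted output to a name and work with the same object you have.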
Your data is in 'wide' format, but ggplot works best with data in 'long' format (also called 'tidy' format).
It is entirely reasonable to be a little confused over this when first coming to ggplot. Still, there are many advantages to using a long format when it comes to using ggplot.
The general approach to this type of issue is to transform the data before you send it to ggplot.
In the suggestion below, I create some sample data that mimics the data you have linked to. I then transform the data to a long format and plot it using ggplot.
At the end, there is a solution that does not transform the data. It will likely appear more straightforward at first. Still, I recommend that you use the first solution, as it will probably work better for you as your project becomes more complicated.
library(tidyverse)
set.seed(1)
# Sample Data, generated to mimic the data you linked to
Lifts <- tibble(
  Large = rnorm(500, mean = 105.872, sd = 10.074),
  Small = rnorm(500, mean = 105.506, sd = 10.227)
)
# This is what the sample data looks like
head(Lifts)
#> # A tibble: 6 x 2
#> Large Small
#> <dbl> <dbl>
#> 1 99.6 106.
#> 2 108. 102.
#> 3 97.5 93.4
#> 4 122. 106.
#> 5 109. 116.
#> 6 97.6 122.
# Transform Lifts into long format
Lifts_long_format <- Lifts %>%
  pivot_longer(cols = everything(), names_to = "size")
# This is the structure
head(Lifts_long_format)
#> # A tibble: 6 x 2
#> size value
#> <chr> <dbl>
#> 1 Large 99.6
#> 2 Small 106.
#> 3 Large 108.
#> 4 Small 102.
#> 5 Large 97.5
#> 6 Small 93.4
ggplot(Lifts_long_format, aes(value, fill = size)) +
  geom_density(alpha = 0.5) +
  labs(x = "Median of Weight", y = "Distribution of Data")
# You can also do it without transforming the data
ggplot(Lifts) +
  geom_density(aes(Large), fill = "blue", alpha = 0.5) +
  geom_density(aes(Small), fill = "red", alpha = 0.5) +
  labs(x = "Median of Weight", y = "Distribution of Data")
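One advantage of the long format is that per-group summaries come almost for free. As a rough sketch, assuming the same simulated data as above, you could compute each group's median and overlay it on the density plot to make the comparison easier to read. The names group_medians and median_value are my own choices for illustration, not anything from your data.

```r
library(tidyverse)

set.seed(1)

# Simulated stand-in data, same shape as the Lifts tibble above
Lifts <- tibble(
  Large = rnorm(500, mean = 105.872, sd = 10.074),
  Small = rnorm(500, mean = 105.506, sd = 10.227)
)

Lifts_long_format <- Lifts %>%
  pivot_longer(cols = everything(), names_to = "size")

# One row per lift size, holding the median of that group's values
# (group_medians and median_value are hypothetical names)
group_medians <- Lifts_long_format %>%
  group_by(size) %>%
  summarise(median_value = median(value))

# Same density plot as before, plus a dashed line at each group's median
ggplot(Lifts_long_format, aes(value, fill = size)) +
  geom_density(alpha = 0.5) +
  geom_vline(
    data = group_medians,
    aes(xintercept = median_value, colour = size),
    linetype = "dashed"
  ) +
  labs(x = "Median of Weight", y = "Distribution of Data")
```

Doing the same with the wide data would need a separate geom_vline() call per column, which is part of why I suggest the long format.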