Confused with density charts


I am trying to plot a density chart for two lifts, a small one and a large one. The data are the median weight of 5 people in the small lift and the median weight of 10 people in the large lift. However, I am confused about how to plot the density chart so that I can compare the median weights of the two lifts.

The dataset that I am using is here;

However, I deleted the height and index columns, as they are irrelevant to my research.

I created two samples, which are here:
Large <- replicate(n=1000, mean(sample(Weight$Weight, size = 10)))
Small <- replicate(n=1000, mean(sample(Weight$Weight, size = 5)))

and then I put them into one dataframe called Lifts;
Lifts<-data.frame(Large, Small)

I have tried to come up with a solution myself by plotting the following density chart;
ggplot(Lifts, aes(Large)) + geom_density(fill = "blue") + labs(x = "Median of Weight", y = "Distribution of Data")
Any help would be appreciated

Hello, it seems you are close to finding your own solution, since you have already plotted one of the distributions. To make it easier for others here to help you, consider sharing your data (or, better, a small sample of it) along with the code you have so far, so that community members have something to start from and can answer with less effort.
dput() is a useful function for making a small R object copy-and-pasteable through the forum.
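
As a minimal sketch of that suggestion: the data frame below is a made-up stand-in (the values are invented, not taken from the poster's data). It shows that dput() prints R code which rebuilds the object when pasted back into a session.

```r
# Hypothetical stand-in for the poster's data frame; the values are invented
Lifts <- data.frame(
  Large = c(99.6, 108.2, 97.5),
  Small = c(106.1, 102.3, 93.4)
)

# dput() prints R code that recreates the object; paste that output into a post
dput(Lifts)

# Round trip: evaluate the printed code to rebuild the object from text alone
dput_text  <- paste(capture.output(dput(Lifts)), collapse = "")
Lifts_copy <- eval(parse(text = dput_text))

# Check that the rebuilt copy matches the original
all.equal(Lifts, Lifts_copy)
```

For a large data set, something like dput(head(Lifts, 10)) keeps the pasted output short.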

OK, I'll edit my original post!


Your data is in 'wide' format, but ggplot works best with data in 'long' format (also called 'tidy' format).
It is entirely reasonable to be a little confused over this when first coming to ggplot. Still, there are many advantages to using a long format when it comes to using ggplot.

The general approach to this type of issue is to transform the data before you send it to ggplot.

In the suggestion below, I create some sample data that mimics the data you have linked to. I then transform the data to a long format and plot it using ggplot.

At the end, there is a solution that does not transform the data. It will likely appear more straightforward at first. Still, I recommend that you use the first solution, as it will probably work better for you as your project becomes more complicated.



# Load the packages used below (tibble, tidyr, ggplot2 and the %>% pipe)
library(tidyverse)

# Sample data, generated to mimic the data you linked to
Lifts <- tibble(
  Large = rnorm(500, mean = 105.872, sd = 10.074),
  Small = rnorm(500, mean = 105.506, sd = 10.227))

# This is what the sample data looks like
head(Lifts)
#> # A tibble: 6 x 2
#>   Large Small
#>   <dbl> <dbl>
#> 1  99.6 106. 
#> 2 108.  102. 
#> 3  97.5  93.4
#> 4 122.  106. 
#> 5 109.  116. 
#> 6  97.6 122.

# Transform Lifts into long format
Lifts_long_format <- Lifts %>% 
  pivot_longer(cols = everything(), names_to = "size")

# This is the structure
head(Lifts_long_format)
#> # A tibble: 6 x 2
#>   size  value
#>   <chr> <dbl>
#> 1 Large  99.6
#> 2 Small 106. 
#> 3 Large 108. 
#> 4 Small 102. 
#> 5 Large  97.5
#> 6 Small  93.4

ggplot(Lifts_long_format, aes(value, fill = size))+
  geom_density(alpha = 0.5) + 
  labs(x = "Median of Weight", y = "Distribution of Data")

# You can also do it without transforming the data
ggplot(Lifts) +
  geom_density(aes(Large), fill = "blue", alpha = 0.5) + 
  geom_density(aes(Small), fill = "red", alpha = 0.5) + 
  labs(x = "Median of Weight", y = "Distribution of Data")

Created on 2020-02-20 by the reprex package (v0.3.0)
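
Since the goal is to compare the two lifts, one possible extension of the long-format approach is to mark each group's median on the density plot. This is only a sketch under assumptions: the data are simulated stand-ins, and the names lifts_long and lift_medians are invented for illustration.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the simulated data are reproducible

# Simulated long-format data standing in for Lifts_long_format (values are made up)
lifts_long <- data.frame(
  size  = rep(c("Large", "Small"), each = 500),
  value = c(rnorm(500, mean = 105.9, sd = 10.1),
            rnorm(500, mean = 105.5, sd = 10.2))
)

# One row per lift size, holding the median of that group
lift_medians <- aggregate(value ~ size, data = lifts_long, FUN = median)

# Overlaid densities, plus a dashed vertical line at each group's median
p <- ggplot(lifts_long, aes(value, fill = size)) +
  geom_density(alpha = 0.5) +
  geom_vline(data = lift_medians,
             aes(xintercept = value, colour = size),
             linetype = "dashed") +
  labs(x = "Weight", y = "Density")
p
```

Because the data are in long format, the same aggregate() call and the same plotting code would still work if a third lift size were added.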
