Averaging multiple non-normal distributions

Hello,

I have multiple non-normal distribution plots (the same variable from different subjects) and I am trying to get an average distribution within the cohort.

I'm not sure of the best way to go about this, but this is what I've been thinking of doing with the density plots I have for each subject.

A. Get the function of the curve of the distribution and then average all the functions together

B. Get the y coordinate at each x for every subject, then average those y values across subjects at each x... (y1 + y2 + y3)/3

I'm not sure if what I'm saying makes sense, but I'm very new at R - and can't even figure out how to get the function of my ggplot2 density plot

Any help will be much appreciated!

I am not sure exactly what you are trying to do but here is an example of using the density() function to get the same density values that geom_density produces. The density function has arguments for n, from and to if you want to tune the span of the calculation.

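library(ggplot2)
set.seed(1)
x <- rnorm(100, 2, 0.5)
# density() returns the evaluation points in $x and the estimated density in $y
DensX <- density(x)
# Histogram on the density scale, with the density() curve (green)
# drawn under the geom_density() curve (red) to show that they coincide
ggplot() + geom_histogram(mapping = aes(x = x, y = ..density..), color = "white") +
  geom_line(mapping = aes(x = DensX$x, y = DensX$y), color = "green", size = 2) +
  geom_density(mapping = aes(x = x), color = "red", size = 1)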
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2020-03-10 by the reprex package (v0.3.0)
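
If the goal is to compare or average several curves, the from, to and n arguments mentioned above can pin every density to the same x grid. A minimal sketch, where the two samples a and b are made-up stand-ins for two subjects:

```r
set.seed(1)
# Hypothetical data for two subjects (not from the original question)
a <- rnorm(100, 2, 0.5)
b <- rnorm(100, 2.5, 0.7)

# Use one common range so both densities are evaluated at the same x values
rng <- range(c(a, b))
dA <- density(a, from = rng[1], to = rng[2], n = 256)
dB <- density(b, from = rng[1], to = rng[2], n = 256)

# TRUE when the grids line up, so the y values can be averaged directly
isTRUE(all.equal(dA$x, dB$x))
avgY <- (dA$y + dB$y) / 2
```

Without from and to, each call picks its own range from its own data, so the x values generally differ between subjects.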

@FJCC thank you!!! But what I want to do is get the equation of the actual density function line ... y = ax+b kind of thing

Because I have a density plot for each of my subjects - but I want to combine and make one representative density plot (by averaging the density plots)

Perhaps you can use a spline fit to generate predicted densities at arbitrary x values as in the following code.

library(ggplot2)
set.seed(1)
x <- rnorm(100, 2, 0.5)
DensX <- density(x)
# splinefun() returns a function that interpolates the density curve
DensFunc <- splinefun(DensX$x, DensX$y)
# Evaluate that function at chosen x values (here 1 to 3 in steps of 0.1)
FitValues <- data.frame(NewX = seq(1, 3, 0.1), NewY = DensFunc(seq(1, 3, 0.1)))
ggplot() + geom_histogram(mapping = aes(x = x, y = ..density..), color = "white") +
  geom_point(mapping = aes(x = NewX, y = NewY), data = FitValues, size = 3) +
  geom_density(mapping = aes(x = x), color = "red", size = 1)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2020-03-10 by the reprex package (v0.3.0)
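
On the request for an equation: a kernel density estimate has no compact y = ax + b form. With the defaults of density() (Gaussian kernel, bandwidth from bw.nrd0()), the curve is the average of one normal density centered on each observation. The sketch below writes that out as an ordinary R function. The name kde is made up for illustration, and density() itself uses a fast binned approximation, so the two should agree closely but not exactly.

```r
set.seed(1)
x <- rnorm(100, 2, 0.5)

# Default bandwidth used by density(); it is the sd of each Gaussian kernel
h <- bw.nrd0(x)

# f(t) = (1/n) * sum_i dnorm(t, mean = x_i, sd = h)
# kde is a hypothetical helper name, not part of any package
kde <- function(t) sapply(t, function(ti) mean(dnorm(ti, mean = x, sd = h)))

# Compare with density() at the points where density() was evaluated
d <- density(x)
max(abs(kde(d$x) - d$y))

# The curve is a proper density, so its area should be close to 1
integrate(kde, lower = -3, upper = 7)$value
```

If only a callable function is needed, the splinefun() approach above is simpler; the explicit form is mainly useful for seeing what the curve is made of.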

And then do you know if there's a way to get a read-out of the generated points?

The generated values are in the data frame FitValues. You can make a data frame for each distribution using the same x values in every case and then bind the data frames together with rbind() and calculate averages at each x value.
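
A rough sketch of that last step, assuming three subjects whose values are simulated here only for illustration (the list subjects, the column names and the padding of 3 units are all made up). It evaluates each subject's density on one shared grid, stacks the data frames with rbind() and averages y at each x:

```r
set.seed(1)
# Hypothetical skewed (non-normal) data, one vector per subject
subjects <- list(
  s1 = rgamma(100, shape = 2, rate = 1),
  s2 = rgamma(100, shape = 3, rate = 1),
  s3 = rgamma(100, shape = 2, rate = 0.5)
)

# Common range for all subjects, padded so the tails are not cut off
rng <- range(unlist(subjects)) + c(-3, 3)

# One data frame per subject, all evaluated at the same 256 x values
perSubject <- lapply(names(subjects), function(id) {
  d <- density(subjects[[id]], from = rng[1], to = rng[2], n = 256)
  data.frame(subject = id, x = d$x, y = d$y)
})

# Stack the data frames and average the density at each x
allDens <- do.call(rbind, perSubject)
avgDens <- aggregate(y ~ x, data = allDens, FUN = mean)

# Sanity check: the averaged curve should still have area close to 1
sum(avgDens$y) * diff(avgDens$x)[1]
```

Averaging densities pointwise with equal weights gives the equal-weight mixture of the subjects' distributions, so the result is itself a valid density. avgDens can then be plotted with geom_line(aes(x = x, y = y), data = avgDens).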
