Calculating a mean from a specific population within a data set


#1

Hello, I am very new to R. I would like to calculate the mean of a variable for only a specific group of observations, where the group is defined by the value of another variable in the data set. How could I do this?

For example: the mean of "head_circ" for individuals who are in "exp_group" 3.

Thanks for any advice


#2

Please share a reproducible example, so we can better help you

reprex: What’s a reproducible example (reprex for short) and how do I do one?
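A minimal sketch of one common way to build a reproducible example in base R, assuming the data are in a data frame. The name `your_data` and all values below are invented for illustration. `dput()` prints R code that recreates the object, and that code can be pasted straight into a post.

```r
# Invented toy data; only the column names come from the question.
your_data <- data.frame(
  exp_group = c(1, 1, 2, 3, 3, 3),
  head_circ = c(34.1, 35.0, 33.8, 35.2, NA, 34.6)
)

# dput() prints code that recreates the object exactly,
# so others can paste it into their own R session.
dput(your_data)

# For a large data set, share only the first few rows.
dput(head(your_data, 6))
```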


#3

mean(your_data$head_circ[your_data$exp_group == 3])
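A small, self-contained sketch of how that logical indexing works, using a made-up data frame (`your_data` and its values are invented; only the column names come from the question). `subset()` and `tapply()` are base R alternatives shown for comparison, not necessarily what the answer above had in mind.

```r
# Invented toy data with the column names from the question.
your_data <- data.frame(
  exp_group = c(1, 1, 2, 3, 3, 3),
  head_circ = c(34, 36, 33, 35, 34, 36)
)

# Logical indexing: keep head_circ only where exp_group equals 3.
group3 <- your_data$head_circ[your_data$exp_group == 3]
mean(group3)  # (35 + 34 + 36) / 3 = 35

# The same subset selected with subset().
mean(subset(your_data, exp_group == 3)$head_circ)

# Means for every exp_group at once, named by group.
group_means <- tapply(your_data$head_circ, your_data$exp_group, mean)
group_means
```

If `exp_group` itself contains `NA`, the comparison `== 3` returns `NA` for those rows and the indexed vector picks up an `NA` element; wrapping the condition in `which()` drops them.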


#4

That's great, thanks. Is there also a way to do the calculation when some individuals have missing data, i.e. taking the mean of only the available values?


#5

mean(your_data$head_circ[your_data$exp_group == 3], na.rm = TRUE)
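A quick illustration, with an invented vector, of what `na.rm = TRUE` changes:

```r
# Invented vector with one missing measurement.
group3 <- c(35, NA, 36, 34)

mean(group3)                # NA: a single missing value propagates
mean(group3, na.rm = TRUE)  # 35: mean of the three observed values

# Worth checking how many values the mean is actually based on.
sum(!is.na(group3))         # 3
```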


#6

I then tried to use the same principle to calculate a 95% CI:

confint(your_data$head_circ[your_data$exp_group == 3], level = 0.95, na.rm = TRUE)

I then got the following error: Error: $ operator is invalid for atomic vectors
How should I edit the code?


#7

The confint() function applies to a fitted model object, not to a numeric vector. I think you are trying to get a confidence interval for the mean; as a simple approach you can use the normal distribution, something like this:

filtered_data <- your_data$head_circ[your_data$exp_group == 3]
m <- mean(filtered_data, na.rm = TRUE)
s <- sd(filtered_data, na.rm = TRUE)
n <- sum(!is.na(filtered_data))      # count only the non-missing values
error <- qnorm(0.975) * s / sqrt(n)  # normal approximation, 95% level
left <- m - error
right <- m + error
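With a small group, the normal quantile `qnorm(0.975)` gives an interval that is somewhat too narrow; a more standard choice for a mean is the t distribution. The sketch below uses invented data in the same hypothetical `your_data` layout and shows three equivalent ways to get the t-based interval in base R, including `confint()` on a fitted intercept-only `lm()` model, which is the kind of object `confint()` expects.

```r
# Invented toy data; group 3 has one missing measurement.
your_data <- data.frame(
  exp_group = c(1, 1, 3, 3, 3, 3, 3),
  head_circ = c(34, 36, 35, NA, 34, 36, 35)
)
filtered_data <- your_data$head_circ[your_data$exp_group == 3]

# One-sample t-test: the interval uses the t distribution with
# n - 1 degrees of freedom; t.test() drops NA values itself.
ci_t <- t.test(filtered_data, conf.level = 0.95)$conf.int
ci_t

# Via confint(): an intercept-only model whose single coefficient
# is the mean; lm() drops the NA row by default (na.omit).
fit <- lm(head_circ ~ 1, data = subset(your_data, exp_group == 3))
ci_lm <- confint(fit, level = 0.95)
ci_lm

# The same interval by hand, with qt() in place of qnorm().
x <- filtered_data[!is.na(filtered_data)]
n <- length(x)
ci_manual <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) / sqrt(n)
ci_manual
```

For large groups the t and normal quantiles are almost identical, so the difference mainly matters for small samples.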

#8

Thanks, that's excellent. When I then look for outliers outside this confidence interval using the subsets below, I find exactly 10 individuals on each side of the interval for every one of the 8 parameters I have analysed. Can this be correct?

subset(data_set, variable < left)
subset(data_set, variable > right)


#9

Are you recalculating the left and right limits for each parameter? Remember that they were calculated for the mean of head_circ where exp_group == 3 only.
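One way to make sure the limits are recomputed for every parameter is to wrap the calculation in a function. `mean_ci_limits()` below is a hypothetical helper (not from any package), and the extra `weight` column and all values are invented for illustration; it uses the same normal approximation as the earlier answer.

```r
# Hypothetical helper (not from any package): normal-approximation
# confidence interval for the mean of column `var` within one exp_group.
mean_ci_limits <- function(data, var, group, level = 0.95) {
  x <- data[[var]][data$exp_group == group]
  x <- x[!is.na(x)]  # drop missing values (also rows with NA exp_group)
  error <- qnorm(1 - (1 - level) / 2) * sd(x) / sqrt(length(x))
  c(left = mean(x) - error, right = mean(x) + error)
}

# Invented toy data with two measured parameters.
your_data <- data.frame(
  exp_group = c(3, 3, 3, 3, 1),
  head_circ = c(34, 36, 35, 35, 40),
  weight    = c(3.1, 3.5, 3.3, 3.3, 4.0)
)

# Recompute the limits separately for each parameter.
params <- c("head_circ", "weight")
limits <- sapply(params, function(v) mean_ci_limits(your_data, v, group = 3))
limits  # rows "left" and "right", one column per parameter

# Count group-3 individuals outside each parameter's own interval.
g3 <- your_data[your_data$exp_group == 3, ]
n_outside <- sapply(params, function(v) {
  sum(g3[[v]] < limits["left", v] | g3[[v]] > limits["right", v], na.rm = TRUE)
})
n_outside
```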

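On the other hand, this approach may not be theoretically correct: a confidence interval describes the uncertainty in the estimated mean, not the spread of individual values, so it is not a range for identifying outliers. If you are just doing exploratory analysis this is fine, but be careful if your goal is to draw conclusions from it.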
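A quick simulation (random numbers, not the poster's data) illustrating why a confidence interval for the mean does not work as an outlier range: with 200 individuals the interval for the mean is narrow, so most individuals fall outside it, whereas a rough reference range of mean ± 1.96 SD excludes only about 5% of them. The sample size, mean and SD are arbitrary assumptions.

```r
set.seed(1)
# Simulated measurements for 200 individuals (not real data).
x <- rnorm(200, mean = 35, sd = 1.5)

m <- mean(x)
s <- sd(x)
n <- length(x)

# 95% CI for the MEAN: its width shrinks like 1 / sqrt(n).
ci <- m + c(-1, 1) * qnorm(0.975) * s / sqrt(n)

# Approximate 95% range for INDIVIDUAL values (mean +/- 1.96 SD).
ref <- m + c(-1, 1) * qnorm(0.975) * s

# Share of individuals outside each interval.
share_outside_ci  <- mean(x < ci[1] | x > ci[2])    # most individuals
share_outside_ref <- mean(x < ref[1] | x > ref[2])  # roughly 5%
c(ci = share_outside_ci, ref = share_outside_ref)
```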

