Calculate mean of one column based on group ids in another

Shubh · September 15, 2019, 12:01pm

There is a table with 3 columns

Id.    Food.     Sex
1.     23.       0
2.     46.       1
3.     99.       1
4.     37.       0
5.     12.       1

Now how do i get mean of food whoose sex is marked as 1?

I have applied if(data$sex == 1)
To check the conditions but how do i select food column to get mean of food with the help of sex column

JohnMount · September 15, 2019, 2:17pm

Try the aggregate command (help(aggregate) will have some documentation). For your data the command would look like the following.


library(wrapr)

d <- wrapr::build_frame(
  "id"  , "food", "sex" |
    1   , 23    , 0     |
    2   , 46    , 1     |
    3   , 99    , 1     |
    4   , 37    , 0     |
    5   , 12    , 1     )

aggregate(food ~ sex, data = d, FUN = mean)

#   sex     food
# 1   0 30.00000
# 2   1 52.33333

jcblum · September 15, 2019, 4:52pm

Hi @Shubh! If you’re working on this problem for a course assignment, it’s good to know that there are several different ways to accomplish this goal in R. The best one for you will depend on what topics you’ve been learning recently that the assignment is meant to reinforce, so it might help if you thought about which parts of R and its packages you’ve been learning about and described which tools you think might be helpful to solve the question.

valeri · September 15, 2019, 6:30pm

Many possibilities here (2 listed below):

First:

mean(subset(data, Sex == 1)$Food)

Second:

library(dplyr)
data %>% filter(Sex == 1) %>% pull(Food) %>% mean()

andresrcs · September 15, 2019, 8:36pm

In case you are learning basic vector subsetting, here is an example.

data <- data.frame(
    Id. = c(1, 2, 3, 4, 5),
    Food. = c(23, 46, 99, 37, 12),
    Sex = c(0, 1, 1, 0, 1)
)

# Generate logical vector checking for the condition
data$Sex == 1
#> [1] FALSE  TRUE  TRUE FALSE  TRUE

# Select positions where the logical condition is TRUE
data$Food[data$Sex == 1]
#> [1] 46 99 12

# Calculate the mean only for the selected values
mean(data$Food[data$Sex == 1])
#> [1] 52.33333

system · October 6, 2019, 8:36pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.