Calculating means for variables with levels

erinho · September 13, 2021, 2:40pm

I have beginner level question that is simple yet so difficult. I have a variable "Position" and it has two levels, "manager" and "employee". I want to calculate the means of employees and managers for survey questions (Q1 in this example). My code below doesn't work at all. So how to do this smartly? Thanks for your help.

mydata %>% 
  select(Q1, Position == "manager") %>% 
  mutate(Q1_mean_manager = mean(Q1, na.rm = TRUE))

Sample data

structure(list(Unit = c(2, 10, 10, 4, 2, 10, 5, 3, 9, 5), Position = structure(c(2L, 
2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L), .Label = c("manager", "employee"
), class = "factor"), `Work experience` = c(2, 3, 3, 4, 4, 4, 
4, 4, 4, 4), `Work experience at current job` = c(2, 2, 3, 2, 
4, 1, 4, 4, 4, 4), Q1 = c(5, 4, 3, 6, 4, 5, 2, 4, 5, 6), Q2 = c(5, 
4, 2, 7, 5, 6, 5, 6, 7, 5), Q3 = c(4, 4, 7, 3, 6, 5, 6, 6, 7, 
5), Q4 = c(5, 5, 2, 5, 4, 5, 2, 5, 5, 5), Q5 = c(NA, 4, 3, 4, 
3, 6, NA, 5, NA, NA), Q6 = c(NA, 4, 4, 4, 5, 4, 2, 5, 5, 5), 
    Q7 = c(NA, 4, 4, 4, 5, 5, 4, 5, 5, 5), Q8 = c(6, 4, 2, 3, 
    4, 4, 3, 5, 5, 5), Q9 = c(6, 5, 5, 7, 5, 7, 4, 6, 5, 6), 
    Q10 = c(5, 3, 2, 6, 3, 7, 1, 5, 7, 5), Q11 = c(5, 3, 2, 6, 
    5, 7, 2, 4, 5, 4), Q12 = c(5, 4, 4, 7, 5, 7, 4, 6, 6, 5), 
    Q13 = c(NA, 3, 1, 4, 4, 6, 2, 5, 3, 5), Q14 = c(NA, 3, 1, 
    6, 5, 3, 4, 5, 5, 6), Q15 = c(3, 4, 5, 4, 2, 5, 4, 5, 5, 
    5), Q16 = c(4, 4, 5, 5, 2, 5, 2, 4, 4, 4), Q17 = c(3, 2, 
    4, 2, 5, 4, 4, 5, 6, 3), Q18 = c(5, 4, 7, 3, 5, 4, 2, 6, 
    4, 4), Q19 = c(5, 4, 6, 3, 5, 4, 3, 6, 5, 4), Q20 = c(2, 
    3, NA, 3, NA, 3, 1, 6, 5, NA), Q40 = c(5, 4, 2, 3, 4, 5, 
    4, 5, 5, 3), Q1_mean = c(5.18918918918919, 5.18918918918919, 
    5.18918918918919, 5.18918918918919, 5.18918918918919, 5.18918918918919, 
    5.18918918918919, 5.18918918918919, 5.18918918918919, 5.18918918918919
    )), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

StatSteph · September 13, 2021, 3:14pm

I think you're looking to 1) group by position and then 2) find the mean of Q1. Some problems with your current code is that the select function selects columns and I think you're trying to select a column and then subset rows with that statement but you can't subset to just those with Position as manager using select. You could do that with filter but I don't think we need that.

Secondly, you'll want to use summarise and not mutate to get the means of the groups.

Here's a solution below:

library(tidyverse)
#> Warning: package 'readr' was built under R version 4.1.1

mydata <- structure(list(Unit = c(2, 10, 10, 4, 2, 10, 5, 3, 9, 5), Position = structure(c(2L, 
                                                                                 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L), .Label = c("manager", "employee"
                                                                                 ), class = "factor"), `Work experience` = c(2, 3, 3, 4, 4, 4, 
                                                                                                                             4, 4, 4, 4), `Work experience at current job` = c(2, 2, 3, 2, 
                                                                                                                                                                               4, 1, 4, 4, 4, 4), Q1 = c(5, 4, 3, 6, 4, 5, 2, 4, 5, 6), Q2 = c(5, 
                                                                                                                                                                                                                                               4, 2, 7, 5, 6, 5, 6, 7, 5), Q3 = c(4, 4, 7, 3, 6, 5, 6, 6, 7, 
                                                                                                                                                                                                                                                                                  5), Q4 = c(5, 5, 2, 5, 4, 5, 2, 5, 5, 5), Q5 = c(NA, 4, 3, 4, 
                                                                                                                                                                                                                                                                                                                                   3, 6, NA, 5, NA, NA), Q6 = c(NA, 4, 4, 4, 5, 4, 2, 5, 5, 5), 
               Q7 = c(NA, 4, 4, 4, 5, 5, 4, 5, 5, 5), Q8 = c(6, 4, 2, 3, 
                                                             4, 4, 3, 5, 5, 5), Q9 = c(6, 5, 5, 7, 5, 7, 4, 6, 5, 6), 
               Q10 = c(5, 3, 2, 6, 3, 7, 1, 5, 7, 5), Q11 = c(5, 3, 2, 6, 
                                                              5, 7, 2, 4, 5, 4), Q12 = c(5, 4, 4, 7, 5, 7, 4, 6, 6, 5), 
               Q13 = c(NA, 3, 1, 4, 4, 6, 2, 5, 3, 5), Q14 = c(NA, 3, 1, 
                                                               6, 5, 3, 4, 5, 5, 6), Q15 = c(3, 4, 5, 4, 2, 5, 4, 5, 5, 
                                                                                             5), Q16 = c(4, 4, 5, 5, 2, 5, 2, 4, 4, 4), Q17 = c(3, 2, 
                                                                                                                                                4, 2, 5, 4, 4, 5, 6, 3), Q18 = c(5, 4, 7, 3, 5, 4, 2, 6, 
                                                                                                                                                                                 4, 4), Q19 = c(5, 4, 6, 3, 5, 4, 3, 6, 5, 4), Q20 = c(2, 
                                                                                                                                                                                                                                       3, NA, 3, NA, 3, 1, 6, 5, NA), Q40 = c(5, 4, 2, 3, 4, 5, 
                                                                                                                                                                                                                                                                              4, 5, 5, 3), Q1_mean = c(5.18918918918919, 5.18918918918919, 
                                                                                                                                                                                                                                                                                                       5.18918918918919, 5.18918918918919, 5.18918918918919, 5.18918918918919, 
                                                                                                                                                                                                                                                                                                       5.18918918918919, 5.18918918918919, 5.18918918918919, 5.18918918918919
                                                                                                                                                                                                                                                                              )), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
                                                                                                                                                                                                                                                                              ))

mydata %>%
   group_by(Position) %>%
   summarise(Q1_mean = mean(Q1, na.rm = TRUE))
#> # A tibble: 2 x 2
#>   Position Q1_mean
#>   <fct>      <dbl>
#> 1 manager     5   
#> 2 employee    4.33

^{Created on 2021-09-13 by the reprex package (v2.0.1)}

system · September 20, 2021, 3:14pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.