How to compute the mean of a variable conditional on another variable

Hello, I'm a university student trying to learn to code by myself. I need some help with an exercise.

For example, the first variable "University 1" is equal to 1 in lines 1, 9 and 10. I want to take the mean of productivity over lines 1, 9 and 10 and put the result in a table.
I would like to do this for each university variable. My database has more than 50 universities and 1000 observations; I have included only 10 universities and 20 observations for the explanation.

Thank you so much !!

It would be better to provide code that generates a suitable table than to show a literal table, as the latter is very inconvenient to recreate in an R session. Below I randomly generate a small table whose selection columns contain only 1 and 0 values.

# Randomly generated selection columns
DF <- as.data.frame(matrix(sample(c(0,1), 30, replace=TRUE), ncol=3))
names(DF) <- c('uni1', 'uni2', 'uni3')
# Randomly generated info column
DF[["prod"]] <- sample(seq(11,999), 10)
##    uni1 uni2 uni3 prod
## 1     0    1    1  935
## 2     0    0    1  116
## 3     0    1    1  707
## 4     0    0    1  434
## 5     1    0    0  537
## 6     0    0    0  443
## 7     0    0    1  525
## 8     1    0    1  979
## 9     0    0    1  360
## 10    0    0    1  975

Now to the main question. You can filter table rows like this: DF[(DF$uni1 == 1), ]. You can repeat the process for every university with the apply() family of functions. I chose vapply() because each iteration returns a single value.
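As a minimal sketch of that single-column filter, here it is on a small hand-made table. The name toy and its values are made up for illustration and are not the data from the question.

```r
# Hypothetical toy data: two indicator columns and one value column
toy <- data.frame(
  uni1 = c(1, 0, 0, 1),
  uni2 = c(0, 1, 1, 0),
  prod = c(10, 20, 40, 30)
)

# Keep only the rows where uni1 equals 1 (rows 1 and 4)
uni1_rows <- toy[toy$uni1 == 1, ]

# Mean of prod over those rows: (10 + 30) / 2
uni1_mean <- mean(uni1_rows$prod)
uni1_mean
## [1] 20
```

The comma inside the square brackets matters: the condition goes before it to select rows, and leaving the part after it empty keeps all columns.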

Here is how these three go together:

# Repeated for all columns except the last (prod)
res <- vapply(names(DF)[seq_len(ncol(DF) - 1)],
              function(x) { 
                mean(DF[(DF[[x]] == 1), "prod"])
              },
              numeric(1) )
##    uni1    uni2    uni3 
## 758.000 821.000 628.875 

The result is a named vector. You can then assign it to a column of a table.
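One possible way to turn such a named vector into a table is sketched below. The vector is re-typed from the output shown earlier so the snippet runs on its own, and the names res_table, university and mean_prod are my own choices, not something the question prescribes.

```r
# Stand-in for the vapply() result, copied from the output shown earlier
res <- c(uni1 = 758, uni2 = 821, uni3 = 628.875)

# One row per university: names go in one column, means in another
res_table <- data.frame(
  university = names(res),
  mean_prod  = unname(res),
  stringsAsFactors = FALSE
)
res_table
```

Using unname() keeps the vector names from also becoming the row names of the data frame.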


Hi @anatolia
You can do this.

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

df <- tibble::tibble(
  University1 = c(1, 0, 0, 1),
  University2 = c(1, 1, 0, 0),
  University3 = c(1, 1, 0, 1),
  Productivity = c(12, 13, 5, 2)
)


df
#> # A tibble: 4 x 4
#>   University1 University2 University3 Productivity
#>         <dbl>       <dbl>       <dbl>        <dbl>
#> 1           1           1           1           12
#> 2           0           1           1           13
#> 3           0           0           0            5
#> 4           1           0           1            2

# Reshape to long format: one row per (University, value) pair
df_gather <- tidyr::gather(data = df, University, value, -Productivity)


res <- 
  df_gather %>% 
  filter(value == 1) %>% 
  group_by(University) %>% 
  summarise(prod_univ = mean(Productivity, na.rm = TRUE))


res
#> # A tibble: 3 x 2
#>   University  prod_univ
#>   <chr>           <dbl>
#> 1 University1       7  
#> 2 University2      12.5
#> 3 University3       9

Created on 2019-11-27 by the reprex package (v0.2.1)
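As a cross-check that needs no packages, the same means can be reproduced in base R with the numbers from the tibble above. This is only a sketch: it assumes every university column contains nothing but 0 and 1, and the names uni_cols, m and means are hypothetical.

```r
# Same data as the tibble above, as a plain data frame
df <- data.frame(
  University1 = c(1, 0, 0, 1),
  University2 = c(1, 1, 0, 0),
  University3 = c(1, 1, 0, 1),
  Productivity = c(12, 13, 5, 2)
)

# Every column except Productivity is treated as a 0/1 indicator
uni_cols <- setdiff(names(df), "Productivity")
m <- as.matrix(df[uni_cols])

# Per column: sum of Productivity over member rows / number of member rows
means <- colSums(m * df$Productivity) / colSums(m)
means
```

With 50 or more university columns this avoids an explicit loop. A university with no rows equal to 1 would give 0 / 0, which R reports as NaN.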

For better help, post a reprex instead of an image of your data frame.

