# How to compute the mean of a variable conditional on another variable

Hello, I'm a university student trying to learn to code on my own, and I need some help with an exercise.

For example, the first variable, "University 1", equals 1 in rows 1, 9 and 10. I want to compute the mean of productivity over rows 1, 9 and 10 and put the result in a table.
I would like to do this for each university variable. My database has more than 50 universities and 1000 observations; I have included only 10 universities and 20 observations for the explanation.

Thank you so much !!

It would be better to provide code that generates a suitable table rather than showing a literal table, which is very inconvenient to recreate in an R session. Below I randomly generate a small table whose selection columns contain only 1 and 0 values.

``````
# Randomly generated selection columns
DF <- as.data.frame(matrix(sample(c(0,1), 30, replace=TRUE), ncol=3))
names(DF) <- c('uni1', 'uni2', 'uni3')
# Randomly generated info column
DF[["prod"]] <- sample(seq(11,999), 10)
``````
``````
##    uni1 uni2 uni3 prod
## 1     0    1    1  935
## 2     0    0    1  116
## 3     0    1    1  707
## 4     0    0    1  434
## 5     1    0    0  537
## 6     0    0    0  443
## 7     0    0    1  525
## 8     1    0    1  979
## 9     0    0    1  360
## 10    0    0    1  975
``````
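Because the table is built with `sample()`, every run produces different values, so the numbers will not match the table shown above. A minimal sketch, assuming you want the same table on every run: call `set.seed()` before generating the data. The seed value `42` and the helper name `make_table` are arbitrary choices for illustration.

```r
# Hypothetical helper: builds the same kind of table as above,
# but fixes the random seed first so the result is repeatable
make_table <- function(seed) {
  set.seed(seed)
  DF <- as.data.frame(matrix(sample(c(0, 1), 30, replace = TRUE), ncol = 3))
  names(DF) <- c("uni1", "uni2", "uni3")
  DF[["prod"]] <- sample(seq(11, 999), 10)
  DF
}

DF_a <- make_table(42)
DF_b <- make_table(42)

# Same seed, same table
identical(DF_a, DF_b)
```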

Now to the main question. You can filter table rows like this: `DF[DF$uni1 == 1, ]`, and you can repeat the process for every uni by using the `apply()` family of functions. I chose `vapply()` because each iteration returns a single value.
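As a minimal sketch of that single filtering step, here it is on a small hand-made table. The name `toy` and its values are made up for illustration and are not taken from the original data.

```r
# Hypothetical toy data: one 0/1 indicator column and one numeric column
toy <- data.frame(uni1 = c(1, 0, 0, 1),
                  prod = c(12, 13, 5, 2))

# Keep only the rows where uni1 equals 1, then average prod over them
rows_uni1 <- toy[toy$uni1 == 1, ]
mean_uni1 <- mean(rows_uni1$prod)
mean_uni1
#> [1] 7
```

Note that if a column contains no 1 at all, the filtered table is empty and `mean()` returns `NaN`.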

Here is how these pieces go together:

``````
# Repeat for every column except the last one (prod)
res <- vapply(names(DF)[-length(DF)],
              function(x) {
                # Mean of prod over the rows where this uni column equals 1
                mean(DF[DF[[x]] == 1, "prod"])
              },
              numeric(1))
``````
``````
##    uni1    uni2    uni3
## 758.000 821.000 628.875
``````

The result is a named vector. You can then assign it to a column of a table.
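One possible way to turn that named vector into a table, sketched here with the values copied from the output above. The column names `university` and `mean_prod` are my own choice, not part of the original question.

```r
# Named vector copied from the output above
res <- c(uni1 = 758, uni2 = 821, uni3 = 628.875)

# One row per university: the names become one column, the means another
res_table <- data.frame(university = names(res),
                        mean_prod  = unname(res),
                        stringsAsFactors = FALSE)
res_table
```

Using `unname()` keeps the default row numbers instead of carrying the vector names over as row names.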


Hi @anatolia
You can do this.

``````
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

df <- tibble::tibble(University1 = c(1,0,0,1), University2 = c(1,1,0,0), University3 = c(1,1,0,1) ,Productivity = c(12,13,5,2) )

df
#> # A tibble: 4 x 4
#>   University1 University2 University3 Productivity
#>         <dbl>       <dbl>       <dbl>        <dbl>
#> 1           1           1           1           12
#> 2           0           1           1           13
#> 3           0           0           0            5
#> 4           1           0           1            2

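# Reshape to long format: one row per (university, observation) pair
df_gather <- tidyr::gather(data = df, University, value, -Productivity)

# Keep rows where the university flag is 1, then average per university
res <-
  df_gather %>%
  filter(value == 1) %>%
  group_by(University) %>%
  summarise(prod_univ = mean(Productivity, na.rm = TRUE))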

res
#> # A tibble: 3 x 2
#>   University  prod_univ
#>   <chr>           <dbl>
#> 1 University1       7
#> 2 University2      12.5
#> 3 University3       9
``````

Created on 2019-11-27 by the reprex package (v0.2.1)
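As a side note, `gather()` is superseded in tidyr 1.0.0 and later by `pivot_longer()`. A sketch of the same computation with `pivot_longer()`, assuming a tidyr version that provides it; the name `res_long` is only for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

df <- tibble::tibble(University1 = c(1, 0, 0, 1),
                     University2 = c(1, 1, 0, 0),
                     University3 = c(1, 1, 0, 1),
                     Productivity = c(12, 13, 5, 2))

res_long <- df %>%
  # Stack every column except Productivity into name/value pairs
  pivot_longer(cols = -Productivity,
               names_to = "University",
               values_to = "value") %>%
  # Keep only the observations that belong to each university
  filter(value == 1) %>%
  group_by(University) %>%
  summarise(prod_univ = mean(Productivity, na.rm = TRUE))

res_long
```

This should give the same three means as the `gather()` version above (7, 12.5 and 9).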

For better help, post a reprex (reproducible example) instead of an image of your data frame.

