Finding average based on a particulate column value.

Hi,

I have a data frame as example below.

I need to add a column "Average", which is an average of all items (I1, I2, I3,..) for a student (S.ID) in the location the student belong to.

Can someone help me?

Dataframe:
S.ID Location I1 I2 I3
1 1 1 0 1
2 1 0 1 0
3 1 1 0 NA
4 2 0 1 1
5 2 1 1 1
6 2 0 0 1
7 3 1 1 0
8 3 0 0 1
9 4 1 0 1
10 4 0 1 0
.
.
.
n 1 1 1

Hello,

I am not entirely sure what you expect to have as output so I am going to answer your post in two parts.

If you simply want to get the average for all students you can do it with the mutate as specified below which you will see I did. If you want to take those values and get the average of those averages for location you can perform the group_by on location and then proceed to create a new variable locatio_average in summarise. You will see I specified na.rm =TRUE as you had an NA. It finally gives us the average for students in a location. Let me know if this was the solution you wanted?

library(tidyverse)

df <- 
data.frame(
        S.ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    Location = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
          I1 = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
          I2 = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1),
          I3 = c(1, 0, NA, 1, 1, 1, 0, 1, 1, 0)
)

df %>% mutate(average = (I1+I2+I3)/3) %>% 
  group_by(Location) %>% 
  summarise(location_average = mean(average, na.rm = TRUE))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 4 x 2
#>   Location location_average
#>      <dbl>            <dbl>
#> 1        1            0.5  
#> 2        2            0.667
#> 3        3            0.5  
#> 4        4            0.5

Created on 2020-10-22 by the reprex package (v0.3.0)

Thank you for your reply.
my output expectation is as follows:
S.ID Location I1 I2 I3 Average
1 1 1 0 1 0.5
2 1 0 1 0 0.5
3 1 1 0 NA 0.5
4 2 0 1 1 0.667
5 2 1 1 1 0.667
6 2 0 0 1 0.667
7 3 1 1 0 0.5
8 3 0 0 1 0.5
9 4 1 0 1 0.5
10 4 0 1 0 0.5

In this case, I'd probably propose the following. I don't see how your average values make sense exactly as you have 1,0,1 for the first person which is 2/3 so needs to be 0.6? As you will see the below will work on your NA case too.

library(tidyverse)

df <- 
data.frame(
        S.ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    Location = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
          I1 = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
          I2 = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1),
          I3 = c(1, 0, NA, 1, 1, 1, 0, 1, 1, 0)
)

df_selected <- df %>% select(I1:I3) 


average <- apply(X=df_selected, MARGIN=1, FUN=mean, na.rm=TRUE)


df_final <- cbind(df,average)

df_final
#>    S.ID Location I1 I2 I3   average
#> 1     1        1  1  0  1 0.6666667
#> 2     2        1  0  1  0 0.3333333
#> 3     3        1  1  0 NA 0.5000000
#> 4     4        2  0  1  1 0.6666667
#> 5     5        2  1  1  1 1.0000000
#> 6     6        2  0  0  1 0.3333333
#> 7     7        3  1  1  0 0.6666667
#> 8     8        3  0  0  1 0.3333333
#> 9     9        4  1  0  1 0.6666667
#> 10   10        4  0  1  0 0.3333333

Created on 2020-10-22 by the reprex package (v0.3.0)

Thanks again, I am looking for something combination of your 1st and 2nd reply.
Find the below image, how the result should be.

Hope it is clear now. Sorry for not being so clear.

Okay it is slightly more clear. Let me know if this is what you want?

library(tidyverse)

df <- 
  data.frame(
    S.ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    Location = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
    I1 = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
    I2 = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1),
    I3 = c(1, 0, NA, 1, 1, 1, 0, 1, 1, 0)
  )

df_average <- df %>% mutate(average = (I1+I2+I3)/3) %>% 
  group_by(Location) %>% 
  summarise(location_average = mean(average, na.rm = TRUE))
#> `summarise()` ungrouping output (override with `.groups` argument)


df_final <- left_join(df,df_average)
#> Joining, by = "Location"

df_final
#>    S.ID Location I1 I2 I3 location_average
#> 1     1        1  1  0  1        0.5000000
#> 2     2        1  0  1  0        0.5000000
#> 3     3        1  1  0 NA        0.5000000
#> 4     4        2  0  1  1        0.6666667
#> 5     5        2  1  1  1        0.6666667
#> 6     6        2  0  0  1        0.6666667
#> 7     7        3  1  1  0        0.5000000
#> 8     8        3  0  0  1        0.5000000
#> 9     9        4  1  0  1        0.5000000
#> 10   10        4  0  1  0        0.5000000

Created on 2020-10-22 by the reprex package (v0.3.0)

1 Like

Yes, this is exactly i wanted :slight_smile:
Thank you very much.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.