Finding average based on a particulate column value.

anbananand · October 22, 2020, 6:43pm

Hi,

I have a data frame as example below.

I need to add a column "Average", which is an average of all items (I1, I2, I3,..) for a student (S.ID) in the location the student belong to.

Can someone help me?

Dataframe:
S.ID Location I1 I2 I3
1 1 1 0 1
2 1 0 1 0
3 1 1 0 NA
4 2 0 1 1
5 2 1 1 1
6 2 0 0 1
7 3 1 1 0
8 3 0 0 1
9 4 1 0 1
10 4 0 1 0
.
.
.
n 1 1 1

GreyMerchant · October 22, 2020, 7:06pm

Hello,

I am not entirely sure what you expect to have as output so I am going to answer your post in two parts.

If you simply want to get the average for all students you can do it with the mutate as specified below which you will see I did. If you want to take those values and get the average of those averages for location you can perform the group_by on location and then proceed to create a new variable locatio_average in summarise. You will see I specified na.rm =TRUE as you had an NA. It finally gives us the average for students in a location. Let me know if this was the solution you wanted?

library(tidyverse)

df <- 
data.frame(
        S.ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    Location = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
          I1 = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
          I2 = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1),
          I3 = c(1, 0, NA, 1, 1, 1, 0, 1, 1, 0)
)

df %>% mutate(average = (I1+I2+I3)/3) %>% 
  group_by(Location) %>% 
  summarise(location_average = mean(average, na.rm = TRUE))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 4 x 2
#>   Location location_average
#>      <dbl>            <dbl>
#> 1        1            0.5  
#> 2        2            0.667
#> 3        3            0.5  
#> 4        4            0.5

^{Created on 2020-10-22 by the reprex package (v0.3.0)}

anbananand · October 22, 2020, 7:25pm

Thank you for your reply.
my output expectation is as follows:
S.ID Location I1 I2 I3 Average
1 1 1 0 1 0.5
2 1 0 1 0 0.5
3 1 1 0 NA 0.5
4 2 0 1 1 0.667
5 2 1 1 1 0.667
6 2 0 0 1 0.667
7 3 1 1 0 0.5
8 3 0 0 1 0.5
9 4 1 0 1 0.5
10 4 0 1 0 0.5

GreyMerchant · October 22, 2020, 7:35pm

In this case, I'd probably propose the following. I don't see how your average values make sense exactly as you have 1,0,1 for the first person which is 2/3 so needs to be 0.6? As you will see the below will work on your NA case too.

library(tidyverse)

df <- 
data.frame(
        S.ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    Location = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
          I1 = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
          I2 = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1),
          I3 = c(1, 0, NA, 1, 1, 1, 0, 1, 1, 0)
)

df_selected <- df %>% select(I1:I3) 


average <- apply(X=df_selected, MARGIN=1, FUN=mean, na.rm=TRUE)


df_final <- cbind(df,average)

df_final
#>    S.ID Location I1 I2 I3   average
#> 1     1        1  1  0  1 0.6666667
#> 2     2        1  0  1  0 0.3333333
#> 3     3        1  1  0 NA 0.5000000
#> 4     4        2  0  1  1 0.6666667
#> 5     5        2  1  1  1 1.0000000
#> 6     6        2  0  0  1 0.3333333
#> 7     7        3  1  1  0 0.6666667
#> 8     8        3  0  0  1 0.3333333
#> 9     9        4  1  0  1 0.6666667
#> 10   10        4  0  1  0 0.3333333

^{Created on 2020-10-22 by the reprex package (v0.3.0)}

anbananand · October 22, 2020, 8:00pm

Thanks again, I am looking for something combination of your 1st and 2nd reply.
Find the below image, how the result should be.

Hope it is clear now. Sorry for not being so clear.

GreyMerchant · October 22, 2020, 8:08pm

Okay it is slightly more clear. Let me know if this is what you want?

library(tidyverse)

df <- 
  data.frame(
    S.ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    Location = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
    I1 = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
    I2 = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1),
    I3 = c(1, 0, NA, 1, 1, 1, 0, 1, 1, 0)
  )

df_average <- df %>% mutate(average = (I1+I2+I3)/3) %>% 
  group_by(Location) %>% 
  summarise(location_average = mean(average, na.rm = TRUE))
#> `summarise()` ungrouping output (override with `.groups` argument)


df_final <- left_join(df,df_average)
#> Joining, by = "Location"

df_final
#>    S.ID Location I1 I2 I3 location_average
#> 1     1        1  1  0  1        0.5000000
#> 2     2        1  0  1  0        0.5000000
#> 3     3        1  1  0 NA        0.5000000
#> 4     4        2  0  1  1        0.6666667
#> 5     5        2  1  1  1        0.6666667
#> 6     6        2  0  0  1        0.6666667
#> 7     7        3  1  1  0        0.5000000
#> 8     8        3  0  0  1        0.5000000
#> 9     9        4  1  0  1        0.5000000
#> 10   10        4  0  1  0        0.5000000

^{Created on 2020-10-22 by the reprex package (v0.3.0)}

anbananand · October 22, 2020, 8:49pm

Yes, this is exactly i wanted
Thank you very much.

system · October 29, 2020, 8:49pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.