How to average a subset of data based on other columns

#1

Hi! I have a question about how to subset data to then achieve an average of a specific group of numbers.
For instance:

block<-c('T1', 'T1', 'T1', 'T1','T2', 'T2', 'T2', 'T2')
plot<-c('1', '1', '1', '1','1', '1', '1', '1')
subplot<-c('NO', 'NI', 'EO', 'EI','NO', 'NI', 'EO', 'EI')
data1<-c(0,2,4,2,3,5,2,4)
data2<-c(1,1,1,7,2,2,3,1)

orig.DF<-data.frame(block, plot, subplot, data1, data2)

So I have block, plot, and subplot as my three levels. I want to have a single average for each plot. So I need to combine the data for each subplot to have a single average value for data1 and data2 that represent a block and plot.
I want a dataframe that looks like this:
block plot data1 data2
1 1 2 2.5
2 1 3.5 2

I can probably use the subset or aggregate functions but I'm not sure how to syntax this!
Please let me know if you need any additional information! Thank you!

0 Likes

#2

You can do it with the dplyr package

orig.DF <- data.frame(stringsAsFactors = FALSE,
                      data1 = c(0, 2, 4, 2, 3, 5, 2, 4),
                      data2 = c(1, 1, 1, 7, 2, 2, 3, 1),
                      block = as.factor(c("T1", "T1", "T1", "T1", "T2", "T2", "T2", "T2")),
                      plot = as.factor(c("1", "1", "1", "1", "1", "1", "1", "1")),
                      subplot = as.factor(c("NO", "NI", "EO", "EI", "NO", "NI", "EO", "EI"))
)

library(dplyr)

orig.DF %>% 
    group_by(block, plot) %>% 
    summarise(mean_data1 = mean(data1), mean_data2 = mean(data2))
#> # A tibble: 2 x 4
#> # Groups:   block [2]
#>   block plot  mean_data1 mean_data2
#>   <fct> <fct>      <dbl>      <dbl>
#> 1 T1    1            2          2.5
#> 2 T2    1            3.5        2

Created on 2019-03-06 by the reprex package (v0.2.1)

You can learn how to use dplyr and the rest of the tidyverse by reading this free ebook

1 Like

#3

Thank you! That's exactly what I want. What if instead of just 'data1' and 'data2' I have like 30 different columns, is there a shorter way to do that? Like specifying column (4:34) or something?

0 Likes

#4

You can use summarise_if() or summarise_at(), this is an example that applies mean() function to all the numeric variables.

orig.DF %>% 
    group_by(block, plot) %>% 
    summarise_if(is.numeric, mean)
#> # A tibble: 2 x 4
#> # Groups:   block [2]
#>   block plot  data1 data2
#>   <fct> <fct> <dbl> <dbl>
#> 1 T1    1       2     2.5
#> 2 T2    1       3.5   2
0 Likes

#5

Perfect! Worked like a charm. Thank you so much.

Eventually what I need to do is also average groups of columns. So say I have data1:data20. And then doing the same thing as above, I then need have average 1:4, 5:8, 9:12, and so on. Any ideas?

0 Likes

#6

You can also perform rowwise operations but this is not the preferred method becuse it's "untidy".

orig.DF %>% 
    group_by(block, plot) %>% 
    summarise_if(is.numeric, mean) %>%
    rowwise() %>% 
    mutate(mean_1_2 = mean(c(data1,data2)))
#> Source: local data frame [2 x 5]
#> Groups: <by row>
#> 
#> # A tibble: 2 x 5
#>   block plot  data1 data2 mean_1_2
#>   <fct> <fct> <dbl> <dbl>    <dbl>
#> 1 T1    1       2     2.5     2.25
#> 2 T2    1       3.5   2       2.75

I recommend you to open a new topic about this but providing a relevant REPRoducible EXample (reprex), that way we can better understand the structure of your dataset and provide more suitable help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

0 Likes

closed #7

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.

0 Likes