Get observations "below average" after grouping

Hi,
i have a data with students test scores. I have to track the progress they make from test 1 to 2. I group it according to class, test and summarise the average. But I need only those observations which are below average. For example in class A, only those students which are below average in that class must be shown in the new data frame "dat_mod". Similarly for class B. But it gives me an error message. How can I resolve this?


data<-tibble::tribble(
        ~student, ~class, ~test, ~score,
        "Nithin",    "A",    1L,    32L,
        "Gandhi",    "B",    1L,    35L,
         "Nehru",    "A",    1L,    45L,
         "Patel",    "B",    1L,    27L,
        "Rajaji",    "A",    1L,    24L,
          "Azad",    "B",    1L,    36L,
        "Nithin",    "A",    2L,    40L,
        "Gandhi",    "B",    2L,    28L,
         "Nehru",    "A",    2L,    27L,
         "Patel",    "B",    2L,    55L,
        "Rajaji",    "A",    2L,    49L,
          "Azad",    "B",    2L,    56L
        )



dat_mod<-data %>%
  group_by(class,test) %>%
  summarise(avg=mean(score)) %>% 
  filter(avg<mean(score))
#> Error in data %>% group_by(class, test) %>% summarise(avg = mean(score)) %>% : could not find function "%>%"

<sup>Created on 2022-10-15 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>

After you summarize() the data, the data frame only has the columns class, test and avg. There is no score column against which to filter. Do you mean to filter the original data?

library(dplyr)    

  data<-tibble::tribble(
    ~student, ~class, ~test, ~score,
    "Nithin",    "A",    1L,    32L,
    "Gandhi",    "B",    1L,    35L,
    "Nehru",    "A",    1L,    45L,
    "Patel",    "B",    1L,    27L,
    "Rajaji",    "A",    1L,    24L,
    "Azad",    "B",    1L,    36L,
    "Nithin",    "A",    2L,    40L,
    "Gandhi",    "B",    2L,    28L,
    "Nehru",    "A",    2L,    27L,
    "Patel",    "B",    2L,    55L,
    "Rajaji",    "A",    2L,    49L,
    "Azad",    "B",    2L,    56L
  )
  
  
  
  dat_mod<-data %>%
    group_by(class,test) %>%
    summarise(avg=mean(score))
#> `summarise()` has grouped output by 'class'. You can override using the
#> `.groups` argument.
  dat_mod
#> # A tibble: 4 × 3
#> # Groups:   class [2]
#>   class  test   avg
#>   <chr> <int> <dbl>
#> 1 A         1  33.7
#> 2 A         2  38.7
#> 3 B         1  32.7
#> 4 B         2  46.3
  
Filtered <- inner_join(data,dat_mod,by=c("class","test")) |> 
    filter(score<avg) 
Filtered
#> # A tibble: 5 × 5
#>   student class  test score   avg
#>   <chr>   <chr> <int> <int> <dbl>
#> 1 Nithin  A         1    32  33.7
#> 2 Patel   B         1    27  32.7
#> 3 Rajaji  A         1    24  33.7
#> 4 Gandhi  B         2    28  46.3
#> 5 Nehru   A         2    27  38.7

Created on 2022-10-15 with reprex v2.0.2

I want it in such a way that students and the averages across the tests can be tracked. It would appear as what you have shown in "Filtered" data frame. But can I do it without the joins. Like can I play with the original data itself so that I don't have to use joins?

You can use mutate() instead of summarize().

  dat_mod<-data %>%
     group_by(class,test) %>%
     mutate(avg=mean(score)) |> 
     filter(score<avg)
  dat_mod
# A tibble: 5 × 5
# Groups:   class, test [4]
  student class  test score   avg
  <chr>   <chr> <int> <int> <dbl>
1 Nithin  A         1    32  33.7
2 Patel   B         1    27  32.7
3 Rajaji  A         1    24  33.7
4 Gandhi  B         2    28  46.3
5 Nehru   A         2    27  38.7

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.