Get observations "below average" after grouping

kuttan98 · October 15, 2022, 5:01pm

Hi,
i have a data with students test scores. I have to track the progress they make from test 1 to 2. I group it according to class, test and summarise the average. But I need only those observations which are below average. For example in class A, only those students which are below average in that class must be shown in the new data frame "dat_mod". Similarly for class B. But it gives me an error message. How can I resolve this?


data<-tibble::tribble(
        ~student, ~class, ~test, ~score,
        "Nithin",    "A",    1L,    32L,
        "Gandhi",    "B",    1L,    35L,
         "Nehru",    "A",    1L,    45L,
         "Patel",    "B",    1L,    27L,
        "Rajaji",    "A",    1L,    24L,
          "Azad",    "B",    1L,    36L,
        "Nithin",    "A",    2L,    40L,
        "Gandhi",    "B",    2L,    28L,
         "Nehru",    "A",    2L,    27L,
         "Patel",    "B",    2L,    55L,
        "Rajaji",    "A",    2L,    49L,
          "Azad",    "B",    2L,    56L
        )



dat_mod<-data %>%
  group_by(class,test) %>%
  summarise(avg=mean(score)) %>% 
  filter(avg<mean(score))
#> Error in data %>% group_by(class, test) %>% summarise(avg = mean(score)) %>% : could not find function "%>%"

<sup>Created on 2022-10-15 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>

FJCC · October 15, 2022, 5:40pm

After you summarize() the data, the data frame only has the columns class, test and avg. There is no score column against which to filter. Do you mean to filter the original data?

library(dplyr)    

  data<-tibble::tribble(
    ~student, ~class, ~test, ~score,
    "Nithin",    "A",    1L,    32L,
    "Gandhi",    "B",    1L,    35L,
    "Nehru",    "A",    1L,    45L,
    "Patel",    "B",    1L,    27L,
    "Rajaji",    "A",    1L,    24L,
    "Azad",    "B",    1L,    36L,
    "Nithin",    "A",    2L,    40L,
    "Gandhi",    "B",    2L,    28L,
    "Nehru",    "A",    2L,    27L,
    "Patel",    "B",    2L,    55L,
    "Rajaji",    "A",    2L,    49L,
    "Azad",    "B",    2L,    56L
  )
  
  
  
  dat_mod<-data %>%
    group_by(class,test) %>%
    summarise(avg=mean(score))
#> `summarise()` has grouped output by 'class'. You can override using the
#> `.groups` argument.
  dat_mod
#> # A tibble: 4 × 3
#> # Groups:   class [2]
#>   class  test   avg
#>   <chr> <int> <dbl>
#> 1 A         1  33.7
#> 2 A         2  38.7
#> 3 B         1  32.7
#> 4 B         2  46.3
  
Filtered <- inner_join(data,dat_mod,by=c("class","test")) |> 
    filter(score<avg) 
Filtered
#> # A tibble: 5 × 5
#>   student class  test score   avg
#>   <chr>   <chr> <int> <int> <dbl>
#> 1 Nithin  A         1    32  33.7
#> 2 Patel   B         1    27  32.7
#> 3 Rajaji  A         1    24  33.7
#> 4 Gandhi  B         2    28  46.3
#> 5 Nehru   A         2    27  38.7

^{Created on 2022-10-15 with reprex v2.0.2}

kuttan98 · October 15, 2022, 5:52pm

I want it in such a way that students and the averages across the tests can be tracked. It would appear as what you have shown in "Filtered" data frame. But can I do it without the joins. Like can I play with the original data itself so that I don't have to use joins?

FJCC · October 15, 2022, 6:04pm

You can use mutate() instead of summarize().

  dat_mod<-data %>%
     group_by(class,test) %>%
     mutate(avg=mean(score)) |> 
     filter(score<avg)
  dat_mod
# A tibble: 5 × 5
# Groups:   class, test [4]
  student class  test score   avg
  <chr>   <chr> <int> <int> <dbl>
1 Nithin  A         1    32  33.7
2 Patel   B         1    27  32.7
3 Rajaji  A         1    24  33.7
4 Gandhi  B         2    28  46.3
5 Nehru   A         2    27  38.7

system · November 5, 2022, 6:05pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.