summarise(max) but keep all columns

I am a total beginner and am struggling to understand how to format the code to do what I want. I want to remove the lower test score (grouped by student_id and test_name), but keep all of the other variables that I don't need to group by. I can't figure out how to do this: after summarise(), my data goes from 21 columns to 3.

Thanks for any help!

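You probably want to use group_by() together with mutate(). This computes the summary value (the maximum, for example) for each group but does not collapse the data: every row keeps all of its columns, and the group's summary is repeated on each row of that group.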

For example:

library(dplyr)

iris %>% 
  group_by(Species) %>% 
  mutate(max_score = max(Sepal.Length)) %>% 
  ungroup()
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species max_score
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>       <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa        5.8
#>  2          4.9         3            1.4         0.2 setosa        5.8
#>  3          4.7         3.2          1.3         0.2 setosa        5.8
#>  4          4.6         3.1          1.5         0.2 setosa        5.8
#>  5          5           3.6          1.4         0.2 setosa        5.8
#>  6          5.4         3.9          1.7         0.4 setosa        5.8
#>  7          4.6         3.4          1.4         0.3 setosa        5.8
#>  8          5           3.4          1.5         0.2 setosa        5.8
#>  9          4.4         2.9          1.4         0.2 setosa        5.8
#> 10          4.9         3.1          1.5         0.1 setosa        5.8
#> # … with 140 more rows

Created on 2020-02-11 by the reprex package (v0.3.0)
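To make the difference concrete with data shaped like the question's, here is a small sketch comparing which columns survive summarise() and which survive mutate(). The `scores` table, its `score` and `teacher` columns, and all values are hypothetical; only `student_id` and `test_name` come from the question.

```r
library(dplyr)

# Hypothetical data: two attempts per student per test, plus an extra
# column (`teacher`) that we do not group by but want to keep.
scores <- tibble(
  student_id = c(1, 1, 1, 1, 2, 2),
  test_name  = c("math", "math", "reading", "reading", "math", "math"),
  score      = c(70, 85, 90, 88, 60, 75),
  teacher    = c("A", "A", "B", "B", "A", "A")
)

# summarise() collapses each group to one row and keeps only the
# grouping columns plus the new summary column.
collapsed <- scores %>%
  group_by(student_id, test_name) %>%
  summarise(max_score = max(score)) %>%
  ungroup()

# mutate() adds the group maximum to every row and keeps all columns.
kept <- scores %>%
  group_by(student_id, test_name) %>%
  mutate(max_score = max(score)) %>%
  ungroup()

names(collapsed)  # student_id, test_name, max_score
names(kept)       # student_id, test_name, score, teacher, max_score
```

Newer versions of dplyr may print a message from summarise() about regrouping; the trailing ungroup() makes the result the same either way.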

Thank you! I then used distinct() to keep only the highest score. I am quite sure that I have sixteen lines of code when three would have worked. Sigh. Work in progress!
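The distinct() route can also be done in two steps: sort so each student's highest score comes first, then keep the first row per group with `.keep_all = TRUE`. A minimal sketch, again on a hypothetical `scores` table (the `score` and `teacher` columns and all values are made up for illustration):

```r
library(dplyr)

# Hypothetical data: one row per attempt, as in the question.
scores <- tibble(
  student_id = c(1, 1, 1, 1, 2, 2),
  test_name  = c("math", "math", "reading", "reading", "math", "math"),
  score      = c(70, 85, 90, 88, 60, 75),
  teacher    = c("A", "A", "B", "B", "A", "A")
)

# Sort so the best attempt comes first within each student/test, then
# keep the first row of every student_id/test_name combination.
# `.keep_all = TRUE` keeps the other columns; without it, distinct()
# returns only the columns named in the call.
best <- scores %>%
  arrange(student_id, test_name, desc(score)) %>%
  distinct(student_id, test_name, .keep_all = TRUE)

best
```

Because distinct() keeps only the first row it sees, this always returns exactly one row per student and test, even when two attempts tie for the top score.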

You may want to use filter() instead, if you're trying to keep only the row with the highest score per student. For example:

library(dplyr)

iris %>% 
  group_by(Species) %>% 
  mutate(max_score = max(Sepal.Length)) %>% 
  ungroup() %>% 
  filter(Sepal.Length == max_score)
#> # A tibble: 3 x 6
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species    max_score
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>          <dbl>
#> 1          5.8         4            1.2         0.2 setosa           5.8
#> 2          7           3.2          4.7         1.4 versicolor       7  
#> 3          7.9         3.8          6.4         2   virginica        7.9
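Applied to the question's grouping, the filter can also run directly on the grouped data: inside a grouped filter(), `max(score)` is evaluated within each `student_id`/`test_name` group, so the helper column is not needed. A sketch under the same assumptions as before (hypothetical `scores` table with made-up `score` and `teacher` columns; student 2 deliberately has a tied top score):

```r
library(dplyr)

# Hypothetical data; student 2 has two attempts tied at 75.
scores <- tibble(
  student_id = c(1, 1, 1, 1, 2, 2),
  test_name  = c("math", "math", "reading", "reading", "math", "math"),
  score      = c(70, 85, 90, 88, 75, 75),
  teacher    = c("A", "A", "B", "B", "A", "A")
)

# max(score) is computed per group, so no max_score column is needed.
# Rows that tie for the maximum are all kept.
best <- scores %>%
  group_by(student_id, test_name) %>%
  filter(score == max(score)) %>%
  ungroup()

# dplyr >= 1.0.0 only: keep exactly one row per group, even with ties.
one_each <- scores %>%
  group_by(student_id, test_name) %>%
  slice_max(score, n = 1, with_ties = FALSE) %>%
  ungroup()

best
one_each
```

`best` has two rows for student 2 because of the tie, while `one_each` has one. slice_max() is newer than the dplyr version used in the reprexes above. If `score` can contain `NA`, `max(score)` returns `NA` for that group, so you may want `max(score, na.rm = TRUE)`.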

Ah. Beautiful. This is so much more efficient than my current code!
