Collapsing data into single row, choosing highest value

I have a large dataset with multiple instances of 'Player', and I want to collapse their tournament scores into a single row but keep the row with the highest earnings. It's a large data set so I can't exactly paste the whole thing but a small example table looks like this:

> table1 <- matrix(c('John Doe',2100,20,'N/A','N/A','John Doe',1000,'N/A',19,'N/A','John Doe',500,'N/A','N/A',5),ncol=5,byrow=TRUE)
> colnames(table1) <- c("Player","Earned","Tournament1","Tournament2","Tournament3")

I'd like to eliminate the N/A values and the duplicate rows by combining Tournament 1, 2, 3 into a single row, and also keeping the highest earnings value (2100). So far I've used top_n(1, Earned) to keep the row with the highest earnings, but it only keeps the score of Tournament1, and I need to fill in the other columns with their scores.

Hi @riceislife,
Welcome to the RStudio Community Forum.

Here is a simple suggestion:

library(stringr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

table1 <- matrix(c('John Doe',2100,20,'N/A','N/A',
                   'John Doe',1000,'N/A',19,'N/A',
                   'John Doe',500,'N/A','N/A',5),
                 ncol=5,byrow=TRUE)
colnames(table1) <- c("Player","Earned","Tournament1","Tournament2","Tournament3")

df <- data.frame(table1)

df <- df %>%
    as_tibble() %>% 
    mutate(across(2:5, str_replace_all, "N/A", NA_character_)) %>% 
    mutate(across(2:5, as.numeric))

df
#> # A tibble: 3 x 5
#>   Player   Earned Tournament1 Tournament2 Tournament3
#>   <chr>     <dbl>       <dbl>       <dbl>       <dbl>
#> 1 John Doe   2100          20          NA          NA
#> 2 John Doe   1000          NA          19          NA
#> 3 John Doe    500          NA          NA           5

df %>% 
  group_by(Player) %>% 
  summarise(top_earning = max(Earned),
            tour1 = sum(Tournament1, na.rm=TRUE),
            tour2 = sum(Tournament2, na.rm=TRUE),
            tour3 = sum(Tournament3, na.rm=TRUE))
#> # A tibble: 1 x 5
#>   Player   top_earning tour1 tour2 tour3
#>   <chr>          <dbl> <dbl> <dbl> <dbl>
#> 1 John Doe        2100    20    19     5

Created on 2021-04-20 by the reprex package (v2.0.0)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.