help ranking variables

Hi all, I need to create a new variable that ranks another variable, by value. Essentially, we had participants in a recent study rate how hard they found 6 different things to be, and we now want to have a rank-order of these things for each person. I've been messing with this code forever though and can't get it right.

So far I've been using this method:

library(dplyr)

dat <- tibble::tribble(~name, ~score,
                "bob", 0,
                "bob", 5,
                "bob", 50,
                "bob", 50,
                "bob", 50,
                "bob", NA)

# Rank low to high; tied scores all get the highest rank in the tie, NA is ranked first
dat <- dat %>% mutate(rank = rank(score,
                         ties.method = "max", na.last = FALSE))

# Flip the ranks around so they are highest to lowest
dat$rank <- car::recode(dat$rank, "1 = 6 ; 2 = 5 ; 3 = 4 ; 4 = 3 ; 5 = 2 ; 6 = 1")

dat

The problem with this is that it assigns ranks like you see in sporting events; that is, if three people tied for second place, the rankings look like 1,2,2,2,5. This is not what I want...I need it to be 1,2,2,2,3. The current way I'm getting the ranks makes it look like there are several things missing in the data that aren't ranked when in reality there is only one NA.
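
To make the two conventions concrete, here is a minimal sketch on a made-up score vector (the values are hypothetical, not from the study). It assumes dplyr is installed and contrasts dplyr's min_rank(), which leaves gaps after ties, with dense_rank(), which does not:

```r
library(dplyr)

# Hypothetical scores: one top score, a three-way tie, then a lower score
score <- c(90, 70, 70, 70, 40)

# "Sporting event" ranking: the tie takes up places 2 to 4, so the next rank is 5
competition <- min_rank(desc(score))
competition
#> [1] 1 2 2 2 5

# Dense ranking: tied values share a rank and the next rank follows without a gap
dense <- dense_rank(desc(score))
dense
#> [1] 1 2 2 2 3
```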

library(tidyverse)
dat <- tibble::tribble(~name, ~score,
                    "bob", 0,
                    "bob", 5,
                    "bob", 50,
                    "bob", 50,
                    "bob", 50,
                    "bob", NA)

(dat <- dat %>% mutate(rank1=min_rank(score),
                       rank2 = max(rank1,na.rm = TRUE) + 1 - rank1,
                       rank3 = as_factor(rank2)))

The group id number is the rank value you need. I added another participant to make sure this works with your data set. The highest score is ranked 1; if that is the wrong direction, drop the desc() in the rank() call. I was also unsure what you want to do with NAs. Here they get the last rank (the largest rank number).

library(tidyverse)

dat=tibble::tribble(~name, ~score,
                    "bob", 0,
                    "bob", 5,
                    "bob", 50,
                    "bob", 50,
                    "bob", 60,
                    "bob", NA,
                    "sue", 0,
                    "sue", 25,
                    "sue", 50,
                    "sue", 50,
                    "sue", 60,
                    "sue", 25)


datr <- dat %>% group_by(name) %>% 
  mutate(ranked = rank(desc(score), ties.method = "max", na.last = TRUE)) %>% 
  arrange(name, ranked)

datr %>% group_modify(~ .x %>% group_by(ranked) %>% mutate(id = cur_group_id()))
#> # A tibble: 12 × 4
#> # Groups:   name [2]
#>    name  score ranked    id
#>    <chr> <dbl>  <int> <int>
#>  1 bob      60      1     1
#>  2 bob      50      3     2
#>  3 bob      50      3     2
#>  4 bob       5      4     3
#>  5 bob       0      5     4
#>  6 bob      NA      6     5
#>  7 sue      60      1     1
#>  8 sue      50      3     2
#>  9 sue      50      3     2
#> 10 sue      25      5     3
#> 11 sue      25      5     3
#> 12 sue       0      6     4

Created on 2021-10-09 by the reprex package (v2.0.1)

Turns out a slight variation on this was all I needed! dense_rank() was the key!!! Combined with the second and third lines you gave me this solved everything, thanks!
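
For anyone reading later, the variation described above presumably looks something like the sketch below. This is a guess at the combination, not the exact code that was used: dense_rank() replaces min_rank(), and the max-plus-one subtraction flips the direction so the highest score gets rank 1. The data are bob's scores from the two-person example.

```r
library(dplyr)

dat <- tibble::tribble(~name, ~score,
                       "bob", 0,
                       "bob", 5,
                       "bob", 50,
                       "bob", 50,
                       "bob", 60,
                       "bob", NA)

ranked <- dat %>%
  mutate(rank1 = dense_rank(score),                      # 1 = lowest score, no gaps after ties
         rank2 = max(rank1, na.rm = TRUE) + 1 - rank1)   # flip so 1 = highest score

ranked$rank2
#> [1]  4  3  2  2  1 NA
```

The NA score stays NA in both columns, since dense_rank() does not assign a rank to missing values.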

Oh, good grief! There was a simple solution after all. Still, I had fun and learned about group id. You actually need just one line, reversing the order with desc(score):

library(tidyverse)

dat=tibble::tribble(~name, ~score,
                    "bob", 0,
                    "bob", 5,
                    "bob", 50,
                    "bob", 50,
                    "bob", 50,
                    "bob", NA)

dat %>% mutate(ranked = dense_rank(desc(score)))
#> # A tibble: 6 × 3
#>   name  score ranked
#>   <chr> <dbl>  <int>
#> 1 bob       0      3
#> 2 bob       5      2
#> 3 bob      50      1
#> 4 bob      50      1
#> 5 bob      50      1
#> 6 bob      NA     NA

Created on 2021-10-09 by the reprex package (v2.0.1)
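
Since the original question asked for a rank order for each person, the same one-liner can be run inside group_by(). The sketch below assumes the two-participant data from earlier in the thread and that rank 1 should go to the highest score; ungroup() at the end is optional.

```r
library(dplyr)

dat <- tibble::tribble(~name, ~score,
                       "bob", 0,
                       "bob", 5,
                       "bob", 50,
                       "bob", 50,
                       "bob", 60,
                       "bob", NA,
                       "sue", 0,
                       "sue", 25,
                       "sue", 50,
                       "sue", 50,
                       "sue", 60,
                       "sue", 25)

per_person <- dat %>%
  group_by(name) %>%                               # rank within each participant
  mutate(ranked = dense_rank(desc(score))) %>%     # 1 = highest score, ties share a rank
  ungroup()

per_person$ranked
#>  [1]  4  3  2  2  1 NA  4  3  2  2  1  3
```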

Excellent, thanks!! Yeah, I can't believe how many damn hours it took me to find this solution. I wish the help documentation on the dplyr ranking functions were clearer, or I wouldn't have wasted so much time with base R's version.

Ah well, more fodder for my coding notebook
