Help ranking variables by value

Longshot408 · September 14, 2021, 8:47pm

Hi all, I have several variables in a data set that I would like to assign ranks to, from highest to lowest, by row.

Here's a snippet of the data:

tibble::tribble(
  ~barrierf_housing, ~barrierf_job, ~barrierf_family, ~barrierf_services, ~barrierf_education, ~barrierf_identification,
                99L,           62L,               2L,                 5L,                  0L,                       0L,
                51L,           10L,               NA,               100L,                  NA,                       NA,
                27L,          100L,              56L,                55L,                  NA,                       0L,
                50L,          100L,              50L,                50L,                 53L,                      23L,
                 0L,            0L,              50L,                 0L,                  0L,                       0L,
               100L,           82L,             100L,                80L,                 60L,                      34L,
                 0L,            6L,               0L,                63L,                 52L,                       0L,
                 0L,            0L,              15L,                 0L,                 26L,                       0L,
               100L,           41L,               0L,               100L,                  NA,                     100L
  )

Each of these rows is a participant. I'd like to add a variable (or variables) showing how each participant ranked each choice, so I can ultimately see, e.g., how many people ranked barrierf_job as the highest.

FJCC · September 14, 2021, 9:41pm

Here is how I would figure out which choice or choices received the highest score in each row.

DF <- tibble::tribble(
  ~barrierf_housing, ~barrierf_job, ~barrierf_family, ~barrierf_services, ~barrierf_education, ~barrierf_identification,
  99L,           62L,               2L,                 5L,                  0L,                       0L,
  51L,           10L,               NA,               100L,                  NA,                       NA,
  27L,          100L,              56L,                55L,                  NA,                       0L,
  50L,          100L,              50L,                50L,                 53L,                      23L,
  0L,            0L,              50L,                 0L,                  0L,                       0L,
  100L,           82L,             100L,                80L,                 60L,                      34L,
  0L,            6L,               0L,                63L,                 52L,                       0L,
  0L,            0L,              15L,                 0L,                 26L,                       0L,
  100L,           41L,               0L,               100L,                  NA,                     100L
)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
DF_long <- DF %>% mutate(ID = row_number()) %>% 
  pivot_longer(-ID, names_to = "Choice", values_to = "Score")
MAXES <- DF_long %>% group_by(ID) %>% summarize(Max = max(Score, na.rm = TRUE))
#> `summarise()` ungrouping output (override with `.groups` argument)
MAXES
#> # A tibble: 9 x 2
#>      ID   Max
#>   <int> <int>
#> 1     1    99
#> 2     2   100
#> 3     3   100
#> 4     4   100
#> 5     5    50
#> 6     6   100
#> 7     7    63
#> 8     8    26
#> 9     9   100
DF_Filtered <- semi_join(DF_long, MAXES, by = c("ID", Score = "Max"))
DF_Filtered
#> # A tibble: 12 x 3
#>       ID Choice                  Score
#>    <int> <chr>                   <int>
#>  1     1 barrierf_housing           99
#>  2     2 barrierf_services         100
#>  3     3 barrierf_job              100
#>  4     4 barrierf_job              100
#>  5     5 barrierf_family            50
#>  6     6 barrierf_housing          100
#>  7     6 barrierf_family           100
#>  8     7 barrierf_services          63
#>  9     8 barrierf_education         26
#> 10     9 barrierf_housing          100
#> 11     9 barrierf_services         100
#> 12     9 barrierf_identification   100

Created on 2021-09-14 by the reprex package (v0.3.0)
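To get from the per-row maxima to the count the original question asks for (how many participants gave each choice their highest score), one option is to tally the filtered rows. This is a minimal sketch rather than part of the reply above: it assumes the same sample data, uses a grouped `filter()` instead of the `semi_join()`, and the names `top_counts` and `n_top` are made up for illustration. Tied top scores are counted once for every tied choice.

```r
library(dplyr)
library(tidyr)

# Same sample data as above, rebuilt so this block runs on its own
DF <- tibble::tribble(
  ~barrierf_housing, ~barrierf_job, ~barrierf_family, ~barrierf_services, ~barrierf_education, ~barrierf_identification,
  99L,  62L,  2L,   5L,   0L,  0L,
  51L,  10L,  NA,   100L, NA,  NA,
  27L,  100L, 56L,  55L,  NA,  0L,
  50L,  100L, 50L,  50L,  53L, 23L,
  0L,   0L,   50L,  0L,   0L,  0L,
  100L, 82L,  100L, 80L,  60L, 34L,
  0L,   6L,   0L,   63L,  52L, 0L,
  0L,   0L,   15L,  0L,   26L, 0L,
  100L, 41L,  0L,   100L, NA,  100L
)

top_counts <- DF %>%
  mutate(ID = row_number()) %>%
  pivot_longer(-ID, names_to = "Choice", values_to = "Score") %>%
  group_by(ID) %>%
  # Keep every choice that equals the row maximum (ties kept, NA rows dropped)
  filter(Score == max(Score, na.rm = TRUE)) %>%
  ungroup() %>%
  # Count how many participants put each choice at the top
  count(Choice, name = "n_top", sort = TRUE)

top_counts
```

Because ties are kept, the counts sum to 12 (the number of rows in DF_Filtered) rather than to the 9 participants.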

Longshot408 · September 14, 2021, 11:22pm

So based on your code I was able to come up with a similar approach, though assigning a rank to each:

y= tibble::tribble(
  ~barrierf_housing, ~barrierf_job, ~barrierf_family, ~barrierf_services, ~barrierf_education, ~barrierf_identification,
                99L,           62L,               2L,                 5L,                  0L,                       0L,
                51L,           10L,               NA,               100L,                  NA,                       NA,
                27L,          100L,              56L,                55L,                  NA,                       0L,
                50L,          100L,              50L,                50L,                 53L,                      23L,
                 0L,            0L,              50L,                 0L,                  0L,                       0L,
               100L,           82L,             100L,                80L,                 60L,                      34L,
                 0L,            6L,               0L,                63L,                 52L,                       0L,
                 0L,            0L,              15L,                 0L,                 26L,                       0L,
               100L,           41L,               0L,               100L,                  NA,                     100L
  )

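library(dplyr)
library(tidyr)

# One id per participant; row_number() adapts to the data (the hard-coded 1:112 fails on this 9-row snippet)
y=y %>% mutate(id=row_number())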

y=y %>% 
  pivot_longer(
    cols= starts_with("barrier"),
    names_to = "variable",
    values_to = "participant_score") %>% 
  arrange(id,participant_score) %>% 
  group_by(id) %>% 
  mutate(rank=rank(participant_score, ties.method = "min", na.last = TRUE))

y

This gives a pretty close output to what I'm looking for; by adding a few more lines like this...

y %>%  
  filter(rank==1) %>% 
  janitor::tabyl(variable) %>% janitor::adorn_pct_formatting()

...I can see how many people ranked each variable as most important, etc.

However, the ranks are backwards: I'd like the highest score to be ranked 1 rather than 6.

EDIT: I was able to flip the scores around by adding:

y$test=car::recode(y$rank,"1 = 6 ; 2 = 5 ; 3 = 4 ; 4 = 3 ; 5 = 2 ; 6 = 1")
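An alternative to recoding after the fact is to rank in descending order from the start. The sketch below is one possible way to do that, not the method used in this thread: it assumes the same sample data, uses `dplyr::min_rank(desc(...))`, and stores the result in a hypothetical object called `ranked`. One difference worth noting: `min_rank()` leaves missing scores as `NA`, whereas `rank(..., na.last = TRUE)` gives `NA` the highest ranks, which a 1-to-6 flip then turns into the lowest ones (including rank 1).

```r
library(dplyr)
library(tidyr)

# Same sample data as above, rebuilt so this block runs on its own
y <- tibble::tribble(
  ~barrierf_housing, ~barrierf_job, ~barrierf_family, ~barrierf_services, ~barrierf_education, ~barrierf_identification,
  99L,  62L,  2L,   5L,   0L,  0L,
  51L,  10L,  NA,   100L, NA,  NA,
  27L,  100L, 56L,  55L,  NA,  0L,
  50L,  100L, 50L,  50L,  53L, 23L,
  0L,   0L,   50L,  0L,   0L,  0L,
  100L, 82L,  100L, 80L,  60L, 34L,
  0L,   6L,   0L,   63L,  52L, 0L,
  0L,   0L,   15L,  0L,   26L, 0L,
  100L, 41L,  0L,   100L, NA,  100L
)

ranked <- y %>%
  mutate(id = row_number()) %>%
  pivot_longer(
    cols = starts_with("barrier"),
    names_to = "variable",
    values_to = "participant_score") %>%
  group_by(id) %>%
  # desc() reverses the order, so the highest score gets rank 1;
  # ties share the smallest rank and NA scores stay NA
  mutate(rank = min_rank(desc(participant_score))) %>%
  ungroup()

# How many participants ranked each variable first (ties counted for each tied variable)
ranked %>%
  filter(rank == 1) %>%
  count(variable)
```

The final `count()` could be swapped for the `janitor::tabyl()` call used earlier if percentages are wanted.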

Thanks for the advice!
