Pick second and third best values

jynusmac · November 29, 2021, 4:23pm

Hi. I want to find the second and third high results in a dataframe. Explication:

In this data frame:

max_vot <- data.frame(
          P1 = c(310L, 349L, 2831L, 99L, 161L, 1353L, 196L, 113L, 23L),
          P2 = c(193L, 111L, 3126L, 56L, 155L, 1261L, 92L, 125L, 47L),
          P3 = c(122L, 147L, 3566L, 54L, 70L, 1007L, 123L, 88L, 22L),
          P4 = c(30L, 42L, 645L, 18L, 10L, 247L, 43L, 15L, 6L),
          P5 = c(47L, 45L, 574L, 41L, 17L, 378L, 39L, 20L, 5L)
)

using

max_vote <- colnames(max_vot)[max.col(max_vot, ties.method = "first")]

I can get the column that have the more high value row:

[1] "P1" "P1" "P3" "P1" "P1" "P1" "P1" "P2" "P2"

Now I want to get the second most high value column in each row, in this example will be: P2, P3, P2, P2, P2, P2, P3, P2, P1, P1

And then the third best value.

But thies.method only allow “random”, “first”, “last” as arguments. Any idea on how to deal with this?

xvalda · November 29, 2021, 5:12pm

Hi @jynusmac ,

I'm using the tidyverse for your problem

Here's your dataframe again:

max_vot <- data.frame(
          P1 = c(310L, 349L, 2831L, 99L, 161L, 1353L, 196L, 113L, 23L),
          P2 = c(193L, 111L, 3126L, 56L, 155L, 1261L, 92L, 125L, 47L),
          P3 = c(122L, 147L, 3566L, 54L, 70L, 1007L, 123L, 88L, 22L),
          P4 = c(30L, 42L, 645L, 18L, 10L, 247L, 43L, 15L, 6L),
          P5 = c(47L, 45L, 574L, 41L, 17L, 378L, 39L, 20L, 5L)
)

Create a table that shows 2nd and 3rd highest values, I don't know the background of your data but used the assumption that Pi are "judges" that give "values" for specific votes (vote_id). If the background of your data is different and I misunderstood it, feel free to rename the objects to make the code more explicit:

max_vot_table <-  max_vot %>% 
  #assigne vote_ID (row number) to each row, useful for when we'll pivot to a longer format below
  mutate(vote_ID = 1:nrow(.)) %>% 
  #pivot to long format, we obtain 3 columns: row ID (or vote ID), judge ID (P1, ... P5), value of the vote
  pivot_longer(-vote_ID, names_to = "judge", values_to = "value") %>% 
  #sort by vote_ID (the rows of your source dataframe) and value of the vote in descending order
  arrange(vote_ID, desc(value)) %>% 
  #group by vote_ID to perform our summary on each of these groups 
  group_by(vote_ID) %>% 
  #select 2nd or 3rd position from each group
  summarise(
    second_pos = judge[2], 
    third_pos = judge[3]
  )

The results are in each column of max_vot_table . If you need to export them separately as a vector:

max_vot_table %>% pull(second_pos)
max_vot_table %>% pull(third_pos)

Let me know if you'd rather have a function for this.

There are probably more code-savvy ways to solve this, so maybe worth waiting to see if someone suggests something a bit crispier

FJCC · November 29, 2021, 5:24pm

I don't claim this is better than @xvalda's solution, it is just different. You can change the mutate() to summarize() to get only the column A.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

max_vot <- data.frame(
  P1 = c(310L, 349L, 2831L, 99L, 161L, 1353L, 196L, 113L, 23L),
  P2 = c(193L, 111L, 3126L, 56L, 155L, 1261L, 92L, 125L, 47L),
  P3 = c(122L, 147L, 3566L, 54L, 70L, 1007L, 123L, 88L, 22L),
  P4 = c(30L, 42L, 645L, 18L, 10L, 247L, 43L, 15L, 6L),
  P5 = c(47L, 45L, 574L, 41L, 17L, 378L, 39L, 20L, 5L)
)
MyFunc <- function(R,Pos){
  tmp <- sort(R,decreasing = TRUE)
  colnames(max_vot)[which(R==tmp[Pos])]
}
max_vot |> rowwise() |> 
  mutate(A = MyFunc(c_across(P1:P5),Pos = 2))
#> # A tibble: 9 x 6
#> # Rowwise: 
#>      P1    P2    P3    P4    P5 A    
#>   <int> <int> <int> <int> <int> <chr>
#> 1   310   193   122    30    47 P2   
#> 2   349   111   147    42    45 P3   
#> 3  2831  3126  3566   645   574 P2   
#> 4    99    56    54    18    41 P2   
#> 5   161   155    70    10    17 P2   
#> 6  1353  1261  1007   247   378 P2   
#> 7   196    92   123    43    39 P3   
#> 8   113   125    88    15    20 P1   
#> 9    23    47    22     6     5 P1

^{Created on 2021-11-29 by the reprex package (v2.0.1)}

xvalda · November 29, 2021, 7:09pm

@FJCC It's definitely shorter code, and self-explanatory.

jynusmac · December 2, 2021, 10:52am

Sorry for delay in reply, the two solutions work perfectly. Thanks you both for helping me.

system · December 9, 2021, 10:53am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.