How to select the highest value of a dataframe; and if values are shared, select all those

Hi

Here's my dataframe:

data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
Sampleid = c("AVM_360", "AVM_360", "AVM_360"),
Currentid = c("Bibasis vasutana",
"Bibasis vasutana","Bibasis vasutana"),
`%Match` = c(100, 100, 99.5),
Matchid = c("Bibasis vasutana", "Burara vasutana", "Bibasis nikos")
)

I want to select the rows with the highest value of "%Match". As you can see, two rows both have a match of 100, but their "Matchid" is different. How can I write code that keeps the highest value within each group (Sampleid) and, if several rows share that highest value, keeps all of them?

To filter rows by the highest value of a column, I'd recommend using {dplyr} (make sure it's installed!)

dat = data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  Sampleid = c("AVM_360", "AVM_360", "AVM_360"),
  Currentid = c("Bibasis vasutana",
                "Bibasis vasutana","Bibasis vasutana"),
  `%Match` = c(100, 100, 99.5),
  Matchid = c("Bibasis vasutana", "Burara vasutana", "Bibasis nikos")
)

dat
#>   Sampleid        Currentid %Match          Matchid
#> 1  AVM_360 Bibasis vasutana  100.0 Bibasis vasutana
#> 2  AVM_360 Bibasis vasutana  100.0  Burara vasutana
#> 3  AVM_360 Bibasis vasutana   99.5    Bibasis nikos

dplyr::filter(dat, `%Match` == max(`%Match`))
#>   Sampleid        Currentid %Match          Matchid
#> 1  AVM_360 Bibasis vasutana    100 Bibasis vasutana
#> 2  AVM_360 Bibasis vasutana    100  Burara vasutana

Created on 2022-02-24 by the reprex package (v2.0.1)
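
Since the question asks for the highest value within each Sampleid, here is a minimal sketch of a grouped version, assuming {dplyr} is installed. The AVM_361 rows and the "Species A"/"Species B" labels are made up purely to show a second group; they are not part of the original data. Comparing against the per-group maximum with `==` keeps every tied row.

```r
library(dplyr)

dat <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  # AVM_361 rows are hypothetical, added only to illustrate grouping
  Sampleid = c("AVM_360", "AVM_360", "AVM_360", "AVM_361", "AVM_361"),
  `%Match` = c(100, 100, 99.5, 98, 97),
  Matchid = c("Bibasis vasutana", "Burara vasutana", "Bibasis nikos",
              "Species A", "Species B")
)

top_matches <- dat %>%
  group_by(Sampleid) %>%                    # evaluate max() separately per sample
  filter(`%Match` == max(`%Match`)) %>%     # keep all rows tied for the group maximum
  ungroup()

top_matches
```

With this toy data, both 100 rows of AVM_360 and the single 98 row of AVM_361 should come back. In dplyr 1.0.0 or later, `slice_max(`%Match`, n = 1)` after `group_by()` is an alternative that also keeps ties by default (`with_ties = TRUE`).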

Thank you very much!
