How to select the highest value of a dataframe; and if values are shared, select all those

niko_bio · February 24, 2022, 9:18am

Hi

Here's my dataframe:

data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  Sampleid = c("AVM_360", "AVM_360", "AVM_360"),
  Currentid = c("Bibasis vasutana",
                "Bibasis vasutana","Bibasis vasutana"),
  `%Match` = c(100, 100, 99.5),
  Matchid = c("Bibasis vasutana", "Burara vasutana", "Bibasis nikos")
)

I want to select the rows with the highest value of "%Match". As you can see, two rows have a match of 100.0, but their "Matchid" differs. How should I write code that keeps the highest value within each group (Sampleid) and, if several rows share that highest value, keeps all of them?

JackDavison · February 24, 2022, 9:41am

To keep the rows with the highest value of a column, I'd recommend using {dplyr} (make sure it's installed!). Comparing with `==` against the maximum keeps every row that ties for the top value.

dat <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  Sampleid = c("AVM_360", "AVM_360", "AVM_360"),
  Currentid = c("Bibasis vasutana",
                "Bibasis vasutana","Bibasis vasutana"),
  `%Match` = c(100, 100, 99.5),
  Matchid = c("Bibasis vasutana", "Burara vasutana", "Bibasis nikos")
)

dat
#>   Sampleid        Currentid %Match          Matchid
#> 1  AVM_360 Bibasis vasutana  100.0 Bibasis vasutana
#> 2  AVM_360 Bibasis vasutana  100.0  Burara vasutana
#> 3  AVM_360 Bibasis vasutana   99.5    Bibasis nikos

# Keep only the rows whose %Match equals the column maximum (ties included)
dplyr::filter(dat, `%Match` == max(`%Match`))
#>   Sampleid        Currentid %Match          Matchid
#> 1  AVM_360 Bibasis vasutana    100 Bibasis vasutana
#> 2  AVM_360 Bibasis vasutana    100  Burara vasutana

Created on 2022-02-24 by the reprex package (v2.0.1)
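
The question asks for the highest value within each Sampleid, whereas the filter call above takes the maximum over the whole column, which only works here because the example contains a single sample. With several samples, a grouped version might look like the sketch below. The data frame `dat2`, the `AVM_361` rows, and the "Species A"/"Species B" names are made up for illustration and are not from the original post.

```r
library(dplyr)

# Hypothetical data: the AVM_361 rows are invented to show a second group
dat2 <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  Sampleid = c("AVM_360", "AVM_360", "AVM_360", "AVM_361", "AVM_361"),
  `%Match` = c(100, 100, 99.5, 98, 97),
  Matchid = c("Bibasis vasutana", "Burara vasutana", "Bibasis nikos",
              "Species A", "Species B")
)

# For each Sampleid, keep every row whose %Match equals that group's maximum
top_matches <- dat2 %>%
  group_by(Sampleid) %>%
  filter(`%Match` == max(`%Match`)) %>%
  ungroup()

top_matches
```

This should keep both 100 rows for AVM_360 and the single 98 row for AVM_361. If you have dplyr 1.0.0 or later, `slice_max(`%Match`, n = 1)` after `group_by()` is an alternative worth trying, since it keeps ties by default (`with_ties = TRUE`).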

niko_bio · February 25, 2022, 7:49am

Thank you very much!
