Filtering data: top x values of a data frame variable

pnwballr · February 23, 2021, 3:20am

I am struggling to filter a data frame by a variable. I want to keep only the rows with the highest x values of a particular variable. I have been trying filter() and top_n(), but I want to be able to apply top_n() to a specific variable. So far I can only get it to work on the entire data frame, and it is not clear which variable top_n() is selecting on.

Any suggestions on how to make this work?

andresrcs · February 23, 2021, 3:33am

Use the wt argument of top_n().

The function's documentation says wt defaults to the last variable in the data frame.
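A minimal sketch of that default, assuming a made-up data frame (the games name and the minutes column are hypothetical, not from the original question). Without wt, top_n() ranks by the last column; with wt = score it ranks by score:

```r
library(dplyr)

# Hypothetical data: score is deliberately NOT the last column
games <- tibble(
  home    = c("UCLA", "UCLA", "UCLA"),
  score   = c(4, 9, 7),
  minutes = c(30, 10, 20)
)

# No wt given: top_n() reports "Selecting by minutes" and ranks by the last column
by_last <- top_n(games, 2)

# wt given: the two highest values of score are kept
by_score <- top_n(games, 2, wt = score)
```

Note that top_n() keeps the rows in their original order rather than sorting them.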

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

pnwballr · February 23, 2021, 3:41am

I've attached a sample data set to make it clearer what I am trying to do. I want to filter the data to return the two highest values of the Score variable among the UCLA data points, and likewise the two highest values of Score among the FLORIDA data points. I am not including any code because right now I am only filtering by the highest score overall, which I don't think is very helpful.

Home     Score
UCLA     4
UCLA     7
UCLA     9
UCLA     10
FLORIDA  3
FLORIDA  5
FLORIDA  6
FLORIDA  8

lars · February 23, 2021, 8:25am

Hi tmulflur,

A reprex can be as simple as the example dataset plus, optionally, the code with which you're trying to achieve your goal. Providing one makes it easier for us to help you quickly, since we don't have to recreate the dataset manually.

Anyway, does the following approach, specifically slice_head(), give you the results you're looking for?

library(tidyverse)

df <- tribble(
  ~home, ~score,
  "UCLA", 4,
  "UCLA", 7,
  "UCLA", 9,
  "UCLA", 10,
  "FLORIDA", 3,
  "FLORIDA", 5,
  "FLORIDA", 6,
  "FLORIDA", 8,
  )
df %>% glimpse()
#> Rows: 8
#> Columns: 2
#> $ home  <chr> "UCLA", "UCLA", "UCLA", "UCLA", "FLORIDA", "FLORIDA", "FLORID...
#> $ score <dbl> 4, 7, 9, 10, 3, 5, 6, 8

df %>% 
  group_by(home) %>% 
  arrange(desc(score)) %>% 
  slice_head(n = 2) %>% 
  ungroup()
#> # A tibble: 4 x 2
#>   home    score
#>   <chr>   <dbl>
#> 1 FLORIDA     8
#> 2 FLORIDA     6
#> 3 UCLA       10
#> 4 UCLA        9

Created on 2021-02-23 by the reprex package (v1.0.0)

Matthias · February 23, 2021, 8:38am

You can group by home first, then take the top 1 (or 2 or 3) rows within each group, specifying the variable to rank by (here score):

df %>% 
  group_by(home) %>%
  top_n(1, score)
#> # A tibble: 2 x 2
#> # Groups:   home [2]
#>   home    score
#>   <chr>   <dbl>
#> 1 UCLA       10
#> 2 FLORIDA     8

lars · February 23, 2021, 8:46am

Indeed, there are different ways to approach this, and this one is even more concise.
However, according to the help documentation, top_n() has been superseded, so I would recommend using slice_max() instead.
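As a rough sketch, the slice_max() version could look like this, using the same example data as above (rebuilt here with tibble() so the snippet runs on its own; the top2 name is just for illustration):

```r
library(dplyr)

# Same example data as earlier in the thread
df <- tibble(
  home  = rep(c("UCLA", "FLORIDA"), each = 4),
  score = c(4, 7, 9, 10, 3, 5, 6, 8)
)

# Two highest scores per home; slice_max() orders each group by score itself,
# so no separate arrange() step is needed
top2 <- df %>%
  group_by(home) %>%
  slice_max(score, n = 2) %>%
  ungroup()
```

By default slice_max() keeps ties (with_ties = TRUE), so a group can return more than n rows if scores are tied at the cutoff.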

pnwballr · February 23, 2021, 6:45pm

Thank you guys! This community is awesome, that helped big time.

Here is the code I am running now:

fhdecposs=filter(decposs, Half_Status <= 1) %>%
group_by(Home) %>%
arrange(fhdecposs, (Poss_Num)) %>%
slice_tail(n=6)

This worked once, but when I rerun the code I get an error:
Error: arrange() failed at implicit mutate() step.

Problem with mutate() input ..1.
x Input ..1 can't be recycled to size 774.
Input ..1 is fhdecposs.
Input ..1 must be size 774 or 1, not 36.

I am trying to troubleshoot myself, but will take any suggestions! Being a beginner at this would be so frustrating without the R community.

nirgrahamuk · February 23, 2021, 7:22pm

What purpose is fhdecposs meant to serve here?

pnwballr · February 23, 2021, 7:39pm

I am trying to arrange it by a variable (Poss_Num) first, so that when I slice the tail I get the 6 highest values for each group. fhdecposs is the data.

nirgrahamuk · February 23, 2021, 7:40pm

If the data is piped in with %>%, it is already passed as the first argument, so also naming a data frame inside arrange() is an error. Try removing it and starting over.
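A sketch of what the corrected pipeline might look like. The column names come from the post above, but the decposs values here are made up for illustration, and n = 2 is used instead of 6 only because the toy data is small:

```r
library(dplyr)

# Hypothetical stand-in for the real decposs data
decposs <- tibble(
  Home        = rep(c("UCLA", "FLORIDA"), each = 4),
  Half_Status = c(1, 1, 1, 2, 1, 1, 1, 2),
  Poss_Num    = c(3, 1, 2, 9, 5, 4, 6, 9)
)

fhdecposs <- decposs %>%
  filter(Half_Status <= 1) %>%   # keep first-half rows only
  group_by(Home) %>%
  arrange(Poss_Num) %>%          # data comes from the pipe; name only the sort column
  slice_tail(n = 2) %>%          # last rows per group = highest Poss_Num
  ungroup()
```

With the data frame name removed from arrange(), the pipeline no longer depends on an fhdecposs object left over from a previous run, so it gives the same result every time it is rerun.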

pnwballr · February 24, 2021, 9:26pm

That worked!! Thank you so much. So simple. I am new to this and just get lost quickly when trying to put it all together.
