Mutate based on the Largest to Smallest Values

HTP · April 19, 2022, 3:02pm

Hello everyone!

Suppose I have a dataset containing information on 10 students and their grades. The data look something like this:

Student	Grade
1	76
2	99
3	97
4	65
5	86
6	84
7	52
8	55
9	69
10	72

I want to add another column called "Rank" (using mutate, maybe?), based on the students' grades. That is,

The two students with the highest grades will be ranked 1
The three students with the next highest grades will be ranked 2
The three students with the next highest grades will be ranked 3
The rest will be ranked 4.

So ultimately, I want to create something like this:

Student	Grade	Rank
1	76	2
2	99	1
3	97	1
4	65	3
5	86	2
6	84	2
7	52	4
8	55	4
9	69	3
10	72	3

Note that my example would be easy to do by hand since there are only 10 students, but the actual dataset I'll be using has over 100 observations. There, the ranking will be something like: the 11 students with the highest grades will be ranked 1, the 33 students with the next highest grades will be ranked 2, etc.

Thank you!
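One way to generalize the rule in the post above is to sort students from highest to lowest grade and cut their positions at the cumulative group sizes. Below is a minimal base R sketch, assuming the group sizes are known up front and that everyone beyond the last group falls into one final "rest" group. The name `rank_in_bands` and its `sizes` argument are hypothetical, not part of any package. Ties in grade are broken by row order here, so tied students could end up in different groups.

``` r
# Hypothetical helper: assign group ranks from the highest value down.
# `sizes` holds the number of students in each group; anyone beyond
# sum(sizes) is placed in one extra, final group.
rank_in_bands <- function(x, sizes) {
  # Position of each value when sorted from highest to lowest
  # (ties broken by order of appearance).
  position <- rank(-x, ties.method = "first")
  # Group boundaries: (0, s1], (s1, s1 + s2], ..., (sum(sizes), Inf)
  breaks <- c(0, cumsum(sizes), Inf)
  # labels = FALSE returns the integer index of the interval
  cut(position, breaks = breaks, labels = FALSE)
}

# Data from the original post
df <- data.frame(
  student = 1:10,
  grade = c(76, 99, 97, 65, 86, 84, 52, 55, 69, 72)
)

# Groups of 2, 3 and 3; the remaining students get rank 4
df$rank <- rank_in_bands(df$grade, sizes = c(2, 3, 3))
df
```

Because the helper just returns a vector, it should also work inside `dplyr::mutate(rank = rank_in_bands(grade, sizes))`; for the real dataset, `sizes` would start with 11 and 33 and continue with whatever the remaining group sizes are.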

dvetsch75 · April 19, 2022, 3:23pm

One option would be dplyr::case_when, but given your last point, I think we can find a better solution: writing a function that you can use within dplyr::mutate. Are you setting your curve according to some mathematical function? How do you determine the ranks?
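For the ten-student example, the `case_when()` option mentioned above could be sketched like this: `row_number(desc(grade))` numbers the students from highest to lowest grade, and each condition covers one group. The hard-coded cut-offs (2, 5, 8) are specific to this example, which is why this approach becomes tedious with many groups. Ties are broken by row order, which is an assumption since the original post doesn't say how ties should be handled.

``` r
library(dplyr)
library(tibble)

# Data from the original post
df <- tibble(
  student = 1:10,
  grade = c(76, 99, 97, 65, 86, 84, 52, 55, 69, 72)
)

df <- df %>%
  mutate(
    # 1 = highest grade, 10 = lowest (ties broken by row order)
    position = row_number(desc(grade)),
    rank = case_when(
      position <= 2 ~ 1,  # top 2
      position <= 5 ~ 2,  # next 3
      position <= 8 ~ 3,  # next 3
      TRUE          ~ 4   # everyone else
    )
  )
df
```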

nirgrahamuk · April 19, 2022, 3:28pm

I believe you are describing the dplyr::dense_rank function.

dvetsch75 · April 19, 2022, 3:53pm

I don't think dense_rank would quite do it, because grades that aren't tied may still need the same rank, e.g. rows 2 and 3 in OP's desired output.

nirgrahamuk · April 19, 2022, 4:06pm

I suppose "the rest will be ranked 4" is the difference. I'd use dense_rank and then ifelse, or perhaps pmin, to cap at 4.
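To make the capping idea concrete: `dense_rank(desc(grade))` ranks from the highest grade down, and `pmin()` caps the result at 4 (`pmax()` would instead set a floor of 4). This minimal sketch on the original data shows the limitation raised in the next reply: since no grades are tied, only the top three students get distinct ranks and everyone else collapses into rank 4.

``` r
library(dplyr)
library(tibble)

# Data from the original post
df <- tibble(
  student = 1:10,
  grade = c(76, 99, 97, 65, 86, 84, 52, 55, 69, 72)
)

df <- df %>%
  mutate(
    # dense rank from the highest grade down, capped at 4
    rank = pmin(dense_rank(desc(grade)), 4L)
  )

# 4 1 2 4 3 4 4 4 4 4
df$rank
```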

dvetsch75 · April 19, 2022, 4:20pm

Unfortunately, dense_rank still doesn't solve OP's problem because there aren't numerical ties in grades:

``` r
library(dplyr)
library(tibble)

# from OP's original post
df <- tibble(
    student = 1:10,
    grade = c(76, 99, 97, 65, 86, 84, 52, 55, 69, 72)
)


df %>% 
    mutate(
        rank = dense_rank(grade)
    )
#> # A tibble: 10 x 3
#>    student grade  rank
#>      <int> <dbl> <int>
#>  1       1    76     6
#>  2       2    99    10
#>  3       3    97     9
#>  4       4    65     3
#>  5       5    86     8
#>  6       6    84     7
#>  7       7    52     1
#>  8       8    55     2
#>  9       9    69     4
#> 10      10    72     5

Created on 2022-04-19 by the reprex package (v1.0.0)

nirgrahamuk · April 19, 2022, 4:40pm

Yes, it seems I didn't read the user's requirements closely enough; I find the way they were stated odd. A reverse dense rank could be used as a first step to find who the top 11 are, then the next 30 or however many, and the reclassification can happen after that if needed.
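The "rank from the top first, then reclassify" approach could be sketched as below. It uses `min_rank(desc(grade))` rather than `dense_rank()` so that the position still counts students (a dense rank would compress the positions that follow a tie). With `min_rank()`, tied students share the same position and therefore land in the same group, which can make a group slightly larger than its nominal size. That tie policy is an assumption, since the original post doesn't specify one; the group sizes below are the ones from the ten-student example.

``` r
library(dplyr)
library(tibble)

# Data from the original post
df <- tibble(
  student = 1:10,
  grade = c(76, 99, 97, 65, 86, 84, 52, 55, 69, 72)
)

# Group sizes from the example: top 2, next 3, next 3, then the rest
sizes <- c(2, 3, 3)

df <- df %>%
  mutate(
    # 1 = highest grade; tied grades share the smaller position
    position = min_rank(desc(grade)),
    # reclassify: (0, 2] -> 1, (2, 5] -> 2, (5, 8] -> 3, rest -> 4
    rank = cut(position, breaks = c(0, cumsum(sizes), Inf), labels = FALSE)
  )
df
```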

system · May 10, 2022, 4:40pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.