Suppose that I have a dataset containing the information of 10 students and their grades. So the data look something like this:
Student  Grade
1        76
2        99
3        97
4        65
5        86
6        84
7        52
8        55
9        69
10       72
I wanna add another column (using mutate maybe?) called "Rank." And this rank will be based on students' grades. That is,
The first two students with the highest grades will be ranked 1
The three students with the next highest grades will be ranked 2
The three students with the next highest grades will be ranked 3
The rest will be ranked 4.
So ultimately, I wanna create something like this:
Student  Grade  Rank
1        76     2
2        99     1
3        97     1
4        65     3
5        86     2
6        84     2
7        52     4
8        55     4
9        69     3
10       72     3
Note that the example I gave would be easier to do by hand, since there are only 10 students. But the actual dataset that I'll be using will have over 100 observations, and the ranking will be something like: the 11 students with the highest grades will be ranked 1, the 33 students with the next highest grades will be ranked 2, etc.
So one option would be using dplyr::case_when. But given your last point, I think we can find a better solution. I think the best solution would be to write a function that you can use within dplyr::mutate. Are you setting your curve according to some mathematical function? How do you determine your rank?
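A minimal sketch of what such a function could look like, assuming the group sizes are known up front and ties are broken by order of appearance. The name rank_by_sizes and its sizes argument are made up for illustration; they are not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: the top sizes[1] grades get rank 1, the next
# sizes[2] get rank 2, and so on. Anything left over falls into one
# final rank, length(sizes) + 1.
rank_by_sizes <- function(x, sizes) {
  # Position of each value when sorted from highest to lowest;
  # ties are broken by order of appearance (an assumption).
  pos <- rank(-x, ties.method = "first")
  # Cumulative group sizes become the upper bound of each rank.
  breaks <- c(0, cumsum(sizes), Inf)
  as.integer(cut(pos, breaks = breaks, labels = FALSE))
}

grades <- tibble(
  Student = 1:10,
  Grade = c(76, 99, 97, 65, 86, 84, 52, 55, 69, 72)
)

result <- grades %>%
  mutate(Rank = rank_by_sizes(Grade, sizes = c(2, 3, 3)))
```

For the larger dataset, only the sizes vector would need to change, e.g. sizes = c(11, 33, ...) with whatever the remaining group sizes are.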
I don't think dense_rank would quite do it, because the grades are not necessarily tied but still need the same rank, e.g. rows 2 and 3 of the OP's desired output.
Yes, it seems I didn't read the user's requirements closely enough; I find it odd how he stated them. A reverse dense rank would probably be used as a first step to find who the top 11 are, and the next 30 or however many, and then reclassification can happen after that if needed.
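A rough sketch of that two-step idea on the 10-student example. It is a variation rather than what the comment literally proposes: it uses row_number(desc(Grade)) instead of dense_rank, so that tied grades don't share a position and the group sizes come out exact. The Position column is a temporary helper introduced here for illustration.

```r
library(dplyr)

grades <- tibble(
  Student = 1:10,
  Grade = c(76, 99, 97, 65, 86, 84, 52, 55, 69, 72)
)

result <- grades %>%
  mutate(
    # Step 1: position from highest to lowest grade (1 = best).
    Position = row_number(desc(Grade)),
    # Step 2: reclassify positions into the requested ranks
    # (top 2, next 3, next 3, everyone else).
    Rank = case_when(
      Position <= 2 ~ 1L,
      Position <= 5 ~ 2L,
      Position <= 8 ~ 3L,
      TRUE ~ 4L
    )
  ) %>%
  select(-Position)
```

The cutoffs are hard-coded here, so this fits a one-off reclassification better than a scheme whose group sizes change often.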