I have data which looks like the following and I am trying to get the highest rank for each publisher each month:
Rank  Title  Publisher  Year  Month  Date
1     a1     a          2000  April  Apr 2000
2     b1     b          2000  April  Apr 2000
3     a2     a          2000  April  Apr 2000
1     a3     a          2000  May    May 2000
So I want a new dataset containing every row except row three, since that is publisher a's second-highest-ranked book for that month.
I don't know how to go about doing this so any suggestions would be appreciated. Thank you all!
If each publisher's best Rank is always 1, then:
library(dplyr)
df %>%
  group_by(Publisher) %>%
  filter(Rank == 1) %>%
  ungroup()
If the best rank varies, then:
df %>%
  group_by(Publisher) %>%
  arrange(Rank) %>%
  filter(row_number() == 1L) %>%
  ungroup()
I think you will also want to group by month while you're at it, if you want to get each publisher's #1 separately for each month they're in the data set.
df %>%
  group_by(Publisher, Date) %>%
  filter(Rank == 1L) %>%
  ungroup()
Correct, I overlooked that.
This is the code that I tried but it was just returning the #1 seller for each month, not the best ranking for each publisher for each month.
plt1.dat <- Overall.Sales %>%
  group_by(Publisher2, Date) %>%
  filter(Rank.in.Units == 1L) %>%
  ungroup()
EDIT:
I got it to work! Comparing against 1L was throwing it off, since a publisher's best rank in a given month is not always 1. This code works:
plt1.dat <- Overall.Sales %>%
  group_by(Publisher2, Date) %>%
  filter(Rank.in.Units == min(Rank.in.Units)) %>%
  ungroup()
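For anyone finding this later, here is a small reproducible sketch of the difference, using the four sample rows from the first post. The data frame name df and the result names only_first and best_per_group are made up for illustration; the real data uses Publisher2 and Rank.in.Units instead.

```r
library(dplyr)

# Sample rows copied from the question (df is a hypothetical name)
df <- data.frame(
  Rank      = c(1L, 2L, 3L, 1L),
  Title     = c("a1", "b1", "a2", "a3"),
  Publisher = c("a", "b", "a", "a"),
  Year      = c(2000L, 2000L, 2000L, 2000L),
  Month     = c("April", "April", "April", "May"),
  Date      = c("Apr 2000", "Apr 2000", "Apr 2000", "May 2000"),
  stringsAsFactors = FALSE
)

# Comparing against a fixed 1 keeps only books ranked first overall,
# so publisher b (best rank 2 in April) drops out entirely
only_first <- df %>%
  group_by(Publisher, Date) %>%
  filter(Rank == 1L) %>%
  ungroup()

# Comparing against the group minimum keeps each publisher's
# best-ranked book within each month
best_per_group <- df %>%
  group_by(Publisher, Date) %>%
  filter(Rank == min(Rank)) %>%
  ungroup()

print(only_first$Title)
print(best_per_group$Title)
```

With these rows, only_first should contain a1 and a3, while best_per_group should contain a1, b1 and a3, which is every row except row three. Note that filter(Rank == min(Rank)) keeps all tied rows if a publisher has two books with the same best rank in one month.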