Hello,
I am trying to create a new data frame that contains the 15 wine varieties that appear most often in the dataset I am working with.
I can get the count for each variety and arrange them in descending order, but I am having trouble selecting just the top 15. Can someone help?
When I run the code below, I get this error:
Error: Column indexes must be at most 2 if positive, not 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
Call `rlang::last_error()` to see a backtrace
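Code:
```r
# Count how many reviews there are for each variety
WineReviewDataVar <- WineReviewData %>%
  group_by(Variety) %>%
  count()

# Sort by count, then try to keep the first 15 rows (this last step raises the error above)
WineReviewData_Top15Var <- WineReviewDataVar %>%
  arrange(desc(n)) %>%
  WineReviewData_Top15Var[1:15, ]
```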
I would be more confident in my answer with a reproducible example, called a reprex, but try replacing the last line with `top_n(15)`.
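A minimal sketch of that suggestion, run on a small invented stand-in for `WineReviewData` (the names `wine`, `wine_var`, `top15_index` and `top15_topn` are hypothetical). The error most likely comes from the last pipe step: `%>%` passes the sorted tibble as the first argument of `[`, so the step is evaluated roughly as `` `[`(., WineReviewData_Top15Var, 1:15, ) `` and `1:15` ends up being treated as column indexes on a two-column table. Writing `.[1:15, ]` makes the position of the piped value explicit, while `top_n(15)` avoids indexing altogether; with no `wt` argument it ranks by the last column, which is `n` here. The `ungroup()` is a precaution: as far as I know, in dplyr 1.0.0 and later `count()` keeps the input's grouping, and a tibble still grouped by `Variety` would make `top_n()` rank within each one-row group and return every variety.

```r
library(dplyr)

# Hypothetical stand-in for WineReviewData: 20 varieties, V1 appears 20 times, V20 once
wine <- data.frame(
  Variety = rep(paste0("V", 1:20), times = 20:1),
  stringsAsFactors = FALSE
)

# Count rows per variety; ungroup() so later steps rank across all varieties
wine_var <- wine %>%
  group_by(Variety) %>%
  count() %>%
  ungroup()

# Option 1: keep the indexing idea, but use the pipe's dot placeholder explicitly
top15_index <- wine_var %>%
  arrange(desc(n)) %>%
  .[1:15, ]

# Option 2: top_n(); wt = n names the ranking column explicitly
# (when wt is omitted, top_n() uses the last column, which is also n here)
top15_topn <- wine_var %>%
  top_n(15, n) %>%
  arrange(desc(n))

top15_index
```

Because `top_n()` keeps ties, it can return more than 15 rows when several varieties share the 15th-largest count; the invented counts here are all distinct, so both options give the same 15 rows.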
FJCC (June 5, 2019, 1:54am)
Here is an example with invented data. I used `head()` instead of `top_n()`.
```r
library(dplyr)
set.seed(67489)
df <- data.frame(Variety = sample(LETTERS, 200, replace = TRUE),
                 Score = sample(1:10, 200, replace = TRUE))
head(df)
#>   Variety Score
#> 1       X     3
#> 2       K    10
#> 3       M     8
#> 4       U     3
#> 5       Z     2
#> 6       W    10
dfVar <- df %>% count(Variety)
head(dfVar)
#> # A tibble: 6 x 2
#>   Variety     n
#>   <fct>   <int>
#> 1 A           8
#> 2 B           6
#> 3 C           5
#> 4 D           8
#> 5 E           9
#> 6 F           5
dfTop15 <- dfVar %>% arrange(desc(n)) %>% head(15)
dfTop15
#> # A tibble: 15 x 2
#>    Variety     n
#>    <fct>   <int>
#>  1 Q          14
#>  2 S          12
#>  3 O          11
#>  4 N          10
#>  5 T          10
#>  6 X          10
#>  7 E           9
#>  8 P           9
#>  9 W           9
#> 10 A           8
#> 11 D           8
#> 12 G           8
#> 13 Z           8
#> 14 L           7
#> 15 Y           7
```
Created on 2019-06-04 by the reprex package (v0.2.1)
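A sketch of two alternatives to `arrange()` plus `head()`, assuming dplyr 1.0.0 or later, where `slice_head()` and `slice_max()` were added (`slice_max()` is the documented successor to the now-superseded `top_n()`). It reuses the reprex's way of generating invented data; the names `top15_head` and `top15_max` are hypothetical, and the exact counts may differ from the output above depending on the R version. `count(Variety, sort = TRUE)` returns the counts already in descending order, so the sorting step can be dropped.

```r
library(dplyr)  # slice_head() and slice_max() need dplyr 1.0.0 or later

set.seed(67489)
# Invented data generated the same way as in the reprex above
df <- data.frame(Variety = sample(LETTERS, 200, replace = TRUE),
                 Score = sample(1:10, 200, replace = TRUE))

# count(sort = TRUE) returns the counts in descending order of n
dfVar <- df %>% count(Variety, sort = TRUE)

# First 15 rows of the sorted counts: same idea as head(15)
top15_head <- dfVar %>% slice_head(n = 15)

# Rank by n directly; with_ties = FALSE caps the result at exactly 15 rows
top15_max <- dfVar %>% slice_max(n, n = 15, with_ties = FALSE)

top15_max
```

The design choice is mostly about ties: when several varieties share the 15th-largest count, `head(15)` and `slice_head()` keep whichever rows happen to come first, `slice_max()` with its default `with_ties = TRUE` keeps all of the tied rows, and `with_ties = FALSE` returns exactly 15.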
Anisha (June 5, 2019, 11:26pm)
Thank you!
I had never seen the `top_n()` function before.
Anisha (June 5, 2019, 11:27pm)
Thank you! I didn't think of using the `head()` function this way, and it works!
system (June 26, 2019, 11:27pm)
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.