I am struggling with something that is perhaps very simple.
Consider this factor:
> factor(c('a','b','c','d','a','b'))
[1] a b c d a b
Levels: a b c d
This factor is already sorted by order of importance.
That is, a is better than b, and so on. I would like to keep the top 2 levels and put the rest into some other category. This is very much like fct_lump, except that here the lumping has nothing to do with frequency (they all appear once).
There's a chance that you are mistaken. By default, factor() sorts the levels alphabetically, so they are displayed in that order, but that does not make the factor an ordered one. Note the difference below:
> a <- factor(x = c('a','b','c','d','a','b'))
> a
[1] a b c d a b
Levels: a b c d
> is.ordered(x = a)
[1] FALSE
> b <- factor(x = c('a','b','c','d','a','b'), ordered = TRUE)
> b
[1] a b c d a b
Levels: a < b < c < d
> is.ordered(x = b)
[1] TRUE
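If the order of importance differs from the alphabetical order, it can be supplied explicitly through the levels argument of factor(). The following is only a minimal sketch: the ranking vector is a made-up example, not something from the question.

```r
# Hypothetical importance ranking, best first (an assumption for illustration).
ranking <- c("d", "b", "a", "c")

x <- c("a", "b", "c", "d", "a", "b")

# Passing levels = ... fixes the level order; ordered = TRUE marks the factor as ordered.
f <- factor(x = x, levels = ranking, ordered = TRUE)

levels(x = f)      # same order as ranking
is.ordered(x = f)  # TRUE
```

With the levels set this way, head() or tail() on levels(f) picks levels by importance rather than by alphabet.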
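I think you're looking for something like this. Note that tail() keeps the last two levels (the highest in an ordered factor); use head() instead if the levels you want to keep come first: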
set.seed(seed = 33122)
factor_data <- factor(x = sample(x = letters[1:5],
                                 size = 20,
                                 replace = TRUE),
                      ordered = TRUE)
factor_data
#> [1] d c e e c e b a a b b d d e c e b e c a
#> Levels: a < b < c < d < e
forcats::fct_other(f = factor_data,
                   keep = tail(x = levels(x = factor_data),
                               n = 2))
#> [1] d Other e e Other e Other Other Other Other Other
#> [12] d d e Other e Other e Other Other
#> Levels: d < e < Other
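Applied to the factor from the question, where a counts as the best level, the same idea might look like the sketch below. It assumes the forcats package is installed and uses head() rather than tail(), so that the first two levels are the ones kept.

```r
library(forcats)

# The factor from the question; its levels are a, b, c, d in that order.
f <- factor(x = c("a", "b", "c", "d", "a", "b"))

# Keep the first two levels and collapse all remaining ones into "Other".
lumped <- fct_other(f = f,
                    keep = head(x = levels(x = f),
                                n = 2))

levels(x = lumped)
as.character(x = lumped)
```

Here a and b are kept, while c and d are both recoded as Other, regardless of how often each value occurs.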