# How to keep the top n levels of a factor?

Hello there,

I am struggling with something that is perhaps very simple.
Consider this factor:

```r
> factor(c('a','b','c','d','a','b'))
[1] a b c d a b
Levels: a b c d
```

This factor is already sorted by order of importance.

That is, `a` is better than `b`, and so on. I would like to keep the top 2 levels and put the rest in some `other` category. This is very much like `fct_lump`, except that the lumping should follow the order of the levels rather than their frequency.

Can I do that with `forcats`?
Thanks!

There's a chance that you are mistaken. The levels are displayed in alphabetical order, but the factor itself is not ordered: R does not treat `a` as ranking above or below `b`. Note the difference below:

```r
> a <- factor(x = c('a','b','c','d','a','b'))
> a
[1] a b c d a b
Levels: a b c d
> is.ordered(x = a)
[1] FALSE
> b <- factor(x = c('a','b','c','d','a','b'), ordered = TRUE)
> b
[1] a b c d a b
Levels: a < b < c < d
> is.ordered(x = b)
[1] TRUE
```
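To illustrate where the displayed order comes from: by default `factor()` sorts the unique values to build the levels, so the level order only reflects importance if you pass it explicitly through `levels =`. A minimal sketch, assuming a made-up importance ranking (`d`, `b`, `a`, `c`) that is not in the original post:

```r
# Same values as in the question.
x <- c("a", "b", "c", "d", "a", "b")

# Default: levels are the sorted unique values (alphabetical here).
by_default <- factor(x)

# Hypothetical importance ranking, supplied explicitly via `levels =`.
by_importance <- factor(x, levels = c("d", "b", "a", "c"))

levels(by_default)     # alphabetical order
levels(by_importance)  # the order given in `levels =`
```

With the levels in the intended order, "the top n levels" is simply the first n entries of `levels()`.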

I think you're looking for something like this:

```r
set.seed(seed = 33122)
factor_data <- factor(x = sample(x = letters[1:5],
                                 size = 20,
                                 replace = TRUE),
                      ordered = TRUE)
factor_data
#>  [1] d c e e c e b a a b b d d e c e b e c a
#> Levels: a < b < c < d < e

forcats::fct_other(f = factor_data,
                   keep = tail(x = levels(x = factor_data),
                               n = 2))
#>  [1] d     Other e     e     Other e     Other Other Other Other Other
#> [12] d     d     e     Other e     Other e     Other Other
#> Levels: d < e < Other
```

Created on 2019-06-15 by the reprex package (v0.3.0)
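Note that `tail()` keeps the last two levels (`d` and `e`). If the first levels are the important ones, as in the question, `head()` can be swapped in. A minimal sketch on the factor from the question; the names `top_n`, `keep`, `lumped` and `lumped_base`, and the base R variant, are illustrative additions rather than part of the original answer:

```r
library(forcats)

f <- factor(c("a", "b", "c", "d", "a", "b"))
top_n <- 2  # hypothetical: how many leading levels to keep

# Keep the first `top_n` levels; everything else becomes "Other".
keep <- head(levels(f), n = top_n)
lumped <- fct_other(f, keep = keep)
levels(lumped)  # kept levels first, "Other" last

# Base R equivalent, in case forcats is not available.
lumped_base <- factor(ifelse(f %in% keep, as.character(f), "Other"),
                      levels = c(keep, "Other"))
```

This only depends on the order of `levels(f)`, not on how often each level occurs, which is the difference from `fct_lump`.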

Hope this helps.
