Efficient code to merge levels of a factor variable into a factor variable with fewer levels

I have a factor variable called disease severity with 6 levels: asymptotic, healthy control, mild, moderate, severe, and critical severe.
I want to form a new factor variable with four levels: asymptotic, a combined level (mild and moderate), severe, and critical severe, ignoring healthy control.

I can do this with an if / else if statement, but is there an easier (more efficient) way to do this?

You can set the factor levels you aren't interested in to NA. Those levels are dropped, and the values that had them become NA.

See the example below as a guide.

set.seed(123)
x <- factor(sample(letters[1:6], 20, TRUE))
x
#>  [1] c f c b b f c e d f f a b c e c c a d a
#> Levels: a b c d e f
# Set levels 3 and 5 ("c" and "e") to NA: they are removed and their values become NA
levels(x)[c(3, 5)] <- NA
x
#>  [1] <NA> f    <NA> b    b    f    <NA> <NA> d    f    f    a    b    <NA> <NA>
#> [16] <NA> <NA> a    d    a   
#> Levels: a b d f

Created on 2020-09-01 by the reprex package (v0.3.0)
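The same idea works with the level names from the question. Selecting the level by name is a bit safer than hard-coding positions like c(3, 5), because factor() sorts levels alphabetically by default, so positions depend on that ordering. A minimal sketch, assuming a small made-up vector `severity` (hypothetical data, not from the original post):

```r
# Hypothetical example data using the level names from the question
severity <- factor(c("asymptotic", "healthy control", "mild", "moderate",
                     "severe", "critical severe", "mild"))

# Pick the level to drop by name instead of by position;
# assigning NA to a level removes it and turns its values into NA
levels(severity)[levels(severity) == "healthy control"] <- NA

severity
levels(severity)
```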

Thanks. Is there a way to merge two levels of a factor into a single level? For example, merge moderate and mild into one new level called medium.

set.seed(123)
x <- factor(sample(letters[1:6], 20, TRUE))
x
#>  [1] c f c b b f c e d f f a b c e c c a d a
#> Levels: a b c d e f
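# Rename levels 3 and 5 ("c", "e") to the names of levels 2 and 4 ("b", "d");
# duplicated level names are merged into a single level
levels(x)[c(3, 5)] <- levels(x)[c(2, 4)]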
x
#>  [1] b f b b b f b d d f f a b b d b b a d a
#> Levels: a b d f

Created on 2020-09-01 by the reprex package (v0.3.0)
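To get a new name such as medium in base R, `levels<-` also accepts a named list: each name becomes a new level and each element lists the old levels that map to it. Old levels that are not listed become NA, so this can merge and drop in one step, and the new levels follow the order of the list. A minimal sketch, again assuming a made-up `severity` vector:

```r
# Hypothetical example data using the level names from the question
severity <- factor(c("asymptotic", "healthy control", "mild", "moderate",
                     "severe", "critical severe", "mild", "moderate"))

# Names = new levels, elements = old levels mapped to them.
# "healthy control" is not listed, so its values become NA.
levels(severity) <- list(
  asymptotic        = "asymptotic",
  medium            = c("mild", "moderate"),
  severe            = "severe",
  `critical severe` = "critical severe"
)

severity
table(severity, useNA = "ifany")
```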

The tidyverse package forcats has a number of useful functions for working with factors. For combining two levels, we could use fct_collapse().

library(dplyr, warn.conflicts = FALSE)
library(forcats)

df <- tibble(disease_severity = factor(c("asymptotic", "healthy control", "healthy control", 
                                         "mild", "moderate", "severe", "critical severe", 
                                         "severe", "mild", "asymptotic", "moderate")))

glimpse(df$disease_severity)
#>  Factor w/ 6 levels "asymptotic","critical severe",..: 1 3 3 4 5 6 2 6 4 1 ...

df$disease_severity <- fct_collapse(df$disease_severity, medium = c("mild", "moderate"))
  
glimpse(df$disease_severity)
#>  Factor w/ 5 levels "asymptotic","critical severe",..: 1 3 3 4 4 5 2 5 4 1 ...

Created on 2020-09-02 by the reprex package (v0.3.0)


The forcats package also lets you give the merged level any name you like with fct_collapse():

set.seed(123)
x <- factor(sample(letters[1:6], 20, TRUE))
x

forcats::fct_collapse(x,
                      c_d = c("c","d"))
# [1] c_d f   c_d b   b   f   c_d e   c_d f   f   a   b   c_d e   c_d c_d a   c_d a  
# Levels: a b c_d e f
