Use `fct_unify()` in dplyr pipeline follow-up

harmv · December 22, 2024, 10:32am

Continuing the discussion from Use `fct_unify()` in dplyr pipeline:

The original post ends with a set of unified factor variables. The last step, actually using the unified factors in a dplyr pipeline, was missing, which I found unsatisfying. The reprex below completes the pipeline.

library(dplyr)  # also re-exports tibble()

# original example data
example_data <-
  tibble(
    letter1 = LETTERS[1:5],
    letter2 = LETTERS[2:6]) |>
  transmute(
    letter1 = factor(letter1),
    letter2 = factor(letter2))

# Unify both factors using mutate(). `letter1` and `letter1_uf` hold the same
# values and differ only in their factor levels; likewise `letter2`/`letter2_uf`.
df <-
  example_data |>
  mutate(
    letter1_uf = (list(letter1, letter2) |> forcats::fct_unify())[[1]],
    letter2_uf = (list(letter1, letter2) |> forcats::fct_unify())[[2]]
  )
str(df)

This adds some overhead because the unification is computed twice, which raises the question of whether it belongs in the pipeline at all.
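One way to avoid the duplicate call is to unify once before the pipeline and only assign the results inside `mutate()`. This is a minimal sketch, not the only option; the name `unified` is just an illustrative variable, and the two checks at the end confirm that the values are unchanged and that both new columns share one level set.

```r
library(dplyr)
library(forcats)

example_data <- tibble(
  letter1 = factor(LETTERS[1:5]),
  letter2 = factor(LETTERS[2:6])
)

# Unify once, outside the pipeline; `unified` is an illustrative name.
unified <- fct_unify(list(example_data$letter1, example_data$letter2))

df <- example_data |>
  mutate(
    letter1_uf = unified[[1]],
    letter2_uf = unified[[2]]
  )

# The values are unchanged; only the level sets differ from the originals.
all(as.character(df$letter1) == as.character(df$letter1_uf))
identical(levels(df$letter1_uf), levels(df$letter2_uf))
```

The trade-off is that the unification step now sits outside the pipe, so the pipeline is no longer a single expression.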

nirgrahamuk · December 24, 2024, 5:36pm

You can run arbitrary code in a pipeline step.
Here I use `lvls_union()` to calculate the common levels once and apply them as many times as needed. I also generalise it so that it can unify any number of `letter`-style columns (three in this example).

library(dplyr)
library(purrr)    # map(), pluck()
library(forcats)  # lvls_union()

example_data <-
  tibble(
    letter1 = factor(LETTERS[1:5]),
    letter2 = factor(LETTERS[2:6]),
    letter3 = factor(LETTERS[3:7]))
  

result <- example_data |> (\(x) {
  cols_to_unify <- paste0("letter",1:3)
  common_levels <- lvls_union(map(cols_to_unify,\(f){pluck(x,f)}))
  
  y <- x |>
    # all_of() selects columns named in a character vector
    mutate(across(all_of(cols_to_unify),
                  .fns = \(v){factor(v,common_levels)},
                  .names = "{.col}_uf")
    )
  y
})()

result
str(result)
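If the same step is needed in several pipelines, the anonymous function could be pulled out into a named helper. The sketch below assumes a hypothetical function name, `unify_factor_cols()`, which is not part of forcats or dplyr; it takes a character vector of column names and adds `_uf` copies that all share the union of the levels.

```r
library(dplyr)
library(forcats)

# Hypothetical helper (not a forcats/dplyr function): adds "<col>_uf" copies
# of the named factor columns, all sharing the union of their levels.
unify_factor_cols <- function(data, cols) {
  common_levels <- lvls_union(as.list(data[cols]))
  data |>
    mutate(across(all_of(cols),
                  \(v) factor(v, levels = common_levels),
                  .names = "{.col}_uf"))
}

example_data <- tibble(
  letter1 = factor(LETTERS[1:5]),
  letter2 = factor(LETTERS[2:6]),
  letter3 = factor(LETTERS[3:7])
)

result <- example_data |>
  unify_factor_cols(paste0("letter", 1:3))

levels(result$letter1_uf)
```

Because the helper takes the data frame as its first argument, it can be used as an ordinary pipeline step without the `(\(x) { ... })()` wrapper.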
