I had to implement one-hot encoding for a factor column today.
I'm not sending this to lm (which would accept the factor column directly) but rather creating a "truth table" for QCA. It was much harder than I expected, so I wonder if people could check my approach and recommend a better one? I looked at the tidymodels and recipes approach discussed on r-bloggers, but that seemed too heavyweight for a relatively simple need.
I was inspired by this stackexchange thread (see last answer):
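library(tidyverse)

# One row per student; student 4 has no subject recorded
tribble(~student_id, ~subject,
        1, "Maths",
        2, "Science",
        3, "English",
        4, NA_character_) %>%
  # Spread subject into one logical column per level
  pivot_wider(names_from = subject,
              values_from = subject,
              values_fill = list(subject = FALSE),       # absent combinations become FALSE
              values_fn = list(subject = is.character)) %>%  # present combinations become TRUE
  # Drop the column generated by the missing subject
  select(-`NA`)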
#> # A tibble: 4 x 4
#>   student_id Maths Science English
#>        <dbl> <lgl> <lgl>   <lgl>
#> 1          1 TRUE  FALSE   FALSE
#> 2          2 FALSE TRUE    FALSE
#> 3          3 FALSE FALSE   TRUE
#> 4          4 FALSE FALSE   FALSE
Created on 2020-06-09 by the reprex package (v0.3.0)
It also works with repeated ids: below I change the student_id in the second row of data from 2 to 1, and the output correctly has only three rows.
library(tidyverse)

# Same data, but student 1 now appears twice (Maths and Science)
tribble(~student_id, ~subject,
        1, "Maths",
        1, "Science",
        3, "English",
        4, NA_character_) %>%
  pivot_wider(names_from = subject,
              values_from = subject,
              values_fill = list(subject = FALSE),
              values_fn = list(subject = is.character)) %>%
  select(-`NA`)
#> # A tibble: 3 x 4
#>   student_id Maths Science English
#>        <dbl> <lgl> <lgl>   <lgl>
#> 1          1 TRUE  TRUE    FALSE
#> 2          3 FALSE FALSE   TRUE
#> 3          4 FALSE FALSE   FALSE
Created on 2020-06-09 by the reprex package (v0.3.0)
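For comparison, here is a minimal sketch of one possible variant, not necessarily a better one. It marks every row with an explicit logical column and pivots that, instead of relying on values_fn = is.character. It assumes tidyr 1.1.0 or later (where values_fill accepts a plain scalar) and a tidyselect version that provides any_of(). The names students, present and truth_table are made up for illustration.

```r
library(tidyverse)

# Illustrative data, same shape as the second example above
students <- tribble(
  ~student_id, ~subject,
  1, "Maths",
  1, "Science",
  3, "English",
  4, NA_character_
)

truth_table <- students %>%
  # Without values_fn, a duplicated student/subject pair would yield
  # list-columns, so remove exact duplicates first
  distinct(student_id, subject) %>%
  # Explicit marker column (hypothetical name) that becomes the cell value
  mutate(present = TRUE) %>%
  pivot_wider(
    names_from  = subject,
    values_from = present,
    values_fill = FALSE   # assumption: tidyr >= 1.1.0 accepts a scalar here
  ) %>%
  # Drop the column created by the missing subject, if there is one
  select(-any_of("NA"))

truth_table
```

Using any_of("NA") rather than a bare `NA` column name means the pipeline should not fail on data that happens to have no missing subjects.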