I have a data frame with many columns in: column1,column2, column3 ...What I want to do is to transfer them to many dummy variables like the following results. I tried some methods, but there are problems.
The results that I want will be the following. Note that if column =0, I don't want to create a new dummy variable but instead, set it =0. Besides, there are too many columns, I want the code that can do it efficiently.
I'm assuming the dummy columns are supposed to measure if there is (1) or isn't (0) that number in that column for that row, and not a count.
There's probably a more elegant way to do this, but it works:
library(tidyverse)
orig <- tribble(
~column1, ~column2,
1, 0,
3, 2,
0, 4,
1, 2 # Added to show case with repeated #s
)
# This starts with the input table...
orig %>%
# adds new columns based on the same table
cbind(orig) %>%
rowid_to_column() %>%
# make into long format and combine col name and value
gather(col, value, column1:column2) %>%
filter(value != 0) %>%
mutate(col2 = paste(col, value, sep = "_")) %>%
# I presume desired output should show 1
# in rows where that value is found, 0 if not
mutate(value = 1) %>%
select(-col) %>%
spread(col2, value, fill = 0) %>%
select(-rowid) # Needed this to be last to keep row order
#> column1.1 column2.1 column1_1 column1_3 column2_2 column2_4
#> 1 1 0 1 0 0 0
#> 2 3 2 0 1 1 0
#> 3 0 4 0 0 0 1
#> 4 1 2 1 0 1 0
If your question's been answered (hooray!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it: