Dummy function with conditions and multiple columns

Lernst · July 10, 2018, 9:32pm

Hello everyone,

I have a data frame with many columns in: column1,column2, column3 ...What I want to do is to transfer them to many dummy variables like the following results. I tried some methods, but there are problems.

parallel_new <- parallel_new %>%
  mutate_at(. , vars(starts_with("column")), funs(dummy(., sep = "_")))

the original example table:

column1	column2
1	0
3	2
0	4

The results that I want will be the following. Note that if column =0, I don't want to create a new dummy variable but instead, set it =0. Besides, there are too many columns, I want the code that can do it efficiently.

column1	column2	column1_1	column1_3	column2_2	column2_4
1	0	1	0	0	0
3	2	0	1	1	0
0	4	0	0	0	1

jonspring · July 10, 2018, 10:29pm

I'm assuming the dummy columns are supposed to measure if there is (1) or isn't (0) that number in that column for that row, and not a count.

There's probably a more elegant way to do this, but it works:

library(tidyverse)

orig <- tribble(
  ~column1, ~column2,
  1, 0,
  3, 2,
  0, 4,
  1, 2   # Added to show case with repeated #s
)

# This starts with the input table...
orig %>%
  # adds new columns based on the same table
  cbind(orig) %>%
  rowid_to_column() %>%
  
  # make into long format and combine col name and value
  gather(col, value, column1:column2) %>%
  filter(value != 0) %>%
  mutate(col2 = paste(col, value, sep = "_")) %>%
  
  # I presume desired output should show 1
  # in rows where that value is found, 0 if not
  mutate(value = 1) %>%
  select(-col) %>%
  spread(col2, value, fill = 0) %>%
  select(-rowid) # Needed this to be last to keep row order

#>   column1.1 column2.1 column1_1 column1_3 column2_2 column2_4
#> 1         1         0         1         0         0         0
#> 2         3         2         0         1         1         0
#> 3         0         4         0         0         0         1
#> 4         1         2         1         0         1         0

Created on 2018-07-10 by the reprex package (v0.2.0).

JohnMount · July 10, 2018, 10:50pm

Or a base-R solution:

orig <- tibble::tribble(
  ~column1, ~column2,
  1, 0,
  3, 2,
  0, 4,
  1, 2   # Added to show case with repeated #s
)

newcols <- as.data.frame(model.matrix(~ column1 + column2, 
                                      lapply(orig, as.character)))
attr(newcols, "assign") <- NULL
newcols[["(Intercept)"]] <- NULL
cbind(orig, newcols)

#   column1 column2 column11 column13 column22 column24
# 1       1       0        1        0        0        0
# 2       3       2        0        1        1        0
# 3       0       4        0        0        0        1
# 4       1       2        1        0        1        0

Lernst · July 11, 2018, 4:19am

Thanks, it works in R.

Lernst · July 11, 2018, 4:19am

Thanks, It works to get the results.

jcblum · July 11, 2018, 10:07am

If your question's been answered (hooray!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it:

nwerth · July 11, 2018, 4:00pm

What you're doing is very similar to the model.matrix() function.

mydata <- data.frame(
  column1 = factor(c(1, 3, 0)),
  column2 = factor(c(0, 2, 4))
)
model.matrix(~ ., data = mydata)
#   (Intercept) column11 column13 column22 column24
# 1           1        1        0        0        0
# 2           1        0        1        1        0
# 3           1        0        0        0        1
# attr(,"assign")
# [1] 0 1 1 2 2
# attr(,"contrasts")
# attr(,"contrasts")$`column1`
# [1] "contr.treatment"
# 
# attr(,"contrasts")$column2
# [1] "contr.treatment"

While this function might not work for your situation, it doesn't hurt to learn about another of R's (too) many functions.

Lernst · August 5, 2018, 3:37am

Thanks for the suggestions. I am new in the community and I will try the best to learn the norm.