Quick way to create interactions with tidyverse?

torgo · April 28, 2023, 8:30pm

df <- expand.grid(x = rep(c("a", "b"), 2), y = c(1,2))

I would like to create four columns: a1, a2, b1, b2.
Each column would be a dummy variable.
For example, a1 would be 1 only if x == "a" and y == 1.

In practice I have many values in x and y, so I'd like a way to do this programmatically with the tidyverse.

FJCC · April 29, 2023, 2:13am

This isn't elegant but it gets to the result I think you want.

library(purrr)

df <- expand.grid(x = rep(c("a", "b"), 2), y = c(1,2))

# Every combination of the unique x and y values
Xs <- unique(df$x)
Ys <- unique(df$y)
Combin <- expand.grid(Xs, Ys)
Xseries <- Combin$Var1
Yseries <- Combin$Var2
# One 0/1 column per (x, y) pair, bound together column-wise
Dummies <- map2_dfc(Xseries, Yseries, function(x, y) as.numeric(df$x == x & df$y == y))
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> • `` -> `...3`
#> • `` -> `...4`
colnames(Dummies) <- paste0(Xseries, Yseries)
Dummies
#> # A tibble: 8 × 4
#>      a1    b1    a2    b2
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     1     0     0     0
#> 2     0     1     0     0
#> 3     1     0     0     0
#> 4     0     1     0     0
#> 5     0     0     1     0
#> 6     0     0     0     1
#> 7     0     0     1     0
#> 8     0     0     0     1
cbind(df, Dummies)
#>   x y a1 b1 a2 b2
#> 1 a 1  1  0  0  0
#> 2 b 1  0  1  0  0
#> 3 a 1  1  0  0  0
#> 4 b 1  0  1  0  0
#> 5 a 2  0  0  1  0
#> 6 b 2  0  0  0  1
#> 7 a 2  0  0  1  0
#> 8 b 2  0  0  0  1

Created on 2023-04-28 with reprex v2.0.2
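
The same idea can be written more compactly in base R by comparing a combined key against its unique values. This is only a sketch under the assumption that pasting x and y together gives an unambiguous label; the names `key`, `key_levels` and `dummies` are illustrative and not part of the original answer.

```r
df <- expand.grid(x = rep(c("a", "b"), 2), y = c(1, 2))

# Combined label for each row, e.g. "a1", "b2"
key <- paste0(df$x, df$y)
key_levels <- unique(key)

# Compare every row's key with every distinct key; * 1 turns TRUE/FALSE into 1/0
dummies <- outer(key, key_levels, "==") * 1
colnames(dummies) <- key_levels

result <- cbind(df, dummies)
```

If some labels could collide (for example x = "a1", y = 1 versus x = "a", y = 11), passing a separator to `paste()` would avoid the ambiguity.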

torgo · April 29, 2023, 5:08pm

Thanks, but yes, I was looking for something shorter. I'm surprised there isn't a more direct way to do this; it seems like a pretty common task.

FJCC · April 29, 2023, 5:37pm

The dummyVars() function in the caret package seems to do what you want. I assume other packages offer something similar.

library(caret)
#> Loading required package: ggplot2
#> Loading required package: lattice
df <- expand.grid(x = rep(c("a", "b"), 2), y = c(1,2))
# y must be a factor so that x:y expands into one column per level pair
df$y <- factor(df$y)
InterXY <- dummyVars(~ x:y, data = df)
InterDF <- predict(InterXY, df)
cbind(df, InterDF)
#>   x y xa:y1 xb:y1 xa:y2 xb:y2
#> 1 a 1     1     0     0     0
#> 2 b 1     0     1     0     0
#> 3 a 1     1     0     0     0
#> 4 b 1     0     1     0     0
#> 5 a 2     0     0     1     0
#> 6 b 2     0     0     0     1
#> 7 a 2     0     0     1     0
#> 8 b 2     0     0     0     1

Created on 2023-04-29 with reprex v2.0.2
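
For comparison, something similar seems possible without extra packages by using base R's `model.matrix()` on an interaction factor. This is a rough sketch rather than a drop-in replacement for the caret call; the helper column `xy` and the prefix stripping are assumptions made here to get the a1/b1/a2/b2 names from the question.

```r
df <- expand.grid(x = rep(c("a", "b"), 2), y = c(1, 2))

# One factor level per (x, y) pair, labelled "a1", "b1", ...
df$xy <- interaction(df$x, df$y, sep = "")

# "0 +" drops the intercept so every level gets its own indicator column
mm <- model.matrix(~ 0 + xy, data = df)

# model.matrix prefixes column names with the variable name; strip it
colnames(mm) <- sub("^xy", "", colnames(mm))

result <- cbind(df[c("x", "y")], mm)
```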

MarekGierlinski · May 1, 2023, 12:34pm

You can do this using pivot_wider:

library(dplyr)
library(tidyr)

df |>
  # val is the cell value; id keeps every row distinct
  mutate(val = 1, id = row_number()) |>
  pivot_wider(id_cols = id, names_from = c(x, y), values_from = val, names_sep = "", values_fill = 0) |>
  select(-id)

The result is as required:

# A tibble: 8 × 4
     a1    b1    a2    b2
  <dbl> <dbl> <dbl> <dbl>
1     1     0     0     0
2     0     1     0     0
3     1     0     0     0
4     0     1     0     0
5     0     0     1     0
6     0     0     0     1
7     0     0     1     0
8     0     0     0     1

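The temporary id column is needed to keep each row distinct. Without it, pivot_wider would treat rows with the same x and y as duplicates and collapse them instead of returning one output row per input row.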
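
If the original x and y columns should stay next to the dummies, one option is to pivot on copies of them, so that the originals are left over as identifier columns. A minimal sketch, assuming dplyr and tidyr are installed; `x_key`, `y_key` and `with_dummies` are hypothetical names introduced here for illustration.

```r
library(dplyr)
library(tidyr)

df <- expand.grid(x = rep(c("a", "b"), 2), y = c(1, 2))

with_dummies <- df |>
  # Pivot on copies so x and y themselves remain as id columns
  mutate(val = 1, id = row_number(), x_key = x, y_key = y) |>
  pivot_wider(names_from = c(x_key, y_key), values_from = val,
              names_sep = "", values_fill = 0) |>
  select(-id)
```

Here id_cols is left at its default, which uses every column not named in names_from or values_from (x, y and id), so each input row still maps to exactly one output row.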
