torgo
1
df <- expand.grid(x = rep(c("a", "b"), 2), y = c(1,2))
I would like to create four columns: a1, a2, b1, b2.
Each column would be a dummy variable.
For example, a1 would be 1 only if x == "a" and y == 1.
In practice I have many values in x and y, so I'd like a way to do this programmatically through tidyverse.
FJCC
2
This isn't elegant but it gets to the result I think you want.
df <- expand.grid(x = rep(c("a", "b"), 2), y = c(1,2))
Xs <- unique(df$x)
Ys <- unique(df$y)
Combin <- expand.grid(Xs, Ys)
Xseries <- Combin$Var1
Yseries <- Combin$Var2
library(purrr)
Dummies <- map2_dfc(Xseries, Yseries, function(x, y) as.numeric(df$x == x & df$y == y))
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> • `` -> `...3`
#> • `` -> `...4`
colnames(Dummies) <- paste0(Xseries, Yseries)
Dummies
#> # A tibble: 8 × 4
#> a1 b1 a2 b2
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 0 0
#> 2 0 1 0 0
#> 3 1 0 0 0
#> 4 0 1 0 0
#> 5 0 0 1 0
#> 6 0 0 0 1
#> 7 0 0 1 0
#> 8 0 0 0 1
cbind(df, Dummies)
#> x y a1 b1 a2 b2
#> 1 a 1 1 0 0 0
#> 2 b 1 0 1 0 0
#> 3 a 1 1 0 0 0
#> 4 b 1 0 1 0 0
#> 5 a 2 0 0 1 0
#> 6 b 2 0 0 0 1
#> 7 a 2 0 0 1 0
#> 8 b 2 0 0 0 1
Created on 2023-04-28 with reprex v2.0.2
torgo
3
Thanks, but yes, I was looking for something shorter. I'm surprised there isn't a more direct way to do this; it seems like a pretty common task.
FJCC
4
There is a function in the caret package that seems to do what you want. I assume there are others, also.
library(caret)
#> Loading required package: ggplot2
#> Loading required package: lattice
df <- expand.grid(x = rep(c("a", "b"), 2), y = c(1,2))
df$y <- factor(df$y)
InterXY <- dummyVars(~x:y,data=df)
InterDF <- predict(InterXY, df)
cbind(df, InterDF)
#> x y xa:y1 xb:y1 xa:y2 xb:y2
#> 1 a 1 1 0 0 0
#> 2 b 1 0 1 0 0
#> 3 a 1 1 0 0 0
#> 4 b 1 0 1 0 0
#> 5 a 2 0 0 1 0
#> 6 b 2 0 0 0 1
#> 7 a 2 0 0 1 0
#> 8 b 2 0 0 0 1
Created on 2023-04-29 with reprex v2.0.2
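As one of those other options, here is a minimal base R sketch that needs no extra packages. It is not from the posts above: the names xy, dummies and result are only illustrative, and it assumes the interaction levels come out in the order a1, b1, a2, b2, which is how interaction() orders them by default.

```r
# Same example data as in the question
df <- expand.grid(x = rep(c("a", "b"), 2), y = c(1, 2))

# One factor level per x/y combination; sep = "" gives names like "a1"
xy <- interaction(df$x, df$y, sep = "")

# "~ 0 + xy" drops the intercept, so every level gets its own 0/1 column
dummies <- model.matrix(~ 0 + xy)

# model.matrix prefixes the column names with the variable name ("xya1", ...),
# so replace them with the plain level names
colnames(dummies) <- levels(xy)

result <- cbind(df, dummies)
result
```

Because the columns come from the factor levels, this should scale to many values of x and y without listing them by hand.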
You can do this using pivot_wider:
library(dplyr)
library(tidyr)
df |>
  mutate(val = 1, id = row_number()) |>
  pivot_wider(id_cols = id, names_from = c(x, y), values_from = val, names_sep = "", values_fill = 0) |>
  select(-id)
The result is as required:
# A tibble: 8 × 4
a1 b1 a2 b2
<dbl> <dbl> <dbl> <dbl>
1 1 0 0 0
2 0 1 0 0
3 1 0 0 0
4 0 1 0 0
5 0 0 1 0
6 0 0 0 1
7 0 0 1 0
8 0 0 0 1
The temporary id column is necessary to keep pivot_wider from collapsing rows that share the same x and y values.
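To see what the id column is doing, here is a small sketch comparing the two cases. It assumes dplyr and tidyr are installed and R 4.1 or later for the |> pipe; the names collapsed and kept are only illustrative. Without an identifying column, pivot_wider should warn that the values are not uniquely identified and return a single row of list-columns.

```r
library(dplyr)
library(tidyr)

df <- expand.grid(x = rep(c("a", "b"), 2), y = c(1, 2))

# Without id: no column is left to tell the rows apart, so the duplicated
# x/y combinations end up in one row (tidyr warns about list-cols here)
collapsed <- df |>
  mutate(val = 1) |>
  pivot_wider(names_from = c(x, y), values_from = val, names_sep = "")

# With id: every original row keeps its own row in the output
kept <- df |>
  mutate(val = 1, id = row_number()) |>
  pivot_wider(id_cols = id, names_from = c(x, y), values_from = val,
              names_sep = "", values_fill = 0) |>
  select(-id)

nrow(collapsed)
nrow(kept)
```

If the original x and y columns are also needed, dplyr::bind_cols(df, kept) is one way to attach them, since the row order is unchanged.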