applying one_hot manually in R

I am trying to reproduce the one_hot function manually in R for an assignment.

A sample of my dataset:

a <- c('red','red','green')
b <- c('large', 'medium', 'small')
c <- c('wide','narrow','narrow')

df <- data.frame(a, b, c)
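A small side note, as a sketch rather than a required step: since R 4.0.0, data.frame() no longer converts strings to factors by default, but it still accepts stringsAsFactors = TRUE. That makes the columns factors as soon as the data frame is built. The column values below are copied from the sample. Naming the arguments (a = ..., c = ...) avoids creating a global variable called c that masks the c() function's name.

```r
# Build the sample data with the string columns stored directly as factors
df <- data.frame(
  a = c('red', 'red', 'green'),
  b = c('large', 'medium', 'small'),
  c = c('wide', 'narrow', 'narrow'),
  stringsAsFactors = TRUE
)

str(df)  # each column is listed as a Factor
```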

Using the one_hot function from the scorecard package

library(scorecard)
one_hot(df)

returns this output:

  a_green a_red b_large b_medium b_small c_narrow c_wide
1:       0     1       1        0       0        0      1
2:       0     1       0        1       0        1      0
3:       1     0       0        0       1        1      0
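Each column in this output is a 0/1 indicator for one level of one variable. As a minimal sketch for a single column, comparing the values with every level through outer() produces the same block of indicator columns. The paste() naming copies the a_green style used above, and ind is just an illustrative variable name.

```r
# One column from the sample data, stored as a factor
a <- factor(c('red', 'red', 'green'))

# Compare every value with every level (TRUE where they match), then coerce to 0/1
ind <- outer(as.character(a), levels(a), `==`) * 1L
colnames(ind) <- paste("a", levels(a), sep = "_")
ind
```

The rows of ind correspond to the rows of the data and its columns to levels(a). The a_green and a_red columns should therefore match the first two columns of the one_hot(df) output.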

I would like to create the same output without using the function. So far I have done the following steps:

  • Converted the categorical columns to factors:

# Convert each column of df to a factor, one column at a time
for (i in colnames(df)) {
  df[[i]] <- as.factor(df[[i]])
}
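  • Found the number of levels (k) of each column. I wrote this function:

to.encode <- c('a', 'b', 'c')

# Print k - 1 for each factor column listed in to.encode
one.hot <- function(df, to.encode) {
  # lapply always returns a list (sapply may simplify to a matrix when k is equal)
  k <- lapply(df[to.encode], levels)
  for (lev in k) {
    if (!is.null(lev)) {  # levels() is NULL for columns that are not factors
      len <- length(lev) - 1
      print(len)
    }
  }
}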

The output is the number of levels minus 1 (k - 1) for each column:

> one.hot(df, to.encode)
[1] 1
[1] 2
[1] 1
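A possible shortcut for the two steps above is sketched below. It assumes that every column of df should be encoded. In that case lapply() converts all columns to factors in one assignment, and df[] <- keeps df a data frame. nlevels() then returns k for each column without an explicit loop, and vapply() gives it back as a named integer vector.

```r
# Sample data with plain character columns
df <- data.frame(
  a = c('red', 'red', 'green'),
  b = c('large', 'medium', 'small'),
  c = c('wide', 'narrow', 'narrow')
)

# Convert every column to a factor; df[] <- keeps the data frame structure
df[] <- lapply(df, as.factor)

# k = number of levels per column, named by column
k <- vapply(df, nlevels, integer(1))
k       # a = 2, b = 3, c = 2
k - 1   # 1 2 1
```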

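Now I want to create new columns for each categorical column, one per level. Each value should be 1 if the original variable's value corresponds to that column and 0 otherwise. To match the one_hot output above, this means k columns per variable (for example a_green and a_red for a). Dropping one level as a reference would give k - 1 columns instead.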

Any advice on how to take this to the next step? Thank you
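One way to take the next step, sketched in base R without scorecard, is to loop over the columns to encode. For each level, build a 0/1 column named column_level. The function name manual_one_hot and its drop_first argument are hypothetical, chosen for this example. With drop_first = FALSE it builds k columns per variable, like one_hot(). With drop_first = TRUE it drops the first level, giving the k - 1 version.

```r
# Hypothetical helper: manual one-hot encoding of the columns in to.encode
manual_one_hot <- function(df, to.encode = names(df), drop_first = FALSE) {
  # Start from the columns that are not encoded (keeps the row count)
  out <- df[setdiff(names(df), to.encode)]
  for (col in to.encode) {
    x <- as.factor(df[[col]])
    lev <- levels(x)
    if (drop_first) lev <- lev[-1]  # k - 1 columns, first level as reference
    for (l in lev) {
      # 1 where the original value equals this level, 0 otherwise
      out[[paste(col, l, sep = "_")]] <- as.integer(x == l)
    }
  }
  out
}

# Usage on the sample data
df <- data.frame(
  a = c('red', 'red', 'green'),
  b = c('large', 'medium', 'small'),
  c = c('wide', 'narrow', 'narrow')
)
encoded <- manual_one_hot(df)
encoded
```

The result is a plain data.frame rather than a data.table, so the rows print as 1, 2, 3 instead of 1:, 2:, 3:. The column names and 0/1 values should otherwise line up with the one_hot(df) output shown at the top.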
