I am trying to clean the column names of my data after I scale and dummy encode it.
I set up the task and learner like this:
    library(mlr)
    library(mlrCPO)

    # Task for classification.
    data.task = makeClassifTask(id = data_id,
                                data = data,
                                target = target,
                                positive = target_values[POS_CLV_INDEX])
    # Learner: Random Forest.
    lrn = makeLearner("classif.randomForest",
                      predict.type = "prob",
                      fix.factors.prediction = TRUE)
Then I form a CPO pipeline with the learner:
    # Normalisation/dummy encode.
    data.lrn = cpoScale() %>>% cpoDummyEncode() %>>% lrn
Ideally, I want to have something like:
    # Normalisation/dummy encode, then clean the column names.
    data.lrn = cpoScale() %>>% cpoDummyEncode() %>>% janitor::clean_names() %>>% lrn
This would clean the column names after the scaling and dummy encoding (new column names are created at that step). However, I get an error saying that an argument to clean_names() is missing, with no default. The documentation says that clean_names() should work in a pipeline, but I'm not sure how to use it in this context.
The underlying issue is that some values in the data frame contain spaces, so dummy encoding creates column names that the model/predictor does not recognise.
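To illustrate the naming problem, here is a minimal base R sketch. It uses model.matrix() as a stand-in for the dummy encoding step, which is an assumption: cpoDummyEncode() may build its column names differently. The `colour` column and its values are made up for illustration, and make.names() is shown only as one base R way of producing syntactically valid names, not as what janitor::clean_names() would return.

```r
# Hypothetical toy data: a factor whose values contain spaces.
df <- data.frame(colour = factor(c("dark red", "light blue", "dark red")))

# Dummy encode with base R (stand-in for cpoDummyEncode(); its naming may differ).
encoded <- model.matrix(~ colour - 1, data = df)

# The generated column names inherit the spaces from the factor levels.
raw_names <- colnames(encoded)
print(raw_names)
# "colourdark red"   "colourlight blue"

# make.names() converts them into syntactically valid R names.
valid_names <- make.names(raw_names, unique = TRUE)
print(valid_names)
# "colourdark.red"   "colourlight.blue"
```

Names like the first pair are the kind that cause the trouble described above; anything that maps them to the second form, applied after the dummy encoding step, would be enough for my purposes.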