Hi! I'm trying to fit an xgboost regression model to some Airbnb data using the tidymodels framework. I go through my usual steps when working with tidymodels:
- Split data
data_split <- initial_split(listings_regre,
                            strata = "y",
                            prop = 0.8)
data_train <- training(data_split)
data_test <- testing(data_split)
- Create recipe
rec <- recipe(y ~ ., data = data_train) %>%
  step_nzv(all_nominal()) %>%
  step_dummy(all_nominal())
- Create model
xgb_mod <-
  boost_tree() %>%
  set_engine('xgboost') %>%
  set_mode('regression')
- Create workflow
xgb_flow <- workflow() %>%
  add_model(xgb_mod) %>%
  add_recipe(rec)
- Fit model
xgb_fit <- xgb_flow %>%
  last_fit(split = data_split)
Then I get:
preprocessor 1/1, model 1/1: Error in xgboost::xgb.DMatrix(x, label = y, missing = NA): 'data' has class 'character' and length 682192.\n 'data' accepts either a numeric matrix or a single filename."
But if I change the workflow to
xgb_flow <- workflow() %>%
  add_model(xgb_mod) %>%
  add_formula(y ~ .)
Everything works just fine.
I understood from here that both of these should work, but that is not what is happening. Does anybody know what is wrong with my recipe? I prefer working with recipes, so I'd rather use the first option.
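One way to narrow this down is to check what the recipe actually hands to the model. The sketch below is not a confirmed diagnosis: it uses a hypothetical toy data frame (`toy`, with made-up columns `room_type` and `last_review`) rather than the real `listings_regre`, and it assumes the baked data is coerced to a matrix before it reaches `xgb.DMatrix()`. It shows that a column which is neither numeric nor nominal (here a `Date`) passes through `step_dummy(all_nominal())` untouched, and that a single such column is enough to make the whole matrix `character`.

```r
library(recipes)

# Hypothetical toy data: one factor predictor and one Date predictor.
toy <- data.frame(
  y = c(10, 20, 30, 40),
  room_type = factor(c("a", "b", "a", "b")),
  last_review = as.Date("2020-01-01") + 0:3
)

# Same idea as the recipe in the question: dummy-encode nominal columns only.
rec_toy <- recipe(y ~ ., data = toy)
rec_toy <- step_dummy(rec_toy, all_nominal())

# Prep the recipe and return the processed training data.
baked <- bake(prep(rec_toy), new_data = NULL)

# List every column that is still not numeric after preprocessing.
non_numeric <- names(baked)[!vapply(baked, is.numeric, logical(1))]
print(non_numeric)

# One non-numeric column turns the whole matrix into character.
print(class(as.matrix(baked)[1, 1]))
```

Running the same `prep()` / `bake()` / `vapply()` lines on `rec` from the question would show whether any column of the real data is left non-numeric; if so, it would need its own step or would have to be removed before modelling.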
Thank you in advance