Hi,
I am using tidymodels to explore the small_fine_foods dataset, and I'm just trying to replicate the analysis done in this blog post. It seems to be something small, but the models all fail with:
Error: Columns (`review`) are not numeric; cannot convert to matrix
Below is a minimal example. Can anyone see where my error is?
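library(tidymodels)   # attaches parsnip, workflows, rsample and tune, which are used below
library(recipes)
library(modeldata)
library(textrecipes)

data("small_fine_foods")  # loads training_data and testing_data
training_data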
library(hardhat)
sparse_bp <- default_recipe_blueprint(composition = "dgCMatrix")
text_rec <-
  recipe(score ~ review, data = training_data) %>%
  step_tokenize(review)

lasso_spec <-
  logistic_reg(penalty = 0.02, mixture = 1) %>%
  set_engine("glmnet")

wf_sparse <-
  workflow() %>%
  add_recipe(text_rec, blueprint = sparse_bp) %>%
  add_model(lasso_spec)

food_folds <- vfold_cv(training_data, strata = score)

sparse <- fit_resamples(wf_sparse, food_folds)
# Error: Columns (`review`) are not numeric; cannot convert to matrix
# The tokens appear to be present
text_rec %>%
  prep() %>%
  bake(new_data = NULL)
# # A tibble: 4,000 x 2
#    review       score
#    <tknlist>    <fct>
#  1 [13 tokens]  other
#  2 [94 tokens]  great
#  3 [104 tokens] great
#  4 [36 tokens]  great
#  5 [19 tokens]  great
#  6 [27 tokens]  great
#  7 [83 tokens]  other
#  8 [53 tokens]  great
#  9 [55 tokens]  great
# 10 [45 tokens]  great
# # ... with 3,990 more rows
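
A possible explanation, offered as a guess rather than a confirmed diagnosis: the recipe stops at step_tokenize(), so review is still a token list (<tknlist>) when the blueprint tries to build a matrix for glmnet. The sketch below assumes the missing piece is a pair of textrecipes steps, step_tokenfilter() and step_tfidf(), that turn the tokens into numeric columns. The name text_rec_numeric and the max_tokens value are illustrative choices, not taken from the post above.

```r
library(recipes)
library(textrecipes)
library(modeldata)

data("small_fine_foods")  # loads training_data and testing_data

# Same recipe as in the question, plus two extra steps. Which steps the
# blog post actually uses after tokenizing is an assumption here.
text_rec_numeric <-
  recipe(score ~ review, data = training_data) %>%
  step_tokenize(review) %>%
  step_tokenfilter(review, max_tokens = 1000) %>%  # keep at most the 1,000 most frequent tokens
  step_tfidf(review)                               # one numeric tf-idf column per kept token

baked <- text_rec_numeric %>%
  prep() %>%
  bake(new_data = NULL)

# Check that every predictor is now numeric; only the outcome `score` stays a factor
predictors <- baked[setdiff(names(baked), "score")]
all(vapply(predictors, is.numeric, logical(1)))
```

If that check passes, swapping text_rec_numeric in for text_rec in add_recipe(), with blueprint = sparse_bp kept as it is, would be the next thing to try.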