Hi,
I am following along with the book Supervised Machine Learning for Text Analysis in R and am trying to build my first classifier. The goal is to predict whether a customer complaint is about credit or something else.
I have taken the data from the Consumer Financial Protection Bureau (CFPB) and reduced it to a subsample so that it runs quickly-ish.
When I run the workflow through the resamples, every slice fails with an error like the following:
Slice013: preprocessor 1/1, model 1/1: Error: Can't convert <textrecipes_tokenlist> to .
Below is my code:
library(readr)
library(tidyverse)
library(tidymodels)
library(textrecipes)
library(janitor)
# To speed things up we take 1% of the rows per date
# producing a reduced complaints dataset
complaints <- read_csv("~/data/complaints.csv.zip") %>%
  clean_names() %>%
  filter(!is.na(consumer_complaint_narrative)) %>%
  group_by(date_received) %>%
  sample_frac(size = 0.01) %>%
  ungroup()
# Create the classification label and select only two columns to keep it simple
complaints <- complaints %>%
  mutate(
    tgt_class = case_when(
      str_detect(product, 'Credit|personal consumer') ~ "Credit",
      TRUE ~ "Other"
    )
  ) %>%
  select(date_received, consumer_complaint_narrative, consumer_disputed, tgt_class) %>%
  na.omit()
# Just double check that it looks sensible
table(complaints$tgt_class)
#> Credit Other
#> 3442 3301
head(complaints)
#> # A tibble: 6 x 4
#> date_received consumer_complaint_narrative consumer_disput~ tgt_class
#> <date> <chr> <chr> <chr>
#> 1 2015-03-19 "I wrote to XXXX, asking them to sto~ No Other
#> 2 2015-03-19 "In XX/XX/XXXX my wages that I earne~ Yes Other
#> 3 2015-03-20 "I sent a letter and have yet to rec~ No Other
#> 4 2015-03-20 "I have inquiry alerts through my ba~ No Other
#> 5 2015-03-21 "Equifax has changed my student loan~ No Credit
#> 6 2015-03-22 "I HAVE A FRAUD ALERT ON ALL MY CRED~ Yes Credit
# MODELLING ---------------------------------------------------------------
set.seed(1)
comp_split <- initial_split(complaints, strata = tgt_class)
comp_train <- training(comp_split)
comp_test <- testing(comp_split)
# Set up the resamples as time slices, one period per month
set.seed(2)
complaints_slices <- sliding_period(
  comp_train,
  date_received,
  "month",
  lookback = Inf,
  assess_stop = 1,
  skip = 3,
  step = 1
)
# Now create a very simple text recipe
comp_rec <- recipe(tgt_class ~ ., data = comp_train) %>%
  step_tokenize(consumer_complaint_narrative) %>% # Tokenizes to words by default
  step_tokenfilter(consumer_complaint_narrative, max_tokens = 500)
# Double check it tokenized
comp_rec %>%
  prep() %>%
  bake(new_data = NULL)
#> # A tibble: 5,058 x 4
#> date_received consumer_complaint_narrative consumer_disputed tgt_class
#> <date> <tknlist> <fct> <fct>
#> 1 2015-03-19 [21 tokens] No Other
#> 2 2015-03-19 [549 tokens] Yes Other
#> 3 2015-03-20 [102 tokens] No Other
#> 4 2015-03-20 [128 tokens] No Other
#> 5 2015-03-21 [93 tokens] No Credit
#> 6 2015-03-22 [251 tokens] Yes Credit
#> 7 2015-03-23 [232 tokens] No Credit
#> 8 2015-03-23 [25 tokens] No Other
#> 9 2015-03-24 [63 tokens] No Credit
#> 10 2015-03-24 [73 tokens] No Other
#> # ... with 5,048 more rows
# Run the Random Forest
rf_spec <-
  rand_forest() %>%
  set_engine("ranger", importance = "impurity") %>%
  set_mode("classification")
rf_wflow <- # new workflow object
  workflow() %>% # use workflow function
  add_recipe(comp_rec) %>% # use the new recipe
  add_model(rf_spec) # add your model spec
rf_res <-
  rf_wflow %>%
  fit_resamples(
    resamples = complaints_slices,
    metrics = metric_set(kap, roc_auc, sens, spec),
    control = control_resamples(save_pred = TRUE)
  )
#> x Slice01: preprocessor 1/1, model 1/1: Error: Can't convert <textrecipes_tokenlist...
#> x Slice02: preprocessor 1/1, model 1/1: Error: Can't convert <textrecipes_tokenlist...
#> x Slice03: preprocessor 1/1, model 1/1: Error: Can't convert <textrecipes_tokenlist...
#> x Slice04: preprocessor 1/1, model 1/1: Error: Can't convert <textrecipes_tokenlist...
#> x Slice05: preprocessor 1/1, model 1/1: Error: Can't convert <textrecipes_tokenlist...
#> x Slice06: preprocessor 1/1, model 1/1: Error: Can't convert <textrecipes_tokenlist...
#> ..........
#> Warning: All models failed. See the `.notes` column.
Created on 2021-03-31 by the reprex package (v1.0.0)
Any help would be greatly appreciated.