I am working off Julia Silge's blog post demonstrating sparse matrix models, but I am using the ranger model for classification rather than the lasso she uses. The model works fine with non-sparse data, but `predict()` fails with sparse data, complaining "cannot coerce class 'structure("dgCMatrix", package = "Matrix")' to a data.frame." How can this be? Thanks.
```r
library(tidyverse)
library(tidymodels)
library(tidytext)
library(textrecipes)
library(stopwords)
library(hardhat)

# modeldata's small_fine_foods provides training_data and testing_data
data("small_fine_foods")

sparse_bp <- default_recipe_blueprint(composition = "dgCMatrix")

text_rec <-
  recipe(score ~ review, data = training_data) %>%
  step_tokenize(review) %>%
  step_stopwords(review) %>%
  step_tokenfilter(review, max_tokens = 1e3) %>%
  step_tfidf(review)

rf_model <- parsnip::rand_forest(trees = 100) %>%
  set_engine("ranger", importance = "impurity") %>%
  set_mode("classification")

wf_fat <-
  workflow() %>%
  add_recipe(text_rec) %>%
  add_model(rf_model)

wf_sparse <-
  workflow() %>%
  add_recipe(text_rec, blueprint = sparse_bp) %>%
  add_model(rf_model)

# fit works and...
fit_fat <- fit(wf_fat, training_data)
# predict works
summary(predict(fit_fat, training_data))
#> .pred_class
#> great:2609
#> other:1391

# fit works but...
fit_sparse <- fit(wf_sparse, training_data)
# predict gags
summary(predict(fit_sparse, training_data))
#> Error in as.data.frame.default(new_data): cannot coerce class 'structure("dgCMatrix", package = "Matrix")' to a data.frame
```
Created on 2023-03-31 with reprex v2.0.2
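`dgCMatrix` is a class from the Matrix package, and it is an S4 class rather than an S3 one. `as.data.frame()` has no method for it, so the call falls through to `as.data.frame.default()`, which errors. To coerce it, convert it to a dense base matrix first: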
```r
library(Matrix)
(m <- Matrix(c(0,0,2:0), 3,5))
#> 3 x 5 sparse Matrix of class "dgCMatrix"
#>
#> [1,] . 1 . . 2
#> [2,] . . 2 . 1
#> [3,] 2 . 1 . .
as.data.frame(as.matrix(m))
#>   V1 V2 V3 V4 V5
#> 1  0  1  0  0  2
#> 2  0  0  2  0  1
#> 3  2  0  1  0  0
```
Created on 2023-03-31 with reprex v2.0.2
But that defeats the purpose of using a sparse matrix.
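To make the cost of densifying concrete, here is a small sketch. The 2,000 x 1,000 shape and the 1% density are illustrative assumptions (roughly the shape of a tf-idf matrix with `max_tokens = 1e3`), not values taken from the thread. It also checks that `dgCMatrix` is an S4 object.

```r
library(Matrix)

# Hypothetical document-term-style matrix: 2,000 rows, 1,000 columns, ~1% non-zero
set.seed(123)
sparse <- rsparsematrix(2000, 1000, density = 0.01)

isS4(sparse)                       # dgCMatrix is an S4 class from Matrix

dense    <- as.matrix(sparse)      # base matrix: every zero is stored explicitly
dense_df <- as.data.frame(dense)   # data frame: also stores every zero

format(object.size(sparse),   units = "Kb")
format(object.size(dense),    units = "Mb")
format(object.size(dense_df), units = "Mb")
```

The sparse object stores only the non-zero values plus their row indices and column pointers, so its size grows with the number of non-zeros rather than with rows times columns.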
Yes, I realize that, but I don't have a dgCMatrix per se. I have a fitted model object whose `predict()` method should accommodate the sparse matrices within it.
How does this example differ?
```r
library(hardhat)
library(recipes)

train <- iris[1:100, ]
test <- iris[101:150, ]

bp <- default_recipe_blueprint(composition = "dgCMatrix")
rec <- recipe(Species ~ Sepal.Length + Sepal.Width, train) %>%
  step_log(Sepal.Length)

processed <- mold(rec, train, blueprint = bp)
class(processed$predictors)
#> [1] "dgCMatrix"
#> attr(,"package")
#> [1] "Matrix"
as.data.frame(processed$predictors)
#> Error in as.data.frame.default(processed$predictors): cannot coerce class 'structure("dgCMatrix", package = "Matrix")' to a data.frame
```
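The same blueprint is applied again at prediction time. As we understand it, a fitted workflow preprocesses new data with `hardhat::forge()` using the blueprint stored at fit time, so the model's predict step also receives a `dgCMatrix`. A minimal sketch extending the iris example (the names `new_processed` is ours):

```r
library(hardhat)
library(recipes)

train <- iris[1:100, ]
test  <- iris[101:150, ]

bp  <- default_recipe_blueprint(composition = "dgCMatrix")
rec <- recipe(Species ~ Sepal.Length + Sepal.Width, train) %>%
  step_log(Sepal.Length)

processed <- mold(rec, train, blueprint = bp)

# forge() reuses the fitted blueprint, including its "dgCMatrix" composition,
# so new predictors come back sparse too (outcomes are skipped by default)
new_processed <- forge(test, processed$blueprint)
class(new_processed$predictors)
dim(new_processed$predictors)
```

So any model wrapper that calls `as.data.frame()` on the new predictors will hit the same error shown above.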
I'm sorry if I'm not being clear, or if I'm being dense (as opposed to sparse). I do understand how to convert a sparse matrix to a data frame, but how do I get `predict()` to work on the output of the ranger model workflow? Thank you.
As I understand it, three engines can handle sparse matrices, and I assume the `predict()` methods for each of those engines can as well. `predict()` works with xgboost and glmnet, but not with ranger, as you can see below. Is there a way to fix this?
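parsnip records, per model, engine and mode, whether an engine accepts sparse predictors at fit time, in the `allow_sparse_x` column of its encoding table. The sketch below lists the classification engines flagged as sparse-capable for the three model types in this thread. It assumes a parsnip version recent enough to have the `allow_sparse_x` column. Note that this flag describes the fitting step; it does not by itself guarantee that the prediction path keeps the data sparse, which may be where ranger differs.

```r
library(parsnip)
library(dplyr)

# One row per model/engine/mode combination registered in parsnip
encodings <- bind_rows(
  rand_forest  = get_encoding("rand_forest"),
  boost_tree   = get_encoding("boost_tree"),
  logistic_reg = get_encoding("logistic_reg"),
  .id = "spec"
)

# Keep classification engines that accept a sparse x at fit time
sparse_engines <- encodings %>%
  filter(mode == "classification", allow_sparse_x) %>%
  select(spec, engine, allow_sparse_x)

sparse_engines
```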
```r
library(tidyverse)
library(tidymodels)
library(tidytext)
library(textrecipes)
library(stopwords)
library(hardhat)

data("small_fine_foods")

sparse_bp <- default_recipe_blueprint(composition = "dgCMatrix")

text_rec <-
  recipe(score ~ review, data = training_data) %>%
  step_tokenize(review) %>%
  step_stopwords(review) %>%
  step_tokenfilter(review, max_tokens = 1e3) %>%
  step_tfidf(review)

xg_model <- parsnip::boost_tree(trees = 100) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

las_model <- parsnip::logistic_reg(penalty = 0.02, mixture = 1) %>%
  set_engine("glmnet")

rf_model <- parsnip::rand_forest(trees = 100) %>%
  set_engine("ranger") %>%
  set_mode("classification")

xg_wf <-
  workflow() %>%
  add_recipe(text_rec, blueprint = sparse_bp) %>%
  add_model(xg_model)

las_wf <-
  workflow() %>%
  add_recipe(text_rec, blueprint = sparse_bp) %>%
  add_model(las_model)

rf_wf <-
  workflow() %>%
  add_recipe(text_rec, blueprint = sparse_bp) %>%
  add_model(rf_model)

fit_xg <- fit(xg_wf, training_data)
fit_rf <- fit(rf_wf, training_data)
fit_las <- fit(las_wf, training_data)

summary(predict(fit_xg, training_data))
#> .pred_class
#> great: 134
#> other:3866
summary(predict(fit_las, training_data))
#> .pred_class
#> great:3556
#> other: 444
summary(predict(fit_rf, training_data))
#> Error in as.data.frame.default(new_data): cannot coerce class 'structure("dgCMatrix", package = "Matrix")' to a data.frame
```
Created on 2023-04-01 with reprex v2.0.2
OK, got it now (maybe it's me who's dense). I'll see if I can figure out what ranger does differently from the others with the same blueprint.
Hello, this appears to be a bug! We will take a look at it.
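Until a fix is available, one possible workaround is sketched below. It assumes the failure comes from the ranger prediction wrapper coercing the new predictors with `as.data.frame()`, which is what the error message suggests. The hypothetical helper `predict_dense()` forges the new data with the fitted blueprint, densifies only that prediction batch, and calls `predict()` on the underlying parsnip fit. Training still uses the sparse matrix. It assumes tidymodels, textrecipes, ranger and stopwords are installed.

```r
library(tidymodels)
library(textrecipes)
library(hardhat)

# modeldata's small_fine_foods provides training_data and testing_data
data("small_fine_foods", package = "modeldata")

sparse_bp <- default_recipe_blueprint(composition = "dgCMatrix")

text_rec <- recipe(score ~ review, data = training_data) %>%
  step_tokenize(review) %>%
  step_stopwords(review) %>%
  step_tokenfilter(review, max_tokens = 1e3) %>%
  step_tfidf(review)

rf_model <- rand_forest(trees = 100) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# Fitting with the sparse blueprint works, as in the reprex above
fit_sparse <- workflow() %>%
  add_recipe(text_rec, blueprint = sparse_bp) %>%
  add_model(rf_model) %>%
  fit(data = training_data)

# Hypothetical helper (not part of workflows or parsnip)
predict_dense <- function(wf_fit, new_data, ...) {
  blueprint <- extract_mold(wf_fit)$blueprint
  x_sparse  <- forge(new_data, blueprint)$predictors  # dgCMatrix
  x_dense   <- as.data.frame(as.matrix(x_sparse))     # dense batch for ranger
  predict(extract_fit_parsnip(wf_fit), new_data = x_dense, ...)
}

preds <- predict_dense(fit_sparse, testing_data)
summary(preds)
```

For large prediction sets, the same helper can be applied to chunks of rows so that only one dense chunk is in memory at a time.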
This topic was automatically closed on April 25, 2023, 21 days after the last reply.