predict not working with ranger model when using sparse data

I am working from Julia Silge's blog post demonstrating sparse matrix models, but I am using a ranger model for classification rather than the lasso she uses. The workflow predicts fine with non-sparse data, but with a sparse blueprint predict fails with the error "cannot coerce class 'structure("dgCMatrix", package = "Matrix")' to a data.frame". How can this be? Thanks.

library(tidyverse)
library(tidymodels)
library(tidytext)
library(textrecipes)
library(stopwords)
library(hardhat)

data("small_fine_foods")

sparse_bp <- default_recipe_blueprint(composition = "dgCMatrix")

text_rec <-
  recipe(score ~ review, data = training_data) %>%
  step_tokenize(review)  %>%
  step_stopwords(review) %>%
  step_tokenfilter(review, max_tokens = 1e3) %>%
  step_tfidf(review)

rf_model <- parsnip::rand_forest(trees = 100) %>% 
  set_engine("ranger", importance = "impurity") %>% 
  set_mode("classification")

wf_fat <-
  workflow() %>%
  add_recipe(text_rec) %>%
  add_model(rf_model)

wf_sparse <- 
  workflow() %>%
  add_recipe(text_rec, blueprint = sparse_bp) %>%
  add_model(rf_model)

# fit works and...
fit_fat <- fit(wf_fat, training_data)
# predict works
summary(predict(fit_fat, training_data))
#>  .pred_class 
#>  great:2609  
#>  other:1391

# fit works but...
fit_sparse <- fit(wf_sparse, training_data)
# predict fails
summary(predict(fit_sparse, training_data))
#> Error in as.data.frame.default(new_data): cannot coerce class 'structure("dgCMatrix", package = "Matrix")' to a data.frame

Created on 2023-03-31 with reprex v2.0.2

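dgCMatrix is an S4 class from the Matrix package, not a function, and base R's as.data.frame() has no method for it, which is why the coercion fails. To coerce one to a data frame, convert it to a dense matrix first: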

library(Matrix)
(m <- Matrix(c(0,0,2:0), 3,5))
#> 3 x 5 sparse Matrix of class "dgCMatrix"
#>               
#> [1,] . 1 . . 2
#> [2,] . . 2 . 1
#> [3,] 2 . 1 . .
as.data.frame(as.matrix(m))
#>   V1 V2 V3 V4 V5
#> 1  0  1  0  0  2
#> 2  0  0  2  0  1
#> 3  2  0  1  0  0

Created on 2023-03-31 with reprex v2.0.2

But that defeats the purpose of using a sparse matrix, because the dense copy stores every zero explicitly.
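To make that trade-off concrete, here is a rough sketch comparing the memory footprint of a sparse matrix with its dense copy. The matrix is a made-up example (random values, about 1% non-zero), not the tf-idf matrix from the question, so the exact ratio will differ for real text data.

```r
library(Matrix)

set.seed(1)

# Hypothetical 1,000 x 1,000 matrix with roughly 1% non-zero entries
m <- rsparsematrix(nrow = 1000, ncol = 1000, density = 0.01)

# Dense copy: every zero is now stored explicitly
dense <- as.matrix(m)

sparse_bytes <- as.numeric(object.size(m))
dense_bytes  <- as.numeric(object.size(dense))

# How many times larger the dense copy is
dense_bytes / sparse_bytes
```

The dense copy grows with rows times columns, while the sparse one grows with the number of non-zero entries, which is why densifying a large tf-idf matrix can be expensive.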

Yes, I realize that but I don't have a dgCMatrix, per se. I have a model object with a predict method that should accommodate sparse matrices that are within the object.

How does this example differ?

library(hardhat)
library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
train <- iris[1:100, ]
test <- iris[101:150, ]
bp <- default_recipe_blueprint(composition = "dgCMatrix")
rec <- recipe(Species ~ Sepal.Length + Sepal.Width, train) %>%
  step_log(Sepal.Length)
processed <- mold(rec, train, blueprint = bp)
class(processed$predictors)
#> [1] "dgCMatrix"
#> attr(,"package")
#> [1] "Matrix"
as.data.frame(processed$predictors)
#> Error in as.data.frame.default(processed$predictors): cannot coerce class 'structure("dgCMatrix", package = "Matrix")' to a data.frame

I'm sorry if I'm not being clear or being dense (as opposed to sparse). I do understand how to convert a sparse matrix to a data frame but how do I get predict to work on the output of the ranger model workflow? Thank you.

As I understand it, three engines can handle sparse matrices, and I assumed the predict methods for each of those engines could as well. predict works with xgboost and glmnet but not with ranger, as you can see below. Is there a way to fix this?

library(tidyverse)
library(tidymodels)
library(tidytext)
library(textrecipes)
library(stopwords)
library(hardhat)

data("small_fine_foods")

sparse_bp <- default_recipe_blueprint(composition = "dgCMatrix")

text_rec <-
  recipe(score ~ review, data = training_data) %>%
  step_tokenize(review)  %>%
  step_stopwords(review) %>%
  step_tokenfilter(review, max_tokens = 1e3) %>%
  step_tfidf(review)

xg_model <- parsnip::boost_tree(trees = 100) %>% 
  set_engine("xgboost") %>% 
  set_mode("classification")

las_model <- parsnip::logistic_reg(penalty = 0.02, mixture = 1) %>% 
  set_engine("glmnet")

rf_model <- parsnip::rand_forest(trees = 100) %>% 
  set_engine("ranger") %>% 
  set_mode("classification")

xg_wf <- 
  workflow() %>%
  add_recipe(text_rec, blueprint = sparse_bp) %>%
  add_model(xg_model)

las_wf <- 
  workflow() %>%
  add_recipe(text_rec, blueprint = sparse_bp) %>%
  add_model(las_model)

rf_wf <- 
  workflow() %>%
  add_recipe(text_rec, blueprint = sparse_bp) %>%
  add_model(rf_model)

fit_xg <- fit(xg_wf, training_data)
fit_rf <- fit(rf_wf, training_data)
fit_las <- fit(las_wf, training_data)


summary(predict(fit_xg, training_data))
#>  .pred_class 
#>  great: 134  
#>  other:3866
summary(predict(fit_las, training_data))
#>  .pred_class 
#>  great:3556  
#>  other: 444
summary(predict(fit_rf, training_data))
#> Error in as.data.frame.default(new_data): cannot coerce class 'structure("dgCMatrix", package = "Matrix")' to a data.frame

Created on 2023-04-01 with reprex v2.0.2

OK, got it now (maybe it's me who's dense). I'll see if I can figure out what ranger does differently from the others with the same blueprint.

Hello :wave:

This appears to be a bug! We will take a look at it.
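Until a fix is available, one possible workaround is to skip predict() on the workflow and instead bake the new data with the trained recipe, then predict with the underlying parsnip fit. The sketch below is an assumption rather than a confirmed fix: it uses iris instead of the review data, it has not been checked against the package versions in this thread, and it densifies the new data, so it gives up the memory savings at prediction time.

```r
library(tidymodels)  # attaches workflows, recipes and parsnip
library(hardhat)

set.seed(123)
in_train <- sample(nrow(iris), 100)
train <- iris[in_train, ]
test  <- iris[-in_train, ]

sparse_bp <- default_recipe_blueprint(composition = "dgCMatrix")

rec <- recipe(Species ~ ., data = train) %>%
  step_log(Sepal.Length)

rf_model <- rand_forest(trees = 100) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# Fitting with the sparse blueprint works, as in the posts above
fit_sparse <- workflow() %>%
  add_recipe(rec, blueprint = sparse_bp) %>%
  add_model(rf_model) %>%
  fit(train)

# Pull the trained recipe and the parsnip model out of the workflow
rec_trained <- extract_recipe(fit_sparse)
rf_fit      <- extract_fit_parsnip(fit_sparse)

# bake() returns a dense tibble by default, which avoids the
# dgCMatrix-to-data.frame coercion (assumed to be the failing step)
new_x <- bake(rec_trained, new_data = test, all_predictors())

preds <- predict(rf_fit, new_data = new_x)
head(preds)
```

For the text example, the same pattern would apply to fit_sparse and training_data from the original post, subject to the same caveats.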

