Using Tidymodels with Glmnet and Sparse Matrices

Hi,

I am using tidymodels to explore the small_fine_foods dataset, and I'm trying to replicate the analysis done in this blog post. It is probably something small, but all of the models fail with:

Error: Columns (review) are not numeric; cannot convert to matrix

Below is a minimal example. Can anyone see where my error is?

library(tidymodels)  # needed for logistic_reg(), workflow(), vfold_cv(), fit_resamples()
library(recipes)
library(modeldata)
library(textrecipes)

data("small_fine_foods")
training_data

library(hardhat)
sparse_bp <- default_recipe_blueprint(composition = "dgCMatrix")

text_rec <-
  recipe(score ~ review, data = training_data) %>% 
  step_tokenize(review)

lasso_spec <-
  logistic_reg(penalty = 0.02, mixture = 1) %>%
  set_engine("glmnet")

wf_sparse <- 
  workflow() %>%
  add_recipe(text_rec, blueprint = sparse_bp) %>%
  add_model(lasso_spec)

food_folds <- vfold_cv(training_data, strata = score)

sparse <- fit_resamples(wf_sparse, food_folds)

# Error: Columns (`review`) are not numeric; cannot convert to matrix

# The tokens appear to be present
text_rec %>% 
  prep() %>% 
  bake(new_data = NULL)

# # A tibble: 4,000 x 2
#           review score
#        <tknlist> <fct>
#  1  [13 tokens] other
#  2  [94 tokens] great
#  3 [104 tokens] great
#  4  [36 tokens] great
#  5  [19 tokens] great
#  6  [27 tokens] great
#  7  [83 tokens] other
#  8  [53 tokens] great
#  9  [55 tokens] great
# 10  [45 tokens] great
# # ... with 3,990 more rows

Hi,

It seems that adding the step step_tokenfilter to the recipe makes it work.

Thanks

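You'll have to use one of the steps that converts the tokenized text to numeric features; step_tokenize() on its own leaves review as a token list, which cannot be converted to a matrix. See the docs for the steps listed under "Step Functions - tokenlist to numeric" or "Step Functions - character to numeric".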
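
As a rough sketch of what that could look like, the recipe below adds step_tfidf() after tokenizing, so that review is replaced by numeric tf-idf columns. The choice of step_tfidf() and the max_tokens = 1000 cutoff are assumptions made for illustration, not something stated in the original post; step_tf() or another tokenlist-to-numeric step should work in the same position.

```r
library(recipes)
library(modeldata)
library(textrecipes)

data("small_fine_foods")  # loads training_data and testing_data

text_rec <-
  recipe(score ~ review, data = training_data) %>%
  step_tokenize(review) %>%
  # Keep only the most frequent tokens (1000 is an illustrative value)
  step_tokenfilter(review, max_tokens = 1000) %>%
  # Convert the token list into numeric tf-idf columns
  step_tfidf(review)

prepped <- prep(text_rec)

# Every predictor is now numeric, one column per retained token
baked <- bake(prepped, new_data = NULL)

# The predictors can also be returned directly as a sparse matrix
sparse_x <- bake(prepped, new_data = NULL, all_predictors(),
                 composition = "dgCMatrix")
```

With a recipe like this in place of the original text_rec, the sparse blueprint passed to add_recipe() should have only numeric predictors to convert.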
