Support for glmnet with an external family

Apologies for this long question, it's because I tried different scenarios for my inquiry and I'm trying to document them in the question.

The glmnet help pages states that glmnet supports any family object as used by stats:glm().

From version 4.0 onwards, glmnet supports both the original built-in families, as well as any family object as used by stats:glm(). This opens the door to a wide variety of additional models. For example family=binomial(link=cloglog) or family=negative.binomial(theta=1.5) (from the MASS library). Note that the code runs faster for the built-in families.

Referring to the help page of family {stats}, I understand that using an external family is specified via a function, for example family = quasipoisson().

I don't know if using external families is supported in tidymodels or not. I tried to use a quasipoisson() family with a poisson model, but I couldn't. Making the model specification and fitting works with no problems, the problems do occur when I use predict.

  • Using predict() without specifying a penalty makes predictions for all lambda values. The output is a long tibble (number of rows = no. of observations x no. of lambda values).
  • Using predict() with specifying a penalty gives an error Error: The ellipses are not used to pass args to the model function's predict function. These arguments cannot be used: 'penalty'.
  • Using multi_predict() gives an error Error in class() : 0 arguments passed to 'class' which requires 1

Extracting the glmnet fitted model object and then using predict.glmnet() method works with no problem. But this is not a suitable workaround, because I'm still unable to use the tidymodels tune functions, such as tune_grid() and fit_resamples().

  • Using fit_resamples() produces an error related to incorrect matrix dimensions.
  • Using tune_grid() works smoothly with no problems.
  • Using tune_grid() with a custom grid yields an error message: Error in class() : 0 arguments passed to 'class' which requires 1

Below is a reprex

library(tidymodels)
library(poissonreg)

Create some data to play with

n <- 2e3
set.seed(123)
x1 <- MASS::rnegbin(n, mu = 28, theta = 0.6)
x2 <- rnorm(n, 5, 10)
x3 <- as.factor(rnorm(n) < 0)
y <- as.integer(x1 + x2 + as.numeric(x3))
y[y < 0] <- 0
data <- tibble(y, x1, x2, x3)

Create a recipe, model spec and a workflow

# recipe
rec <- recipe(y ~ ., data = data) %>% 
  step_dummy(all_nominal()) %>% 
  step_normalize(all_predictors())

First, I try a spec for glmnet with the default family

pois_spec <- poisson_reg() %>% 
  set_engine("glmnet")

wflow <- workflow() %>% 
  add_recipe(rec) %>% 
  add_model(pois_spec)

Fit the workflow

wflow_fit <- wflow %>% 
  fit(data = data)

Using the predict() method
I should specify a penatly in a glmnet model

preds <- predict(wflow_fit, data, penalty = 0.5)
str(preds)
#> tibble [2,000 × 1] (S3: tbl_df/tbl/data.frame)
#>  $ .pred: num [1:2000] 21.7 86.5 14 25 22.8 ...

As predict without a penalty gives an error

preds <- predict(wflow_fit, data)
#> Error: `penalty` should be a single numeric value. `multi_predict()` can be used to get multiple predictions per row of data.

Also multi_predict() gives an error

preds <- multi_predict(wflow_fit, data, penalty = c(0.550, 0.604))
#> Error in class(): 0 arguments passed to 'class' which requires 1

Now I use a quasipoisson() family

pois_spec <- poisson_reg() %>% 
  set_engine("glmnet", family = quasipoisson())

wflow <- workflow() %>% 
  add_recipe(rec) %>% 
  add_model(pois_spec)

And fit thw workflow

wflow_fit <- wflow %>% 
  fit(data = data)

predict() works with no errors, but it produces unexpected output; a long tibble of
predictions at each lambda value

preds <- predict(wflow_fit, data)
str(preds)
#> tibble [122,000 × 2] (S3: tbl_df/tbl/data.frame)
#>  $ .pred_values: num [1:122000] 32.9 32.9 32.9 32.9 32.9 ...
#>  $ .pred_lambda: num [1:122000] 34.6 34.6 34.6 34.6 34.6 ...
preds %>% count(.pred_lambda)
#> # A tibble: 61 x 2
#>    .pred_lambda     n
#>  *        <dbl> <int>
#>  1        0.130  2000
#>  2        0.143  2000
#>  3        0.157  2000
#>  4        0.172  2000
#>  5        0.189  2000
#>  6        0.207  2000
#>  7        0.227  2000
#>  8        0.250  2000
#>  9        0.274  2000
#> 10        0.301  2000
#> # … with 51 more rows

And specifying a penalty gives am error

preds <- predict(wflow_fit, data, penalty = 0.150)
#> Error: The ellipses are not used to pass args to the model function's predict function. These arguments cannot be used: `penalty`

Using multi_predict() also gives an error, same as when used with default glmnet

preds <- multi_predict(wflow_fit, data, penalty = c(0.550, 0.604))
#> Error in class(): 0 arguments passed to 'class' which requires 1

Extract the fitted model and use glmnet predict method instead.

This works with no problems

model_fit <- wflow_fit %>% 
  chuck("fit", "fit", "fit")
class(model_fit)
#> [1] "glmnetfit" "glmnet"
data_X <- cbind(x1, x2, x3)
preds_glmnet <- predict(model_fit, data_X, s = 0.5)
str(preds_glmnet)
#>  num [1:2000, 1] 7.96 59.87 5.59 18.02 7.02 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : chr "1"

Now I’m using tune functions

res <- vfold_cv(data, v = 3)

Use fit_resamples()
yields error of incompatible matrix dimensions

resamples <- wflow %>% 
  fit_resamples(res)
#> x Fold1: preprocessor 1/1, model 1/1 (predictions): Error: Problem with `mutate()...
#> x Fold2: preprocessor 1/1, model 1/1 (predictions): Error: Problem with `mutate()...
#> x Fold3: preprocessor 1/1, model 1/1 (predictions): Error: Problem with `mutate()...
#> Warning: All models failed. See the `.notes` column.
resamples %>% 
  chuck(".notes", 1, 1)
#> [1] "preprocessor 1/1, model 1/1 (predictions): Error: Problem with `mutate()` input `.row`.\nx Input `.row` can't be recycled to size 40020.\nℹ Input `.row` is `orig_rows`.\nℹ Input `.row` must be size 40020 or 1, not 667."

Using tune_grid().
No errors and completes smoothly

pois_spec <- poisson_reg(
  penalty = tune(),
  mixture = tune()) %>% 
  set_engine(
    "glmnet", 
    family = quasipoisson())
wflow <- workflow() %>% 
  add_recipe(rec) %>% 
  add_model(pois_spec)
resamples <- wflow %>% 
  tune_grid(res)
resamples %>% 
  chuck(".notes", 1)
#> # A tibble: 0 x 1
#> # … with 1 variable: .notes <chr>

Using tune_grid() with a custom grid
yields an error Error in class() : 0 arguments passed to 'class' which requires 1

param_grid <- grid_regular(
  parameters(pois_spec),
  levels = c(mixture = 2, penalty = 20))
resamples <- wflow %>% 
  tune_grid(
    res,
    grid = param_grid)
#> x Fold1: preprocessor 1/1, model 1/2 (predictions): Error in class(): 0 arguments...
#> x Fold1: preprocessor 1/1, model 2/2 (predictions): Error in class(): 0 arguments...
#> x Fold2: preprocessor 1/1, model 1/2 (predictions): Error in class(): 0 arguments...
#> x Fold2: preprocessor 1/1, model 2/2 (predictions): Error in class(): 0 arguments...
#> x Fold3: preprocessor 1/1, model 1/2 (predictions): Error in class(): 0 arguments...
#> x Fold3: preprocessor 1/1, model 2/2 (predictions): Error in class(): 0 arguments...
#> Warning: All models failed. See the `.notes` column.
resamples %>% 
  chuck(".notes", 1, 1)
#> [1] "preprocessor 1/1, model 1/2 (predictions): Error in class(): 0 arguments passed to 'class' which requires 1"
#> [2] "preprocessor 1/1, model 2/2 (predictions): Error in class(): 0 arguments passed to 'class' which requires 1"

Created on 2021-05-01 by the reprex package (v1.0.0)

1 Like

Wow, there are a bunch of bugs that were exposed by this. :frowning_face:

To start, when calling a glmnet model through parsnip, you should have gotten an error if penalty did not have a value. Even though that model does not require a value at the time of fit, parsnip does. We are already in the process of documenting this better but a discussion about this can be found here.

When calling predict(), you don't need to add the penalty argument (it is ignored; we should check that). It automatically uses the one specified in poisson_reg().

For Poisson regression, we don't (currently) have a multi_predict() method. The error message about Error in class() is a bug; it should say that "No multi_predict method exists for objects with classes ...". I think that I can add that method.

About the quasipoisson family... we can try to support that. It's in the grey aread for bug or not. Technically, it is not a Poisson model anymore since it doesn't use the regular Poisson distribution. That might sound like I'm splitting hairs but it matters for the R bits:

> poisson_reg(penalty = 0.5) %>% 
+     set_engine("glmnet") %>% 
+     fit(y ~ x1 + x2, data = data) %>% 
+     pluck("fit") %>% 
+     class()
[1] "fishnet" "glmnet" 
> 
> # 'fishnet' is their class of Poisson models
> 
> poisson_reg(penalty = 0.5) %>% 
+     set_engine("glmnet", family = quasipoisson()) %>% 
+     fit(y ~ x1 + x2, data = data) %>% 
+     pluck("fit") %>% 
+     class()
[1] "glmnetfit" "glmnet"   
> 
> # Now the class is not Poisson specific, so our methods don't know what to do

I would be happy to support it but that will take some time to figure out if we can. You can track these at:

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.