Apologies for this long question, it's because I tried different scenarios for my inquiry and I'm trying to document them in the question.
The glmnet help pages states that glmnet supports any family object as used by stats:glm()
.
From version 4.0 onwards, glmnet supports both the original built-in families, as well as any family object as used by stats:glm(). This opens the door to a wide variety of additional models. For example family=binomial(link=cloglog) or family=negative.binomial(theta=1.5) (from the MASS library). Note that the code runs faster for the built-in families.
Referring to the help page of family {stats}
, I understand that using an external family is specified via a function, for example family = quasipoisson()
.
I don't know if using external families is supported in tidymodels or not. I tried to use a quasipoisson()
family with a poisson model, but I couldn't. Making the model specification and fitting works with no problems, the problems do occur when I use predict
.
- Using
predict()
without specifying a penalty makes predictions for all lambda values. The output is a long tibble (number of rows = no. of observations x no. of lambda values). - Using
predict()
with specifying a penalty gives an errorError: The ellipses are not used to pass args to the model function's predict function. These arguments cannot be used: 'penalty'
. - Using
multi_predict()
gives an errorError in class() : 0 arguments passed to 'class' which requires 1
Extracting the glmnet fitted model object and then using predict.glmnet()
method works with no problem. But this is not a suitable workaround, because I'm still unable to use the tidymodels tune functions, such as tune_grid()
and fit_resamples()
.
- Using
fit_resamples()
produces an error related to incorrect matrix dimensions. - Using
tune_grid()
works smoothly with no problems. - Using
tune_grid()
with a custom grid yields an error message:Error in class() : 0 arguments passed to 'class' which requires 1
Below is a reprex
library(tidymodels)
library(poissonreg)
Create some data to play with
n <- 2e3
set.seed(123)
x1 <- MASS::rnegbin(n, mu = 28, theta = 0.6)
x2 <- rnorm(n, 5, 10)
x3 <- as.factor(rnorm(n) < 0)
y <- as.integer(x1 + x2 + as.numeric(x3))
y[y < 0] <- 0
data <- tibble(y, x1, x2, x3)
Create a recipe, model spec and a workflow
# recipe
rec <- recipe(y ~ ., data = data) %>%
step_dummy(all_nominal()) %>%
step_normalize(all_predictors())
First, I try a spec for glmnet
with the default family
pois_spec <- poisson_reg() %>%
set_engine("glmnet")
wflow <- workflow() %>%
add_recipe(rec) %>%
add_model(pois_spec)
Fit the workflow
wflow_fit <- wflow %>%
fit(data = data)
Using the predict()
method
I should specify a penatly in a glmnet model
preds <- predict(wflow_fit, data, penalty = 0.5)
str(preds)
#> tibble [2,000 × 1] (S3: tbl_df/tbl/data.frame)
#> $ .pred: num [1:2000] 21.7 86.5 14 25 22.8 ...
As predict without a penalty gives an error
preds <- predict(wflow_fit, data)
#> Error: `penalty` should be a single numeric value. `multi_predict()` can be used to get multiple predictions per row of data.
Also multi_predict()
gives an error
preds <- multi_predict(wflow_fit, data, penalty = c(0.550, 0.604))
#> Error in class(): 0 arguments passed to 'class' which requires 1
Now I use a quasipoisson()
family
pois_spec <- poisson_reg() %>%
set_engine("glmnet", family = quasipoisson())
wflow <- workflow() %>%
add_recipe(rec) %>%
add_model(pois_spec)
And fit thw workflow
wflow_fit <- wflow %>%
fit(data = data)
predict()
works with no errors, but it produces unexpected output; a long tibble of
predictions at each lambda value
preds <- predict(wflow_fit, data)
str(preds)
#> tibble [122,000 × 2] (S3: tbl_df/tbl/data.frame)
#> $ .pred_values: num [1:122000] 32.9 32.9 32.9 32.9 32.9 ...
#> $ .pred_lambda: num [1:122000] 34.6 34.6 34.6 34.6 34.6 ...
preds %>% count(.pred_lambda)
#> # A tibble: 61 x 2
#> .pred_lambda n
#> * <dbl> <int>
#> 1 0.130 2000
#> 2 0.143 2000
#> 3 0.157 2000
#> 4 0.172 2000
#> 5 0.189 2000
#> 6 0.207 2000
#> 7 0.227 2000
#> 8 0.250 2000
#> 9 0.274 2000
#> 10 0.301 2000
#> # … with 51 more rows
And specifying a penalty
gives am error
preds <- predict(wflow_fit, data, penalty = 0.150)
#> Error: The ellipses are not used to pass args to the model function's predict function. These arguments cannot be used: `penalty`
Using multi_predict()
also gives an error, same as when used with default glmnet
preds <- multi_predict(wflow_fit, data, penalty = c(0.550, 0.604))
#> Error in class(): 0 arguments passed to 'class' which requires 1
Extract the fitted model and use glmnet predict method instead.
This works with no problems
model_fit <- wflow_fit %>%
chuck("fit", "fit", "fit")
class(model_fit)
#> [1] "glmnetfit" "glmnet"
data_X <- cbind(x1, x2, x3)
preds_glmnet <- predict(model_fit, data_X, s = 0.5)
str(preds_glmnet)
#> num [1:2000, 1] 7.96 59.87 5.59 18.02 7.02 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : NULL
#> ..$ : chr "1"
Now I’m using tune
functions
res <- vfold_cv(data, v = 3)
Use fit_resamples()
yields error of incompatible matrix dimensions
resamples <- wflow %>%
fit_resamples(res)
#> x Fold1: preprocessor 1/1, model 1/1 (predictions): Error: Problem with `mutate()...
#> x Fold2: preprocessor 1/1, model 1/1 (predictions): Error: Problem with `mutate()...
#> x Fold3: preprocessor 1/1, model 1/1 (predictions): Error: Problem with `mutate()...
#> Warning: All models failed. See the `.notes` column.
resamples %>%
chuck(".notes", 1, 1)
#> [1] "preprocessor 1/1, model 1/1 (predictions): Error: Problem with `mutate()` input `.row`.\nx Input `.row` can't be recycled to size 40020.\nℹ Input `.row` is `orig_rows`.\nℹ Input `.row` must be size 40020 or 1, not 667."
Using tune_grid()
.
No errors and completes smoothly
pois_spec <- poisson_reg(
penalty = tune(),
mixture = tune()) %>%
set_engine(
"glmnet",
family = quasipoisson())
wflow <- workflow() %>%
add_recipe(rec) %>%
add_model(pois_spec)
resamples <- wflow %>%
tune_grid(res)
resamples %>%
chuck(".notes", 1)
#> # A tibble: 0 x 1
#> # … with 1 variable: .notes <chr>
Using tune_grid()
with a custom grid
yields an error Error in class() : 0 arguments passed to 'class' which requires 1
param_grid <- grid_regular(
parameters(pois_spec),
levels = c(mixture = 2, penalty = 20))
resamples <- wflow %>%
tune_grid(
res,
grid = param_grid)
#> x Fold1: preprocessor 1/1, model 1/2 (predictions): Error in class(): 0 arguments...
#> x Fold1: preprocessor 1/1, model 2/2 (predictions): Error in class(): 0 arguments...
#> x Fold2: preprocessor 1/1, model 1/2 (predictions): Error in class(): 0 arguments...
#> x Fold2: preprocessor 1/1, model 2/2 (predictions): Error in class(): 0 arguments...
#> x Fold3: preprocessor 1/1, model 1/2 (predictions): Error in class(): 0 arguments...
#> x Fold3: preprocessor 1/1, model 2/2 (predictions): Error in class(): 0 arguments...
#> Warning: All models failed. See the `.notes` column.
resamples %>%
chuck(".notes", 1, 1)
#> [1] "preprocessor 1/1, model 1/2 (predictions): Error in class(): 0 arguments passed to 'class' which requires 1"
#> [2] "preprocessor 1/1, model 2/2 (predictions): Error in class(): 0 arguments passed to 'class' which requires 1"
Created on 2021-05-01 by the reprex package (v1.0.0)