partial least squares regression pls() default hyperparameters

I am trying to fit a partial least squares regression model using pls() in tidymodels with the default values for both hyperparameters.

The code below, which uses the mtcars data and does not set any hyperparameters explicitly, works fine:

set.seed(13)
options(scipen=999)

library(tidymodels)

library(plsmod)

data.mtcars <- mtcars

data.mtcars.split <- initial_split(data.mtcars,
                                   prop = 0.9)

data.mtcars.train <- training(data.mtcars.split)
data.mtcars.test <- testing(data.mtcars.split)

spec_untuned_pls <- pls() %>%
  set_engine("mixOmics") %>%
  set_mode("regression")

rec_untuned_pls <- recipe(mpg ~ ., data = data.mtcars.train) %>%
  step_zv(all_predictors()) %>% # drop zero-variance predictors before scaling
  step_center(all_predictors()) %>% # mean zero
  step_scale(all_predictors()) # standard deviation one

workflow_untuned_pls <- workflow() %>%
  add_recipe(rec_untuned_pls) %>%
  add_model(spec_untuned_pls)

system.time(model_pls <- fit(workflow_untuned_pls,
                              data = data.mtcars.train))

predictions_train_pls <- augment(model_pls, data.mtcars.train)

However, when I use my own dataset, I receive an error after running

predictions_train_pls <- augment(model_pls, data.own.train)

Error in solve.default(t(Pmat[, 1:x]) %*% Wmat[, 1:x]) :
  system is computationally singular: reciprocal condition number = 7.75887e-17

This error does not occur when I set the num_comp argument in pls() to any number, including the default value, which is 2 according to the documentation. Any idea what could be the issue here? Why would explicitly setting num_comp to its default value make any difference?
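
For reference, here is a minimal sketch of the specification that avoids the error for me. The only change from the code above is that num_comp is passed explicitly (2 here, which is the documented default); the object name spec_explicit_pls is just illustrative.

```r
library(tidymodels)
library(plsmod)

# Same specification as spec_untuned_pls, except that num_comp is set
# explicitly instead of being left at its default.
spec_explicit_pls <- pls(num_comp = 2) %>%
  set_engine("mixOmics") %>%
  set_mode("regression")
```

Swapping this specification into the workflow with add_model() is the only difference between the run that fails and the run that succeeds.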

My second question, also related to pls(): the documentation states "predictor_prop : Proportion of Predictors (type: double, default: see below)", but I don't see any further information about what the default value actually is. Could this be clarified?

Any help is much appreciated.
