I am trying to fit a partial least squares regression model using pls() in tidymodels with the default values for both hyperparameters.
The code below, which uses the mtcars data and does not explicitly specify any hyperparameters, works fine:
    set.seed(13)
    options(scipen = 999)

    library(tidymodels)
    library(plsmod)

    # Train/test split
    data.mtcars <- mtcars
    data.mtcars.split <- initial_split(data.mtcars, prop = 0.9)
    data.mtcars.train <- training(data.mtcars.split)
    data.mtcars.test <- testing(data.mtcars.split)

    # PLS specification with no hyperparameters set explicitly
    spec_untuned_pls <- pls() %>%
      set_engine("mixOmics") %>%
      set_mode("regression")

    # step_zv() is a recipe step, so it belongs in the recipe, not the workflow
    rec_untuned_pls <- recipe(mpg ~ ., data = data.mtcars.train) %>%
      step_zv(all_predictors()) %>%     # drop zero-variance predictors
      step_center(all_predictors()) %>% # mean zero
      step_scale(all_predictors())      # standard deviation one

    workflow_untuned_pls <- workflow() %>%
      add_recipe(rec_untuned_pls) %>%
      add_model(spec_untuned_pls)

    system.time(model_pls <- fit(workflow_untuned_pls, data = data.mtcars.train))

    predictions_train_pls <- augment(model_pls, data.mtcars.train)
However, when I use my own dataset, I receive an error after running:

    predictions_train_pls <- augment(model_pls, data.own.train)

    Error in solve.default(t(Pmat[, 1:x]) %*% Wmat[, 1:x]) :
      system is computationally singular: reciprocal condition number = 7.75887e-17
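As far as I understand, the message itself comes from base R's solve(), which stops when the matrix it is asked to invert is singular or so close to singular that its reciprocal condition number falls below a tolerance. Below is a minimal base R sketch of that behaviour. The matrix m is made up purely for illustration and has nothing to do with my data, and for an exactly singular matrix like this one the wording of the message may differ from the one above.

```r
# Hypothetical matrix whose second column is twice the first (illustration only)
m <- matrix(c(1, 2, 2, 4), nrow = 2)

# Reciprocal condition number: values near 0 indicate a (near-)singular matrix
rc <- rcond(m)

# Numerical rank from a QR decomposition; less than ncol(m) means dependent columns
rk <- qr(m)$rank

# solve() refuses to invert such a matrix; capture the message instead of stopping
msg <- tryCatch(
  {
    solve(m)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(rc)
print(rk)
print(msg)
```

The same rcond() and qr() checks could presumably be run on a numeric predictor matrix to see whether linearly dependent columns are involved, though I have not confirmed that this is the cause here.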
This error does not occur when I set the num_comp argument in pls() to any value, including 2, which is the default according to the documentation. Any idea what the issue could be? Why would explicitly setting num_comp to its default value make any difference?
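For reference, here is a minimal sketch of the variant with num_comp passed explicitly. It uses mtcars as a stand-in for data.own.train, so it does not reproduce the error; it only shows the one change in the specification. The object names ending in _explicit_pls are made up for this illustration.

```r
library(tidymodels)
library(plsmod)

# Same model as before, but with num_comp set explicitly
# (2 is the default according to the documentation)
spec_explicit_pls <- pls(num_comp = 2) %>%
  set_engine("mixOmics") %>%
  set_mode("regression")

# mtcars is only a stand-in for my own training data
rec_explicit_pls <- recipe(mpg ~ ., data = mtcars) %>%
  step_zv(all_predictors()) %>%     # drop zero-variance predictors
  step_center(all_predictors()) %>% # mean zero
  step_scale(all_predictors())      # standard deviation one

workflow_explicit_pls <- workflow() %>%
  add_recipe(rec_explicit_pls) %>%
  add_model(spec_explicit_pls)

model_explicit_pls <- fit(workflow_explicit_pls, data = mtcars)

# Adds a .pred column with the fitted values
predictions_explicit_pls <- augment(model_explicit_pls, mtcars)
```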
My second question, also related to pls(), concerns the documentation, which states "predictor_prop: Proportion of Predictors (type: double, default: see below)", but I don't see any further information on what the default value is. Could this be clarified?
Any help is much appreciated.