Tidymodels (Fitting Random Forest Models using the function fit_resamples()): x Fold01: internal: Error: Must group by variables found in `.data`.

Kaikash777 · December 17, 2020, 7:46am

Overview

I have built a random forest regression model and I would like to fit it with the function fit_resamples(). However, I get this error message:

Error Message:

    ! Fold01: model: tune columns were requested but there were 14 predictors in the data. 14 will be u...
    x Fold01: internal: Error: Must group by variables found in `.data`.
    * Column `mtry` is not found.

    ! Fold02: model: tune columns were requested but there were 14 predictors in the data. 14 will be u...
    x Fold02: internal: Error: Must group by variables found in `.data`.
    * Column `mtry` is not found.

    ! Fold03: model: tune columns were requested but there were 14 predictors in the data. 14 will be u...
    x Fold03: internal: Error: Must group by variables found in `.data`.
    * Column `mtry` is not found.

I have searched online for a solution, but I could not find a question that matches my particular issue. I am not an advanced R user and I am trying my best to slowly work my way through the Tidymodels package.

If anyone can help with this error message, I would be deeply appreciative.

Many thanks in advance

R-code

    # Open libraries
    library(tidymodels)
    library(ranger)

    set.seed(45L)

    # Split this single dataset into two: a training set and a testing set
    data_split <- initial_split(FID)
    # Create data frames for the two sets
    train_data <- training(data_split)
    test_data  <- testing(data_split)

    # Resample the training data with 10-fold cross-validation (v = 10 is the default)
    cv <- vfold_cv(train_data, v = 10)

    ###########################################################
    ## Produce the recipe

    rec <- recipe(Frequency ~ ., data = FID) %>%
      step_nzv(all_predictors(), freq_cut = 0, unique_cut = 0) %>% # remove variables with zero variance
      step_novel(all_nominal()) %>% # prepares test data to handle previously unseen factor levels
      step_medianimpute(all_numeric(), -all_outcomes(), -has_role("id vars")) %>% # replaces missing numeric observations with the median
      step_dummy(all_nominal(), -has_role("id vars")) # dummy codes categorical variables

    ## Produce the random forest model

    mod_rf <- rand_forest(
      mtry = tune(),
      trees = 1000,
      min_n = tune()
    ) %>%
      set_mode("regression") %>%
      set_engine("ranger")

    ## Workflow
    wflow_rf <- workflow() %>%
      add_model(mod_rf) %>%
      add_recipe(rec)

    ## Fit model

    fit_rf <- fit_resamples(
      wflow_rf,
      cv,
      metrics = metric_set(rmse, rsq),
      control = control_resamples(
        save_pred = TRUE,
        extract = function(x) extract_model(x)
      )
    )

Data Frame FID

structure(list(Year = c(2015, 2015, 2015, 2015, 2015, 2015, 2015, 
2015, 2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016, 2016, 2016, 
2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017, 
2017, 2017, 2017, 2017, 2017, 2017, 2017), Month = structure(c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L, 11L, 12L), .Label = c("January", "February", "March", 
"April", "May", "June", "July", "August", "September", "October", 
"November", "December"), class = "factor"), Frequency = c(36, 
28, 39, 46, 5, 0, 0, 22, 10, 15, 8, 33, 33, 29, 31, 23, 8, 9, 
7, 40, 41, 41, 30, 30, 44, 37, 41, 42, 20, 0, 7, 27, 35, 27, 
43, 38), Days = c(31, 28, 31, 30, 6, 0, 0, 29, 15, 
29, 29, 31, 31, 29, 30, 30, 7, 0, 7, 30, 30, 31, 30, 27, 31, 
28, 30, 30, 21, 0, 7, 26, 29, 27, 29, 29)), row.names = c(NA, 
-36L), class = "data.frame")

Max · December 18, 2020, 11:42pm

Could you please provide a reproducible example (as we discussed previously)? There is no Tidy_df in your code and there is no Frequency_Blue column in FID.

We've also asked you to not use parallel processing in your example code (unless it runs sequentially but not in parallel).

We really want to help, but with so many people using these packages and asking questions, it would be helpful for you to minimize extra work on our part by making sure that your code will execute. This would not occur if you used the reprex package (as we've suggested).

Maybe start by reading "What is a reprex", and follow the advice further down that page.

Kaikash777 · December 19, 2020, 5:45am

Hi Max, I would like to sincerely apologise for the copy-and-paste mistake. I have been working on a few different versions of my code to try to fix this issue, and I honestly copied the wrong version. Thank you for your understanding, patience, time, and guidance; it was not my intention to irritate.

I have deleted the plan multisession command to prevent parallel processing, and I have corrected the object name of the data frame and the variable name in the code above.

Sorry, again.
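
For anyone landing here with the same error, one likely cause is visible in the code above: the model specification contains `mtry = tune()` and `min_n = tune()`, but fit_resamples() does not perform any tuning, so the placeholders are never replaced with actual values. That would be consistent with both the "tune columns were requested" warning and the "Column `mtry` is not found" error. A minimal sketch of one way around it follows, using tune_grid() to try candidate values first. The built-in mtcars data, the `_demo` object names, the 200 trees, and the grid values are all stand-ins chosen for illustration, not part of the original post.

```r
library(tidymodels)
library(ranger)

set.seed(45L)

# Stand-in data so the sketch runs on its own (assumption: replace mtcars/mpg
# with your own training data and outcome, e.g. train_data and Frequency).
cv_demo  <- vfold_cv(mtcars, v = 5)
rec_demo <- recipe(mpg ~ ., data = mtcars)

# Same kind of specification as in the question: two tune() placeholders.
mod_demo <- rand_forest(mtry = tune(), trees = 200, min_n = tune()) %>%
  set_mode("regression") %>%
  set_engine("ranger")

wflow_demo <- workflow() %>%
  add_model(mod_demo) %>%
  add_recipe(rec_demo)

# Candidate values for the placeholders (arbitrary illustrative choices).
grid_demo <- expand.grid(mtry = c(2, 4, 8), min_n = c(2, 5, 10))

# tune_grid() evaluates every row of the grid on every resample.
tuned <- tune_grid(
  wflow_demo,
  resamples = cv_demo,
  grid = grid_demo,
  metrics = metric_set(rmse, rsq)
)

# Pick the best combination by RMSE and plug it into the workflow.
best        <- select_best(tuned, metric = "rmse")
final_wflow <- finalize_workflow(wflow_demo, best)

# With no tune() placeholders left, fit_resamples() can be used.
fit_demo <- fit_resamples(
  final_wflow,
  resamples = cv_demo,
  metrics = metric_set(rmse, rsq)
)
collect_metrics(fit_demo)
```

If tuning is not needed at all, the simpler alternative is to give `mtry` and `min_n` fixed numeric values in rand_forest() and keep the original fit_resamples() call.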

system · January 9, 2021, 5:45am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
