Thanks both for your comments. I don't want to belabor the point, but there are three things I would like to add.
First of all, I think it's a lot easier to learn how to use new arguments for a function you're familiar with than it is to learn to use a new function. The arguments are listed in the function documentation the user is already familiar with, and they're approachable to the user as modifications to the operations that he knows. For example, I used caret for a long time until I found the index argument to trainControl, which is a simple and powerful way to control the cv folds. Once I found it though, its use was immediately obvious.
Second, I appreciate your assurance that caret will remain supported, but honestly as a user, it feels like it's been superseded, and it feels like the developers' effort has gone into creating a new framework instead of growing the methods available. It's been a few years since lightGBM was introduced and python programmers are using it to win every single kaggle competition, but we don't have a lightGBM method in base caret yet. And in general there are not a lot of options for neural networks with more than one hidden layer.
Finally, I think the tidyverse and caret already provide the functionality to fit models and create a workflow in an intuitive and customizable way. Here's a vignette of how I and my colleagues (at a chemical company) use tibbles and functional programming to fit any number of different models with any number of different methods. expand_grid to define all sorts of input settings and fit any number of models in a single pmap call and store them in a column. This is a minimal example, but it's not that much more to add train/test splits, pre-processing, bootstrapping etc. I'd argue in this example, the rich set of input arguments to trainControl and train actually empower functional programming. Those arguments work so well with the pmap!
Thanks again.
library(tidyverse)
library(caret)
#> Loading required package: lattice
#>
#> Attaching package: 'caret'
#> The following object is masked from 'package:purrr':
#>
#> lift
Many <-
# define tibble of trainControl and train settings
expand_grid(
# data sets
data = list(iris),
# formulas
form = list(Sepal.Length ~ Sepal.Width + Petal.Length,
Sepal.Width ~ Petal.Length + Sepal.Length),
# trainControl arguments
cvmethod = "repeatedcv",
number = 4,
# train arguments
method = c("glmnet", "rpart", "rf"),
tuneLength = 2,
) %>%
mutate(
# create trControl objects
trControl = select(., method = cvmethod, number) %>% pmap(trainControl)
) %>%
mutate(
# fit models
model = select(., form, data, method, tuneLength, trControl) %>% pmap(train)
) %>%
# add quality of fit on validation set
bind_cols(
map_dfr(.$model, ~left_join(.$bestTune, .$result, by = names(.$bestTune))
)
)
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
Many %>% select_if(negate(is.list)) %>% print()
#> # A tibble: 6 x 14
#> cvmethod number method tuneLength alpha lambda RMSE Rsquared MAE RMSESD
#> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 repeate~ 4 glmnet 2 1 3.10e-3 0.341 0.847 0.275 0.0223
#> 2 repeate~ 4 rpart 2 NA NA 0.494 0.647 0.397 0.0792
#> 3 repeate~ 4 rf 2 NA NA 0.335 0.843 0.269 0.0247
#> 4 repeate~ 4 glmnet 2 0.1 8.02e-4 0.329 0.446 0.259 0.0433
#> 5 repeate~ 4 rpart 2 NA NA 0.338 0.410 0.269 0.0822
#> 6 repeate~ 4 rf 2 NA NA 0.322 0.519 0.253 0.0370
#> # ... with 4 more variables: RsquaredSD <dbl>, MAESD <dbl>, cp <dbl>,
#> # mtry <dbl>
Created on 2021-05-27 by the reprex package (v1.0.0)