Examples of non-linear optimization with dials and parsnip help

I have been changing the way I model to the tidymodels way. One difficulty I am having is that I have not seen much documentation on how to use this framework for hyperparameter tuning.

One tutorial I like is Max Kuhn's Optimization Methods for Tuning Predictive Models, but this tutorial uses caret.

Also, the dials documentation goes over building grids of parameters, but this does not really go into detail on non-linear optimization like the above tutorial.

I have some starter code for Nelder-Mead optimization below, but it only hits one set of parameters.

How can I use the tidymodels framework and do optimization?

# load libs and data
library(caret)
library(tidymodels)
data(Sacramento)
str(Sacramento)
#> 'data.frame':    932 obs. of  9 variables:
#>  $ city     : Factor w/ 37 levels "ANTELOPE","AUBURN",..: 34 34 34 34 34 34 34 34 29 31 ...
#>  $ zip      : Factor w/ 68 levels "z95603","z95608",..: 64 52 44 44 53 65 66 49 24 25 ...
#>  $ beds     : int  2 3 2 2 2 3 3 3 2 3 ...
#>  $ baths    : num  1 1 1 1 1 1 2 1 2 2 ...
#>  $ sqft     : int  836 1167 796 852 797 1122 1104 1177 941 1146 ...
#>  $ type     : Factor w/ 3 levels "Condo","Multi_Family",..: 3 3 3 3 3 1 3 3 1 3 ...
#>  $ price    : int  59222 68212 68880 69307 81900 89921 90895 91002 94905 98937 ...
#>  $ latitude : num  38.6 38.5 38.6 38.6 38.5 ...
#>  $ longitude: num  -121 -121 -121 -121 -121 ...

# make objective function
svm_obj <- function(param){
  
## Create split of data
set.seed(955)
ctrl <- rsample::vfold_cv(Sacramento, v = 10, repeats = 1, strata = NULL)
  
# define SVM model object
get_svm_model <- function(data){
  svm_rbf(mode = "regression", cost = 10^(param[2]), rbf_sigma = 10^(param[2]))%>%
  set_engine("kernlab") %>%
  fit(price ~ ., data = data)
  }

get_preds <- function(asses_dat ,model_obj, ...) {
  price <- asses_dat %>% select(price)
  pred <- predict(model_obj, new_data = asses_dat %>% select(-price))
  cbind(price, pred)
}

asses_model <- ctrl %>% 
  mutate(asses = map(.$splits, assessment)) %>% 
  mutate(anal = map(.$splits, analysis)) %>% 
  mutate(svm_mod = map(anal, get_svm_model)) %>% 
  mutate(svm_pred = map2(asses, svm_mod, get_preds))

map_df(asses_model$svm_pred, yardstick::mae, price, .pred) %>% 
  summarise(mean(.estimate)) %>% 
  pull()
}

## Nelder-Mead 
set.seed(45642)
num_mods <- 3
nm_res <- optim(par = c(0, 0), fn = svm_obj, method = "Nelder-Mead",
                control = list(maxit = num_mods))

nm_res$par
#> [1] 0 0
nm_res$value
#> [1] 83742.79

Created on 2018-12-09 by the reprex package (v0.2.1)

2 Likes

Great question, I learned a lot looking. These tools look to be in the early stages of development and you may find some enlightenment in the code, which I haven't tried yet. What I did do seems promising, which is to replace the CRAN version with the development version

require(devtools)
install_github("topepo/parsnip")
??parsnip

will get you into the documentation, where you'll be able to see the available models, boost_tree, random_tree, etc. that have pre-built specifications. If the user has an exotic model, then you have to be able to figure out how to construct a model object.

https://tidymodels.github.io/dials/articles/articles/Dials_and_Parsnip.html provides a generic roadmap and sample code for boost_tree models of using dials in the situation where a min(x) of a training set is unknown at the time of building the cross-validator and using it to modify the specification as a placeholder.

As Kuhn warns in the tutorial you mentioned, grid searches can be extraordinarily expensive, so it seems like it would be a good idea to test with small datasets first and then decide whether to rent some heavy-duty cloud capacity to run on your full set.

So, I haven't seen any tutorials, but I think I'd be able to get there through the documentation.

1 Like

tidymodels isn't quite there yet in terms of an integrated solution (but it is coming in 2019). Here's some code to do the optimization. In the code below, use the latest version of recipes.

dials doesn't really play into this type of optimization unless you want to keep the parameters inside of a specific range (notes on how to do that below).

I'm a huge fan of Nelder-Mead, but it might not be a great solution here since it spends a lot of runs in the terminal phase of the optimization. Simulated annealing might work better. Either way, set a maximum number of runs that you are willing to do.

Also, this approach is probably better than random search when there are a lot of tuning parameters. YMMV.

Also also, you'll probably need more than one rep of CV for these data. I bumped to five.

library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
library(tidymodels)
#> ── Attaching packages ─────────────────────────────────────────────────────────── tidymodels 0.0.2 ──
#> ✔ broom     0.5.0          ✔ purrr     0.2.5     
#> ✔ dials     0.0.1.9000     ✔ recipes   0.1.4     
#> ✔ dplyr     0.7.8          ✔ rsample   0.0.3     
#> ✔ infer     0.4.0          ✔ tibble    1.4.2     
#> ✔ parsnip   0.0.1          ✔ yardstick 0.0.2
#> ── Conflicts ────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard()       masks scales::discard()
#> ✖ rsample::fill()        masks tidyr::fill()
#> ✖ dplyr::filter()        masks stats::filter()
#> ✖ dplyr::lag()           masks stats::lag()
#> ✖ purrr::lift()          masks caret::lift()
#> ✖ yardstick::precision() masks caret::precision()
#> ✖ yardstick::recall()    masks caret::recall()
#> ✖ recipes::step()        masks stats::step()
#> ✖ yardstick::tidy()      masks rsample::tidy(), recipes::tidy(), broom::tidy()
data(Sacramento)

Sacramento <- 
  Sacramento %>% 
  dplyr::select(-zip)

# Setup the preprocessing

preproc <- recipe(price ~ ., data = Sacramento) %>%
  step_log(price) %>% 
  # City has a lot of levels and some with small freq;
  # Collapse some into an "other" city
  step_other(city) %>%
  step_dummy(all_nominal()) %>% 
  # In case we still have a column of all zeros
  step_zv(all_predictors()) %>% 
  step_center(all_predictors()) %>% 
  step_scale(all_predictors())

## Create split of data and precompute the preprocessing parts
set.seed(955)
ctrl <- 
  vfold_cv(Sacramento, v = 10, repeats = 5, strata = NULL) %>% 
  mutate(recipes = map(splits, prepper, preproc))

# Run the model, predict the assessment set, and return MAE
get_svm_mae <- function(split, recipe, model, ...){
  holdout_data <- 
    bake(recipe, new_data = assessment(split))
  model %>%
    fit(price ~ ., data = juice(recipe)) %>%
    predict(new_data = holdout_data) %>%
    bind_cols(holdout_data) %>%
    mae(price, .pred) %>% 
    pull(".estimate")
}

# make objective function
svm_obj <- function(param){
  # You may want to put in some kind of constraints here so that if an 
  # element of `param` is outside of a suitable range, just return a
  # larger number like 10^5.
  
  svm_mod <- 
    svm_rbf(mode = "regression", cost = 10^(param[2]), rbf_sigma = 10^(param[2]))%>%
    set_engine("kernlab")
  
  # You could also use furrr::future_map2_dbl for running in parallel here
  results <- 
    ctrl %>% 
    mutate(mae = map2_dbl(splits, recipes, get_svm_mae, model = svm_mod)) %>%
    summarize(estimate = mean(mae)) %>%
    pull("estimate")
  # cat("log-C", sprintf("%+2.4f", param[1]), 
  #     "log-s", sprintf("%+2.4f", param[2]), 
  #     "mae",   sprintf("%2.6f", results), "\n")
  results
}

# cost = 100, and sigma = 1 are probably suboptimal choices
optim(c(2, 0), svm_obj, method = "Nelder-Mead", control = list(maxit = 50))
#> $par
#> [1]  2.4138794 -0.1871338
#> 
#> $value
#> [1] 0.2492046
#> 
#> $counts
#> function gradient 
#>       51       NA 
#> 
#> $convergence
#> [1] 1
#> 
#> $message
#> NULL

Created on 2018-12-09 by the reprex package (v0.2.1)

5 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.