What is the best way to train multiple models on the same data in Keras for hyperparameter tuning?

Andrea · December 31, 2018, 10:38am

I would like to train multiple models on the same data using Keras, as an exercise for me to get acquainted with hyperparameter tuning in Keras for R (in Python, I use a different approach based on the Python library hyperopt). I was looking into the tfruns library

https://tensorflow.rstudio.com/tools/tfruns/articles/overview.html

and the training flags concept:

https://tensorflow.rstudio.com/tools/training_flags.html

There is one thing I don't understand. All the examples I've seen up to now, e.g.,

https://github.com/rstudio/tfruns/blob/master/inst/examples/mnist_mlp/mnist_mlp.R

#' Trains a simple deep NN on the MNIST dataset.
#'
#' Gets to 98.40% test accuracy after 20 epochs (there is *a lot* of margin for
#' parameter tuning).
#'

library(keras)

# Hyperparameter flags ---------------------------------------------------

FLAGS <- flags(
  flag_numeric("dropout1", 0.4),
  flag_numeric("dropout2", 0.3)
)

# Data Preparation ---------------------------------------------------

# The data, shuffled and split between train and test sets
mnist <- dataset_mnist()
x_train <- mnist$train$x


https://github.com/rstudio/keras/blob/master/vignettes/examples/quora_siamese_lstm.R

#' In this tutorial we will use Keras to classify duplicated questions from Quora.
#' The dataset first appeared in the Kaggle competition 
#' [*Quora Question Pairs*](https://www.kaggle.com/c/quora-question-pairs).
#' The dataset consists of ~400k pairs of questions and a column indicating 
#' if the question pair is duplicated. 
#' 
#' Our implementation is inspired by the Siamese Recurrent Architecture, Mueller et al. [*Siamese recurrent architectures for learning sentence similarity*](https://dl.acm.org/citation.cfm?id=3016291), with small modifications like the similarity
#' measure and the embedding layers (The original paper uses pre-trained word vectors). Using this kind
#' of architecture dates back to 2005 with [Le Cun et al](https://dl.acm.org/citation.cfm?id=1068961) and is useful for
#' verification tasks. The idea is to learn a function that maps input patterns into a
#' target space such that a similarity measure in the target space approximates
#' the “semantic” distance in the input space. 
#' 
#' After the competition, Quora also described their approach to this problem in 
#' this [blog post](https://engineering.quora.com/Semantic-Question-Matching-with-Deep-Learning).
#' 

library(readr)
library(keras)
library(purrr)


(and so on and so forth) seem to use either the training_run or the tuning_run function to run a monolithic script, with different values of the training flags (hyperparameter values). The script does everything (load data, preprocess them, compute results, etc.).

This seems a bit wasteful: if I want to test multiple models on the same data, surely it makes more sense to download the data from the Web, shuffle/split/normalize them in a separate script, and then run the model fitting script multiple times, rather than having to repeat the Data Preparation step for each fit. This is why, in my attempt at hyperparameter tuning, I wrote three different scripts: 1_preprocess_wine_data.R to prepare data, 2_train_and_evaluate_models.R to fit various models to the same data set, and fit_single_model.R to fit each single model, defined by a specific set of hyperparameters.
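As a rough illustration of the "prepare once" step, here is a minimal base-R sketch. It is not the actual 1_preprocess_wine_data.R: the random matrix, the 80/20 split, and the file name prepared_data.rds are all assumptions standing in for the real data and choices.

```r
# Hypothetical stand-in for the downloaded data set: 100 rows, 3 numeric
# predictors and one numeric target. Replace with the real data.
set.seed(42)
n <- 100
x <- matrix(rnorm(n * 3), nrow = n, dimnames = list(NULL, c("a", "b", "c")))
y <- rnorm(n)

# Shuffle once, then split 80/20 into training and validation sets.
idx <- sample(n)
train_idx <- idx[1:80]
val_idx <- idx[81:100]

# Normalize with statistics computed on the training set only.
mu <- colMeans(x[train_idx, ])
sigma <- apply(x[train_idx, ], 2, sd)
x_scaled <- scale(x, center = mu, scale = sigma)

prepared <- list(
  x_train = x_scaled[train_idx, ], y_train = y[train_idx],
  x_val   = x_scaled[val_idx, ],   y_val   = y[val_idx]
)

# Write the prepared data once; every later training run only reads it.
data_file <- file.path(tempdir(), "prepared_data.rds")
saveRDS(prepared, data_file)

# What a training script would do at its start.
reloaded <- readRDS(data_file)
```

With this layout, the cost of downloading, shuffling, and scaling is paid once, and each model fit starts from the file on disk.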

https://rstudio.cloud/project/160813

However, I'm stuck now, because neither training_run nor tuning_run seems to allow passing any argument to the training script, except of course for the training flags. In particular, I cannot pass the training and validation sets to my fitting script! How can I solve this? I don't strictly have to use tfruns in order to perform hyperparameter tuning with keras, though it does have a few interesting options. Thus, I'm open to other suggestions which don't use tfruns and the training flags concept.

zkajdan · January 2, 2019, 8:04am

Hi Andrea,

I was going to ask if you could just preprocess the data once, and load them using readRDS?
But it seems you're already doing it...

If you want to have different datasets, how about storing the filenames as character flags?

Andrea · January 2, 2019, 10:25am

Hi, zkajdan,

thanks a lot for your interest in the matter! Indeed, I preprocess the data just once, and I then load them. I do not have different datasets: I have just one, which I then split into x_train, y_train, x_val, y_val, x_test, y_test.

However, I would like to train different models on this one dataset. The problem is, how do I pass the dataset to the training script? Neither training_run nor tuning_run seems to allow passing arguments to the training script, since neither has the special ellipsis argument (...):

training_run(file = "train.R", context = "local",
  config = Sys.getenv("R_CONFIG_ACTIVE", unset = "default"), flags = NULL,
  properties = NULL, run_dir = NULL, echo = TRUE, view = "auto",
  envir = parent.frame(), encoding = getOption("encoding"))

However, your words gave me an epiphany:

I could just include the name of the .rda file in the training flags!

FLAGS <- flags(
  flag_string("dataset", "ws.rda", "UCI Wine Quality Data Set"),
  .
  .
  .
)

I'd still prefer to be able to pass arguments to the training script, but I guess this is what I have to do if I want to use the tfruns package and I don't want to preprocess the same data multiple times. I'll wait to see if someone else has other suggestions, otherwise I'll accept your answer.
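On the training-script side, the file name held in the flag could then be used roughly as follows. This is only a sketch: FLAGS is a plain list standing in for the object returned by flags(), the helper load_dataset() is hypothetical, and the tiny matrix is placeholder data written just so the example runs on its own.

```r
# Create a small .rda file so the sketch is self-contained. In practice this
# file would be written once by the preprocessing script.
rda_path <- file.path(tempdir(), "ws.rda")
x_train <- matrix(1:6, nrow = 3)   # hypothetical placeholder values
y_train <- c(0, 1, 0)
save(x_train, y_train, file = rda_path)
rm(x_train, y_train)

# Stand-in for the object returned by tfruns::flags(); in the real training
# script FLAGS$dataset would come from flag_string("dataset", ...).
FLAGS <- list(dataset = rda_path)

# Hypothetical helper: load the file into a dedicated environment rather than
# the global one, and fail early if an expected object is not in the file.
load_dataset <- function(path, required = c("x_train", "y_train")) {
  env <- new.env()
  loaded <- load(path, envir = env)   # load() returns the restored names
  absent <- setdiff(required, loaded)
  if (length(absent) > 0) {
    stop("Missing objects in ", path, ": ", paste(absent, collapse = ", "))
  }
  mget(required, envir = env)
}

wine <- load_dataset(FLAGS$dataset)
```

Loading into a separate environment makes it explicit which objects came from the file, which helps when the same script is run many times with different flag values.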

zkajdan · January 2, 2019, 11:32am

yeah that's exactly what I meant
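For completeness, the driver side might look something like the sketch below. The flag names dropout1 and dropout2 are borrowed from the mnist_mlp example and the values are placeholders; they are assumptions about what fit_single_model.R declares, not its actual flags. The tuning_run() call is guarded so the snippet still runs where tfruns or the script is not available.

```r
# Hypothetical grid: the flag names must match those declared with flags() in
# the training script; the values here are placeholders.
grid <- list(
  dataset  = "ws.rda",
  dropout1 = c(0.2, 0.3, 0.4),
  dropout2 = c(0.2, 0.3)
)

# With the default settings, tuning_run() trains once per combination of
# flag values, so the grid size is the product of the vector lengths.
n_runs <- prod(lengths(grid))

script <- "fit_single_model.R"
if (requireNamespace("tfruns", quietly = TRUE) && file.exists(script)) {
  runs <- tfruns::tuning_run(script, flags = grid)
}
```

Because dataset has a single value, it is held constant across all runs while the dropout flags vary.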
