What is the best way to train multiple models on the same data in Keras for hyperparameter tuning?

I would like to train multiple models on the same data using Keras, as an exercise for me to get acquainted with hyperparameter tuning in Keras for R (in Python, I use a different approach based on the Python library hyperopt). I was looking into the tfruns library

https://tensorflow.rstudio.com/tools/tfruns/articles/overview.html

and the training flags concept:

https://tensorflow.rstudio.com/tools/training_flags.html

There is one thing I don't understand. All the examples I've seen up to now seem to use either the training_run or the tuning_run function to run a monolithic script, with different values of the training flags (hyperparameter values). The script does everything (load data, preprocess them, compute results, etc.).

This seems a bit wasteful: if I want to test multiple models on the same data, surely it makes more sense to download the data from the Web and shuffle/split/normalize them once in a separate script, and then run the model fitting script multiple times, rather than repeating the data preparation step for each fit. This is why, in my attempt at hyperparameter tuning, I wrote three different scripts: 1_preprocess_wine_data.R to prepare the data, 2_train_and_evaluate_models.R to fit various models to the same data set, and fit_single_model.R to fit each single model, defined by a specific set of hyperparameters.

https://rstudio.cloud/project/160813

However, I'm stuck now, because neither training_run nor tuning_run seems to allow passing any argument to the training script, except of course for the training flags. In particular, I cannot pass the training and validation sets to my fitting script! How can I solve this? I don't strictly have to use tfruns in order to perform hyperparameter tuning with keras, though it does have a few interesting options. Thus, I'm open to other suggestions which don't use tfruns and the training flags concept.

Hi Andrea,

I was going to ask if you could just preprocess the data once, and load them using readRDS?
But it seems you're already doing it...

If you want to have different datasets, how about storing the filenames as character flags?
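
A minimal sketch of the "preprocess once, then readRDS" idea in base R. The stand-in data frame, the 80/20 split, and the file name wine_splits.rds are all assumptions made for illustration; they are not taken from the scripts in the linked project.

```r
# Preprocess once: shuffle, split, normalize, and cache the result on disk.
set.seed(42)

# Stand-in for the real data set (assumption: numeric features and target).
raw <- data.frame(
  x1 = rnorm(100),
  x2 = runif(100),
  quality = sample(3:8, 100, replace = TRUE)
)

n <- nrow(raw)
idx <- sample(n)                           # shuffled row indices
train_idx <- idx[seq_len(floor(0.8 * n))]  # first 80% for training
val_idx <- setdiff(idx, train_idx)         # remaining 20% for validation

features <- c("x1", "x2")
x_train <- as.matrix(raw[train_idx, features])
x_val <- as.matrix(raw[val_idx, features])

# Normalize both splits with statistics from the training split only.
mu <- colMeans(x_train)
sigma <- apply(x_train, 2, sd)
x_train <- scale(x_train, center = mu, scale = sigma)
x_val <- scale(x_val, center = mu, scale = sigma)

splits <- list(
  x_train = x_train, y_train = raw$quality[train_idx],
  x_val = x_val, y_val = raw$quality[val_idx]
)

# Hypothetical cache location; a real project would use a stable path.
data_file <- file.path(tempdir(), "wine_splits.rds")
saveRDS(splits, data_file)

# In the training script: reload the cached splits instead of recomputing.
loaded <- readRDS(data_file)
```

The training script then only needs to know the path of the cached file, which is a plain string.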


Hi, zkajdan,

Thanks a lot for your interest in the matter! Indeed, I preprocess the data just once and then load them. I do not have different datasets: I have just one, which I then split into x_train, y_train, x_val, y_val, x_test, y_test.

However, I would like to train different models on this one dataset. The problem is, how do I pass the dataset to the training script? Neither training_run nor tuning_run seems to allow passing arguments to the training script, since neither has the special ellipsis argument (...).

training_run(file = "train.R", context = "local",
  config = Sys.getenv("R_CONFIG_ACTIVE", unset = "default"), flags = NULL,
  properties = NULL, run_dir = NULL, echo = TRUE, view = "auto",
  envir = parent.frame(), encoding = getOption("encoding"))

However, your words gave me an epiphany:

I could just include the name of the .rda file in the training flags!

FLAGS <- flags(
  flag_string("dataset", "ws.rda", "UCI Wine Quality Data Set"),
  .
  .
  .
)

I'd still prefer to be able to pass arguments to the training script, but I guess this is what I have to do if I want to use the tfruns package and I don't want to preprocess the same data multiple times. I'll wait to see if someone else has other suggestions, otherwise I'll accept your answer.
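
A rough sketch of how the pieces could fit together, under several assumptions: random numbers stand in for the prepared wine data, a closed-form ridge regression with a hypothetical lambda flag stands in for the Keras model (so the sketch does not need TensorFlow), and the directory names are made up. Only flags, flag_string, flag_numeric and tuning_run come from tfruns; the runs_dir and confirm arguments are assumed to be available in the installed version.

```r
library(tfruns)

# Hypothetical project directory inside the session's temp folder.
proj <- file.path(tempdir(), "wine_tuning")
dir.create(proj, showWarnings = FALSE)

# Step 1 (done once): cache the prepared splits in an .rda file.
# Random numbers stand in for the preprocessed wine data.
set.seed(1)
x_train <- matrix(rnorm(200), ncol = 2)
y_train <- rnorm(100)
x_val <- matrix(rnorm(60), ncol = 2)
y_val <- rnorm(30)
data_file <- file.path(proj, "ws.rda")
save(x_train, y_train, x_val, y_val, file = data_file)

# Step 2: the per-model script. It learns where the data live from a flag.
script <- file.path(proj, "fit_single_model.R")
writeLines(c(
  'FLAGS <- tfruns::flags(',
  '  tfruns::flag_string("dataset", "ws.rda", "Path to the prepared data"),',
  '  tfruns::flag_numeric("lambda", 0.1, "Ridge penalty (stand-in flag)")',
  ')',
  'load(FLAGS$dataset)  # restores x_train, y_train, x_val, y_val',
  '# Stand-in for the Keras model: closed-form ridge regression.',
  'beta <- solve(',
  '  crossprod(x_train) + FLAGS$lambda * diag(ncol(x_train)),',
  '  crossprod(x_train, y_train)',
  ')',
  'cat("val_mse:", mean((y_val - x_val %*% beta)^2), "\\n")'
), script)

# Step 3: one run per hyperparameter value, all reading the same file.
# An absolute path is passed so the working directory does not matter.
runs <- tuning_run(
  script,
  flags = list(dataset = data_file, lambda = c(0.01, 0.1, 1)),
  runs_dir = file.path(tempdir(), "wine_runs"),
  confirm = FALSE
)
```

With this layout the data preparation happens once, and each run only pays the cost of load(). In the real fit_single_model.R the ridge lines would be replaced by the Keras model definition and fit call.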

Yeah, that's exactly what I meant 🙂

