Ensure balanced mini-batches while training



Dear all,

Using Keras for R, I am working on binary classification with an imbalanced data set (~90% negative and ~10% positive examples), training with a batch size of 20.

I would like to ensure that each batch used for back-propagation is balanced, i.e. ~10 data points are sampled from the positive training data and ~10 from the negative, so that the model is not biased towards the negative class.
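To put numbers on the imbalance: if batches of 20 are drawn uniformly from data with ~10% positives, the number of positives per batch is roughly binomial, so a batch holds about 2 positives on average and roughly 12% of batches contain none at all. A small R check of that arithmetic, assuming independent draws at a positive rate of exactly 0.1:

```r
p_pos      <- 0.1   # assumed fraction of positive examples
batch_size <- 20

expected_pos <- batch_size * p_pos                          # mean positives per random batch
p_no_pos     <- dbinom(0, size = batch_size, prob = p_pos)  # P(a batch has zero positives)

cat("expected positives per batch:", expected_pos, "\n")
cat("P(no positives in a batch):", round(p_no_pos, 4), "\n")
```

A balanced batch, by contrast, always contains half positives, regardless of the class ratio in the training data.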

I have been unable to find RStudio/Keras documentation on how to do this. (There is class_weight for fit(), but I am not sure whether it would achieve my objective.)
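For comparison, class_weight leaves the batch composition unchanged and instead scales each sample's contribution to the loss by the weight of its class. A common choice (an assumption here, not something the question prescribes) is inverse class frequency, n / (k * n_c), which gives both classes the same total weight. The helper name `inverse_freq_weights` is hypothetical, and the commented-out fit() call assumes an already compiled model and training arrays.

```r
# Hypothetical helper: inverse-frequency weights for a 0/1 label vector,
# returned as a named list keyed by class ("0", "1") for class_weight.
inverse_freq_weights <- function(y) {
  counts <- table(factor(y, levels = c(0, 1)))
  w <- as.numeric(sum(counts) / (length(counts) * counts))
  setNames(as.list(w), names(counts))
}

y <- c(rep(0, 90), rep(1, 10))   # toy labels: 90% negative, 10% positive
w <- inverse_freq_weights(y)
str(w)  # "0" ~ 0.556, "1" = 5

# Assumed usage with an existing model (not run here):
# model %>% fit(x_train, y_train, batch_size = 20, class_weight = w)
```

Roughly speaking, this weights the loss the way balanced sampling would in expectation, but each batch of 20 still holds only about 2 positives, so the gradient signal from the positive class is noisier than with balanced batches.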

Thanks in advance!


OK, I solved this myself, with a little help from my friends. The trick is to use a custom
generator function, e.g. like so:

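balanced_generator = function(X_data, Y_data, batch_size) {
    idx_0 = which(Y_data == 0); idx_1 = which(Y_data == 1)  # row indices per class
    function() {  # Keras calls this once per batch
        i_0 = idx_0[sample.int(length(idx_0), batch_size / 2, replace = TRUE)]
        i_1 = idx_1[sample.int(length(idx_1), batch_size / 2, replace = TRUE)]
        i   = c(rbind(i_0, i_1))  # interleave negatives and positives
        list(X_data[i, , drop = FALSE], Y_data[i])
    }
}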

and then train the network using the fit_generator() function, passing the result of balanced_generator() as its generator argument.
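Because the generator samples with replacement and never runs out, fit_generator() has no natural end to an epoch; its steps_per_epoch argument decides how many batches count as one. One possible convention (an assumption, not part of the original answer) is to draw, per epoch, as many negatives as the training set contains, i.e. n_neg / (batch_size / 2) steps. The sketch below computes that for toy labels; the fit_generator() call is left as a comment because it needs an existing model and data (x_train and y_train are placeholder names).

```r
batch_size <- 20
y_train <- c(rep(0, 900), rep(1, 100))   # toy labels with ~10% positives

n_neg <- sum(y_train == 0)
# Each batch holds batch_size / 2 negatives, so this many steps draw ~n_neg negatives
steps_per_epoch <- ceiling(n_neg / (batch_size / 2))
print(steps_per_epoch)

# Assumed usage (not run here):
# model %>% fit_generator(
#   generator       = balanced_generator(x_train, y_train, batch_size),
#   steps_per_epoch = steps_per_epoch,
#   epochs          = 10
# )
```

Since negatives are drawn with replacement, such an "epoch" covers only about 63% (1 - 1/e) of the distinct negatives, while each positive is drawn about 9 times on average, so the epoch length is a bookkeeping choice rather than a full pass over the data.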