debugging keras

I'm completely new to keras. How can I debug it? Where do I start?

It seems that reticulate is central. What kind of object is it? To the scope resolution operator ::, it acts like a package; e.g., reticulate::py_call retrieves the definition of the function, much like keras::pop_layer. But while ls("package:keras") works fine, ls("package:reticulate") fails:

Error in as.environment(pos) :
no item called "package:reticulate" on the search list
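
The likely reason ls("package:reticulate") fails while reticulate::py_call works: an entry named "package:reticulate" only appears on the search list once the package is attached with library(), whereas keras merely loads reticulate's namespace. The :: operator works on any loaded namespace. The sketch below shows the distinction using the base package tools as a stand-in (an assumption, so that the example runs without reticulate installed); substituting "reticulate" for pkg should behave the same way.

```r
pkg <- "tools"  # stand-in; replace with "reticulate" in a real session

# Load the namespace without attaching it, which is what an importing
# package such as keras does with its dependencies.
ns <- asNamespace(pkg)

# TRUE only after library(pkg); ls("package:<pkg>") needs this to be TRUE.
is_attached <- paste0("package:", pkg) %in% search()

# These work for any loaded namespace, attached or not.
exported  <- getNamespaceExports(pkg)   # what pkg::name can reach
internals <- ls(ns, all.names = TRUE)   # what pkg:::name can reach

print(is_attached)
print(head(sort(exported)))
```

Listing the namespace this way avoids having to attach the package just to see what it contains.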

Attempting to mtrace it hits a similar error:

> mtrace(reticulate::py_call)
Error in as.environment("package:" %&% char.fname[2]) : 
  no item called "package:reticulate" on the search list

Not sure why, but I can overcome that with the triple colon:

> mtrace(reticulate:::py_call)

It turns out that keras doesn't actually use py_call, so I need to mtrace py_call_impl. But the value is limited. It gets called with an object x, which acts kind of like an environment, but not really; as.list and $ are broken:

D(6)> x
<class 'keras.callbacks.EarlyStopping'>

D(6)> class(x)
[1] "python.builtin.type"   "python.builtin.object"

D(6)> unclass(x)
Error in unclass(x) : cannot unclass an environment

D(6)> ls(x)
[1] "convert" "pyobj"  

D(6)> as.list(x)
Error in as.vector(x, "list") : 
  cannot coerce type 'environment' to vector of type 'list'

D(6)> lapply(ls(x), get, x)
[1] TRUE

<pointer: 0x13362980>

Not sure what to do with a pointer. BTW, the error is inaccurate; you can coerce an R environment to a list.
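
The as.list error can be reproduced in plain R, which suggests (this is an assumption about what happens here) that it comes from S3 dispatch rather than from anything Python-specific: once an environment carries a class attribute, as.list() no longer reaches as.list.environment and falls through to the default method. The names wrapper and demo_python_object below are hypothetical.

```r
# A classed environment, mimicking the shape of the object seen in the debugger.
wrapper <- new.env()
assign("convert", TRUE, envir = wrapper)
assign("pyobj", "placeholder for the external pointer", envir = wrapper)
class(wrapper) <- "demo_python_object"  # hypothetical class name

# The generic now skips the environment method, so the call fails.
generic_result <- tryCatch(as.list(wrapper),
                           error = function(e) conditionMessage(e))

# Calling the environment method directly, or using mget(), still works.
direct   <- as.list.environment(wrapper)
via_mget <- mget(ls(wrapper), envir = wrapper)

print(generic_result)
str(via_mget)
```

For real reticulate objects, the exported helpers reticulate::py_list_attributes() and reticulate::py_get_attr() are probably a better way to look inside than poking at the pointer.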

Anyway, ultimately I'd like to step across the language boundary to execute one Python statement at a time, inspect variables, etc. I cannot hope to achieve that with the R interactive debugger. What approaches did you find fruitful?

A clarification: do you want to debug a keras model (then you don’t need reticulate at all), or do you want to debug the keras framework? In the second case, since keras is a Python Open Source project, it’s much better if you learn Python and you make PRs on the GitHub repository, so that all keras users can benefit from your debugging. Of course I don't expect you to learn a language if you don't have the time or interest to do that, but I don't think you can efficiently debug a Python framework without knowing Python.

If you just need to debug a keras model, then Python and reticulate are not needed.

Thanks Andrea. I don't know enough to tell if I want to debug a model or the framework. I suspect the latter, but could use some help with that distinction.

I'm using the R package keras. There needs to be some glue between R and Python, and I suspected that reticulate was that glue. It definitely gets called.

Re learning Python -- I've used pdb in Python-only programs. But how do I get the debugger to launch when the control originates in R? And I don't know the "guts" of Python, i.e., C-level data structures etc.

BTW, if I made a copy of the Python module keras in my home directory (=folder), and could get Python to load it from there, then maybe I could stick "pdb" in the source at a few functions, e.g., fit. I'm struggling with the second step, getting Python to look outside the box. It seems that Sys.setenv(PYTHONPATH="...") has no effect. Maybe I need to put keras in a Python virtualenv?

So how do you folks debug? Is there a way to use the Python debugger with the R package keras?
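
One possible reason that Sys.setenv(PYTHONPATH="...") appeared to have no effect: Python reads PYTHONPATH once, when the interpreter starts, so the variable has to be set before reticulate initializes Python (in practice, before the first keras call). The helper below is a hypothetical sketch of that ordering; the directory name is made up.

```r
# Hypothetical helper: put `dir` at the front of PYTHONPATH.
prepend_pythonpath <- function(dir) {
  old <- Sys.getenv("PYTHONPATH", unset = "")
  new <- if (nzchar(old)) paste(dir, old, sep = .Platform$path.sep) else dir
  Sys.setenv(PYTHONPATH = new)
  invisible(new)
}

# Must run in a fresh R session, before Python has been initialized.
prepend_pythonpath(file.path(path.expand("~"), "keras-copy"))  # made-up path
print(Sys.getenv("PYTHONPATH"))

# Only afterwards:
# library(keras)
# reticulate::py_config()  # reports which Python reticulate is using
```

With a modified copy on the path, a line such as "import pdb; pdb.set_trace()" inside the Python source is the usual way to drop into pdb. Whether the prompt is usable depends on how R is run: a plain terminal session is the safest assumption, and IDE consoles may not forward input to pdb. reticulate::import_from_path() is another way to load a module from a specific directory.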

Ok, let’s take a step back. Can you show a reproducible example of a case you want to debug?

Sure. Constructing a reprex will take me some time.

That was a struggle. Here goes:
keras-0.pdf (8.1 KB)
I had to rename it to ".pdf" because the forum only accepts files with a few specific extensions, but it is really the output of reprex(input="keras-0.R"). BTW, reprex complained:

Preparing reprex as .R file:
  * keras-0_reprex.R
Rendering reprex... 
Error: pandoc version 1.12.3 or higher is required and was not found (see the help page ?rmarkdown::pandoc_available).

Indeed sudo apt-get install pandoc gives 1.12.2.

Anyway, back to my original question: how do you debug keras? Is there a way to make it launch pdb?


The PDF doesn't exist, but in any case you don't need to attach a PDF with your reprex. The goal of a reprex is to make it quick and easy for us to copy code straight from your question and run it right away in our R sessions. So it's better to simply paste (Cmd + V, or Ctrl + V on Windows or Ubuntu) the output of reprex::reprex() into your question. For a nice guide to building reprexes, see

OK, I'm pasting it below. The FAQ you quote talks about "the error". So I could postulate the error is that pdb doesn't start up. Or that $history is all NaN.

#' ---
#' output:
#'   md_document:
#'     pandoc_args:
#'       - '--from=markdown-implicit_figures'
#'       - '--to=commonmark'
#'       - '--no-wrap'
#' ---

#+ reprex-setup, include = FALSE
options(tidyverse.quiet = TRUE)
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", error = TRUE)
knitr::opts_knit$set(upload.fun = knitr::imgur_upload)

#+ reprex-body

generator.demo.x.y <- function(X, Y,
                               shuffle = TRUE, 
                               batch_size = 32,
                                 "bump.min.index", "wrap.around", "stop"))
   uneven.mode <- match.arg(uneven.mode)
   uneven.count <- (max_index-min_index+1) %% batch_size
   if(uneven.count!=0) {
             "stop"=stop("data size is not an even multiple of batch_size ",
                         "but uneven.mode is set to stop?"),
                txt <- paste("data size is not an even multiple of batch_size;",
                             "changing min_index from ", min_index, " to ",
                             min_index+uneven.count, ".\n", sep="")
                cat("Notice, data generator setup: ", txt, "\n", sep="")
                min_index <- min_index + uneven.count
             "" = "continue on, handled.later",
             "wrap.around"="continue on, handled.later",
             stop("unexpected uneven.mode"))
   i.gen.x.y <- min_index
   function() {
      if(i.gen.x.y==min_index) {
         #Do this first time through as well as possibly right after each
         #wraparound when numrows mod batch_size is zero:
         if (shuffle) {
            rows.gen.x.y <<- sample(c(min_index:max_index),
                                    size = max_index-min_index+1)
         } else {
            rows.gen.x.y <<- min_index:max_index
      #At this point, rows.gen.x.y is an index (in desired order, shuffled or
      #not) into the data we will step through over multiple calls to the
      #generator function.  Also, i.gen.x.y points to somewhere in the body of
      #rows.gen.x.y (possibly close to or at, but not past, its end).  So select
      #some data, hopefully a full batch, possibly a partial or singleton batch
      #if near end
      pick.from.rows.gen.x.y <- i.gen.x.y:min(i.gen.x.y+batch_size-1, max_index)
      this.batch <- rows.gen.x.y[pick.from.rows.gen.x.y-min_index+1]

      #now bump i.gen.x.y up for next time.
      i.gen.x.y <<- i.gen.x.y + length(this.batch)

      #wrap around when you run out of data.  By using > not >=, ensure that
      #final value of training data gets used (even if it happens to be the only
      #member in the next batch!)
      if (i.gen.x.y> max_index) {
         i.gen.x.y <<- min_index

      count.extra.rows.needed <- batch_size-length(this.batch)
      if(count.extra.rows.needed>0) {
         #... requires no action, but wrap.around
         #needs to do the work normally handled at the top of each iteration:
         if(uneven.mode=="wrap.around") {
            #copy/paste of earlier code setting up data from the top
            if (shuffle) {
               rows.gen.x.y <<- sample(c(min_index:max_index),
                                       size = max_index-min_index+1)
            } else {
               rows.gen.x.y <<- min_index:max_index
            #copy/paste of earlier code, but modified to extract
            #count.extra.rows.needed, not batch_size, and glom on to the current
            #undersized batch
            extra.pick.from.rows.gen.x.y <-
            extra.this.batch <-
            #now bump i.gen.x.y up for next time.
            i.gen.x.y <<- i.gen.x.y + length(extra.this.batch)
            this.batch <- c(this.batch, extra.this.batch)

      #X shape
      # = (samples, timesteps, features)
      samples <- X[this.batch, , , drop=F]

      #Y is a vector of length matching dim1() of X.
      targets <- Y[this.batch]
      # ##Keras also would allow Y to be a 1D array if you prefer:
      # ##targets <- array(Y[this.batch], dim = c(length(this.batch)))

      #Do NOT use names on this list() of length two; their presence currently
      #breaks the python interface downstream unless you add a name stripping 
      #wrapper into keras:::fit_generator
      list(samples, targets)

fake <- function(dim, bias) array((seq(prod(dim))*(sqrt(5)-1)+bias)%%2-1, dim)

X <- fake(c(200, 21, 1), .1)
Y <- fake(200, .2)
validation.X <- fake(c(118, 21, 1), .3)
validation.Y <- fake(118, .5)

gen1 <- generator.demo.x.y(X, Y, shuffle=FALSE, batch_size=32,

early.stopping.callback.list <- 
   list(keras:::callback_early_stopping(monitor = "val_loss",
                                patience = 75,
                                verbose=1)) <- 
   list(keras:::callback_reduce_lr_on_plateau(monitor = "val_loss",
                                      factor = 0.5,
                                      patience = 5,

callbacks <- list() <- callbacks <- callbacks
need.prelim.phase <- FALSE
lr.sched.callback.list <- NULL <- c(, early.stopping.callback.list)
need.prelim.phase <- TRUE <- c(, <- TRUE

validation.shuffle <- FALSE <- 200

full.train.length <- dim(X)[1]
partial.train.length <- round(0.9 * full.train.length)
partial.train.indices <- 1:partial.train.length
partial_train_X <- X[partial.train.indices, , , drop=F]
partial_train_Y <- Y[partial.train.indices]
partial_val_X <- X[-partial.train.indices, , , drop=F]
partial_val_Y <- Y[-partial.train.indices] <- c(, lr.sched.callback.list)

data.shape <- dim(X)[-1]

batch_size <- 32
steps_per_epoch <- round(dim(X)[1]/batch_size)
train.gen.full <- generator.demo.x.y(X, Y, shuffle=FALSE, batch_size=32,

train_generator <- train.gen.full
input_shape <- data.shape
num_epochs <-
callbacks <-

validation.gen.full <- generator.demo.x.y(validation.X ,validation.Y,
                                          shuffle=FALSE,  batch_size=32,

#Build and compile a Keras model
n.units <- 2
dropout_frac <- 0.5
recurrent_dropout_frac <- 0.5
model0 <- keras_model_sequential() %>%
     layer_lstm(units=n.units, dropout = dropout_frac,
                recurrent_dropout = recurrent_dropout_frac,
                input_shape=input_shape) %>%
   layer_dense(units = 1)

my.opt <- optimizer_rmsprop()
model <- model0 %>% compile(optimizer = my.opt,
                            loss = "mse",
                            metrics = c("mae")

   #Save full history so I can pull out the computed metrics

history <- model %>% fit_generator(
     validation_data = validation.gen.full,
     validation_steps = 1,
     epochs = num_epochs, verbose = TRUE,
     callbacks = callbacks

fitted.model <- list(model=model, history=history)
fitted.model$model <- serialize_model(fitted.model$model,include_optimizer=TRUE)
names(fitted.model)[which(names(fitted.model)=="model")] <- "serialized.model"

#' Created on `r Sys.Date()` by the reprex package (v`r utils::packageVersion("reprex")`)

Hi, Christian,

thanks for providing the code, but:

  1. Although this is indeed a reprex (and thanks for providing it!), the code is very long. This often discourages people who want to help, because parsing it takes quite a lot of time. The title of your question is "Debugging keras", so you might consider writing the smallest possible (minimal) reproducible example that still shows the problem you're interested in, rather than copying your original 200+ lines of code, much of which is not directly related to debugging keras models. Stack Overflow has a great discussion with ideas on how to do this.

  2. Am I wrong, or does this code come from a Markdown document? Otherwise, I don't understand why it contains the YAML preamble and the

    knitr::opts_chunk$set(collapse = TRUE, comment = "#>", error = TRUE)
    knitr::opts_knit$set(upload.fun = knitr::imgur_upload)

    parts.

  3. why did you duplicate this code line?


Thank you for your patient help, Andrea! As you can tell, I'm still wrestling with the machinery. So I'll answer your minor questions first:

  1. It came from R source code. The line #+ reprex-body and everything above it were added by reprex::reprex(), including the YAML preamble and the knitr::opts_* calls.

  2. I got that off people who know keras better than I do. They call tensorflow::use_session_with_seed(1) twice in a row to initialize the random number generator to a known state, in the hope of getting reproducible results.

This seems very weird to me. At this point, I would be really curious to hear what the RStudio Keras/Tensorflow experts think about this, but I always thought that calling it once or twice didn't make any difference. The only thing that is important is to call it immediately after loading keras or tensorflow:

Anyway, getting 100% reproducible results with modern deep learning libraries is very difficult. The only case I know of where it works is JAX, which uses its own RNG.

I had my doubts about the double call to use_session_with_seed myself. Quite probably, they wrote it accidentally, and I copied it.

Regarding my complicated reprex: it doesn't really matter what you do with keras; my question was simply whether there is a way to debug it. So I put together a simpler case from the vignette. Unfortunately, reprex fails due to dependency hell. I tried this:

model <- keras_model_sequential()
model %>% layer_dense(units = 64, activation = 'relu') %>%
      layer_dense(units = 64, activation = 'relu') %>%
      layer_dense(units = 10, activation = 'softmax')
model %>% compile(optimizer = 'adam', loss = 'categorical_crossentropy',
                  metrics = list('accuracy') )
data <- matrix(rnorm(1000 * 32), nrow = 1000, ncol = 32)
labels <- matrix(rnorm(1000 * 10), nrow = 1000, ncol = 10)
model %>% fit(data, labels, epochs = 10, batch_size = 32)

but that stops because my pandoc is too old:

Rendering reprex...
Error: pandoc version 1.12.3 or higher is required and was not found (see the help page ?rmarkdown::pandoc_available).

So I downloaded and installed it:

 sudo dpkg -i /home/brech/Downloads/pandoc-2.6-1-amd64.deb

Now reprex::reprex fails differently:

Rendering reprex...
Error: pandoc document conversion failed with error 2

So far I see

  • I call reprex::reprex,
  • which calls reprex_render,
  • which calls callr::r_safe passing a function that calls rmarkdown::render,
  • which calls run_r,
  • which calls processx::run,
  • which runs /usr/lib/R/bin/R --slave --no-save --no-restore -f /tmp/RtmpSlrrBC/file31d797686e4 in a new process,
  • which does readRDS("/tmp/RtmpSlrrBC/file31d7644a2ab9")
    and calls rmarkdown::render,
  • which calls its local function convert,
  • which calls pandoc_convert,
  • which runs the command /usr/bin/pandoc +RTS -K512m -RTS --to markdown_strict --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash --output --standalone --from=markdown-implicit_figures --to=commonmark --no-wrap in a subprocess,
  • which complains:
    --no-wrap has been removed. Use --wrap=none instead.
    Try pandoc --help for more information.
    and exits with status 2 (as the error above told me).

When I follow that advice and set args[[14]] <- "--wrap=none", pandoc succeeds and produces the following:

# adapted from

model <- keras_model_sequential()
model %>% layer_dense(units = 64, activation = 'relu') %>%
      layer_dense(units = 64, activation = 'relu') %>%
      layer_dense(units = 10, activation = 'softmax')
model %>% compile(optimizer = 'adam', loss = 'categorical_crossentropy',
                  metrics = list('accuracy') )
data <- matrix(rnorm(1000 * 32), nrow = 1000, ncol = 32)
labels <- matrix(rnorm(1000 * 10), nrow = 1000, ncol = 10)
model %>% fit(data, labels, epochs = 10, batch_size = 32 )

Created on 2019-02-26 by the reprex package (v0.2.1.9000)

So it doesn't look like pandoc did all too much to my example (it ran my R code, creating and training the model, but that part got lost).
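
The manual patch to args[[14]] can be expressed without relying on the position of the flag. This is only a sketch of the substitution itself; the function name is hypothetical, and where to hook it into reprex or rmarkdown is not covered here.

```r
# Replace the removed pandoc flag with the one its error message recommends,
# wherever it appears in the argument vector.
fix_pandoc_wrap <- function(args) {
  args[args == "--no-wrap"] <- "--wrap=none"
  args
}

old_args <- c("--from=markdown-implicit_figures", "--to=commonmark", "--no-wrap")
print(fix_pandoc_wrap(old_args))
```

Matching on the flag's value rather than on index 14 keeps the patch working if the number of preceding arguments changes.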

Anyway, this seems rather circuitous, making reprex tedious to debug. Which brings me back to my original question:

What techniques do you use to debug keras?
Is there a debugger that would let me set breakpoints in the Python part?
If yes, how do you arrange for it to be launched?

Reprex succeeds. Same output:

# adapted from

model <- keras_model_sequential()
model %>% layer_dense(units = 64, activation = 'relu') %>%
      layer_dense(units = 64, activation = 'relu') %>%
      layer_dense(units = 10, activation = 'softmax')
model %>% compile(optimizer = 'adam', loss = 'categorical_crossentropy',
                  metrics = list('accuracy') )
data <- matrix(rnorm(1000 * 32), nrow = 1000, ncol = 32)
labels <- matrix(rnorm(1000 * 10), nrow = 1000, ncol = 10)
model %>% fit(data, labels, epochs = 10, batch_size = 32 )

Created on 2019-02-27 by the reprex package (v0.2.1.9000)


I really appreciate your efforts and persistence, but a minimal reproducible example must be minimal :white_check_mark:, reproducible :white_check_mark:, and still an example of the issue you're having :sweat_smile: Your current code doesn't have any bug, or at least it runs perfectly on my instance, so it looks like, in the effort to minimize it, you also got rid of whatever bug you were having.

Anyway, to avoid further back and forth, I'll give you a list of generic suggestions to "debug" a Deep Neural Network code in Keras. I hope this is what you were looking for. If not, sorry, I tried my best :slight_smile:

First of all, when trying to fix a model that doesn't run or doesn't give satisfactory results, one should always fix the random seed to ensure reproducible results. As per the use_session_with_seed() documentation, ensuring truly reproducible results implies that both GPU execution and CPU parallelism are turned off. Model fitting will therefore be really slow, so it is better to test on a small sample of the training set rather than on the full training set.
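
A minimal sketch of the "small sample" idea, using data and labels matrices shaped like those in the example earlier in the thread. Only base R is used here; in a real session the TensorFlow seed would be fixed first, right after loading the package. The sample size of 100 is an arbitrary choice.

```r
set.seed(1)  # fixes R's own RNG only; TensorFlow's seed is separate

data   <- matrix(rnorm(1000 * 32), nrow = 1000, ncol = 32)
labels <- matrix(rnorm(1000 * 10), nrow = 1000, ncol = 10)

# Draw the same row indices for inputs and labels so pairs stay aligned.
n_small <- 100
idx <- sample(nrow(data), n_small)
small_data   <- data[idx, , drop = FALSE]
small_labels <- labels[idx, , drop = FALSE]

print(dim(small_data))
print(dim(small_labels))
```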

Secondly, you would actually need different strategies to fix a neural network which doesn't train (i.e., cannot reduce training error as much as desired), and one that doesn't generalize (i.e., cannot reduce test error as much as desired). However, for the sake of brevity, I'll only give generic suggestions which should help in both cases.

Unit tests

In the first case, one should start by writing unit tests for each function used in the code; see e.g. here (TensorFlow code, but the principle applies to Keras too). Loading the dataset, initializing the weights, defining the architecture, fitting the model, etc.: each of these steps should have its own function and its own unit test(s).
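
As an illustration of the principle in R, here is a sketch of a unit test for a batch generator, written with base R's stopifnot() so that it needs no extra packages (testthat would be the more usual choice). make_batch_indices is a hypothetical, simplified stand-in for a data generator; it is not the generator from the code posted earlier.

```r
# Hypothetical helper: returns a closure yielding successive index batches,
# wrapping around to the start when the data runs out.
make_batch_indices <- function(n, batch_size) {
  pos <- 0
  function() {
    idx <- (pos + seq_len(batch_size) - 1) %% n + 1
    pos <<- (pos + batch_size) %% n
    idx
  }
}

# Unit test: every batch is full-sized, stays in range, and one pass
# over the data visits each row exactly once.
test_batch_indices <- function() {
  gen <- make_batch_indices(n = 10, batch_size = 5)
  first <- gen()
  second <- gen()
  stopifnot(length(first) == 5, length(second) == 5)
  stopifnot(all(c(first, second) >= 1), all(c(first, second) <= 10))
  stopifnot(identical(sort(c(first, second)), as.numeric(1:10)))
  invisible(TRUE)
}

test_batch_indices()
```

The same pattern extends to the other steps: one small function per step, and one test that pins down its shape and range guarantees.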

Check the data set

data <- matrix(rnorm(1000 * 32), nrow = 1000, ncol = 32) 
labels <- matrix(rnorm(1000 * 10), nrow = 1000, ncol = 10)

In your example, X (the sample matrix) and y (the labels) are random, and you have only a training set with no test set, so there's not much to check. In general, however, you may want to take a (small) random sample of the examples your NN classified correctly and a (small) random sample of the examples it classified incorrectly, and verify that the labels are correct. Sometimes even the best datasets have label noise! Also, check that your data set has been normalized correctly: in your example the training set is standardized by construction, but this is not always the case. Finally, try reshuffling the order in which the training samples are shown to the NN, and see if that affects the training error.
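
A quick way to check standardization in R is to look at per-column means and standard deviations. The sketch uses a random data matrix like the one in the example; the tolerance of 0.2 is an arbitrary assumption suited to 1000 samples.

```r
set.seed(42)
data <- matrix(rnorm(1000 * 32), nrow = 1000, ncol = 32)

col_means <- colMeans(data)
col_sds   <- apply(data, 2, sd)

# Flag columns whose mean or spread is far from the standardized ideal.
tol <- 0.2
suspicious <- which(abs(col_means) > tol | abs(col_sds - 1) > tol)
print(suspicious)

# If needed, scale() standardizes each column; reuse the training-set
# center and scale for validation and test data.
standardized <- scale(data)
```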

Randomization tests

There are two tests which are very useful to check if there are bugs in your NN. First, train on a single minibatch: the training set error should go to 0 very quickly, and the validation error should quickly go to 100%. Second, train on the whole dataset, but shuffle the labels: this time, the training set error should slowly reach 0 (if it doesn't, the NN is not able to overfit the training set, which is a bad sign, and you should use a bigger NN), and the test set error should reach the random chance level, since there is no longer any association between inputs and labels.
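
Both tests only require preparing the inputs differently; the model code stays the same. Below is a sketch of the data preparation, with the fit() calls left as comments because they need a working keras installation. The epoch counts are arbitrary assumptions.

```r
set.seed(7)
data   <- matrix(rnorm(1000 * 32), nrow = 1000, ncol = 32)
labels <- matrix(rnorm(1000 * 10), nrow = 1000, ncol = 10)
batch_size <- 32

# Test 1: a single minibatch that the network should be able to memorize.
one_batch_x <- data[seq_len(batch_size), , drop = FALSE]
one_batch_y <- labels[seq_len(batch_size), , drop = FALSE]
# model %>% fit(one_batch_x, one_batch_y, epochs = 200, batch_size = batch_size)

# Test 2: permute the label rows to break any input/label association.
perm <- sample(nrow(labels))
shuffled_labels <- labels[perm, , drop = FALSE]
# model %>% fit(data, shuffled_labels, epochs = 10, batch_size = batch_size)
```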

Check the initialization

It has become increasingly clear that a large part of the success of neural networks is due to good initialization of the network weights. Thus, you must be sure that the weights are initialized correctly. Here you can see how to check network weights before training.
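
One way to eyeball the initial weights is to summarize each weight array before calling fit(). The helper below is a hypothetical sketch that works on any list of numeric arrays; in a keras session the list would come from get_weights(model). The fake weights here only stand in for that output.

```r
# Hypothetical helper: one row of summary statistics per weight array.
summarize_weights <- function(weights) {
  do.call(rbind, lapply(seq_along(weights), function(i) {
    w <- as.numeric(weights[[i]])
    data.frame(
      index    = i,
      n        = length(w),
      mean     = mean(w),
      sd       = if (length(w) > 1) sd(w) else NA_real_,
      any_nan  = any(is.nan(w)),
      all_zero = all(w == 0)
    )
  }))
}

# Stand-in for get_weights(model): a kernel matrix and a zero bias vector.
set.seed(3)
fake_weights <- list(matrix(rnorm(32 * 64, sd = 0.1), 32, 64), numeric(64))
print(summarize_weights(fake_weights))
```

A bias vector of zeros is normal; a kernel that is all zeros, contains NaN, or has a standard deviation orders of magnitude away from what the chosen initializer should give deserves a closer look.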

Check individual layers

TensorFlow allows you to visualize the activations of individual layers: this can be incredibly useful to catch buggy units, especially if you're using custom layers. Look here for tutorials on how to use TensorBoard in RStudio:

Check the effect of regularization

Sometimes regularization can prevent (or slow down) a blowup of the training loss, thus masking important issues with your code. It's therefore always good practice to switch off regularization (i.e., comment out layer_batch_normalization, set all layer_dropout rates to 0, set the L1/L2 regularization factor to 0, etc.) and verify that your NN is able to overfit the training set.

Perform numerical experiments, and take note of them

If all else fails, then it's time for the most dreaded and most useful NN "debugging" technique: modify various hyperparameters (learning rate, number of layers, number of units per layer, activation function, etc.) and record the results of each experiment. The package tfruns is your friend here.
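
Even without tfruns, the bookkeeping can be sketched in a few lines of base R: enumerate the combinations, run each one, and keep the results in one table. run_experiment is a hypothetical placeholder; a real version would build, fit, and evaluate the model and return its validation loss. The hyperparameter values are arbitrary.

```r
# Grid of hyperparameter combinations to try.
grid <- expand.grid(
  learning_rate = c(1e-2, 1e-3),
  units         = c(32, 64),
  stringsAsFactors = FALSE
)

# Placeholder: returns NA until replaced by real training code.
run_experiment <- function(learning_rate, units) {
  NA_real_
}

# Run every combination and record the outcome next to its settings.
grid$val_loss <- mapply(run_experiment, grid$learning_rate, grid$units)
print(grid)
```

Keeping settings and outcomes in one data frame makes it easy to sort by val_loss or write the table to disk after each run.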

Further reading

Other resources which you may consult if you're stuck training a NN:

(from yours truly :grin:)

Hope this helped!


Hi Andrea,

I can't thank you enough for all the help you have given me. Yes, your "generic" suggestions are truly helpful. I was looking for exactly this: the techniques experts use to figure out problems with neural networks in Keras. And you carefully selected links to further resources, which are hugely useful.

Thank you so much!
