Validation differs when using generator

You can supply training and validation data by passing either an array or a generator function. I would have expected identical results if I supply the same data. For training that's true, and the weights are identical. But for validation, I get different results. Andrea Panizza helped me immensely (debugging keras) on the way to formulating a reprex. So here goes:

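library(keras)  # provides %>%, the layer_* functions and fit_generator()

# Returns a closure that yields successive batches, wrapping around at the end
generator.x.y <- function(X, Y, batch_size) {
   i_ <- 0
   function() {
      this.batch <- seq(i_, len=batch_size) %% nrow(X) + 1 # wrap around
      i_ <<- i_ + batch_size
      list(X[this.batch,,, drop=F], Y[this.batch])
   }
}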

fake <- function(dim, bias) array((seq(prod(dim))*(sqrt(5)-1)+bias)%%2-1, dim)

batch_size <- 32
steps_per_epoch <- 3
n_samp <- batch_size * steps_per_epoch

X <- fake(c(n_samp, 21, 1), .1)
Y <- seq(-1, 1, len=n_samp)
valid_len <- 1024
validation.X <- fake(c(valid_len, 21, 1), .3)
validation.Y <- seq(-1, 1, len=valid_len)
train_generator <- generator.x.y(X, Y, batch_size)
valid_generator <- generator.x.y(validation.X ,validation.Y, valid_len)


build <- function() {
   # Reset the seed so that every run starts from identical weights
   use_session_with_seed(1)
   # Build and compile a Keras model
   model <- keras_model_sequential() %>%
      layer_lstm(units=2, dropout = 0.5,
                 recurrent_dropout = 0.5,
                 input_shape=dim(X)[-1], name="my_lstm") %>%
      layer_dense(units = 1, name="my_dense")
   model %>% compile(optimizer = optimizer_rmsprop(),
                     loss = "mse",
                     metrics = c("mae"))
}

callbacks <-
   list(keras:::callback_early_stopping      (monitor  = "val_loss",
                                              patience = 75),
        keras:::callback_reduce_lr_on_plateau(monitor  = "val_loss",
                                              factor   = 0.5,
                                              patience = 5))

epochs <- 20

# train and validate with generator
model <- build()
#> Set session seed to 1 (disabled GPU, CPU parallelism)
gg <- model %>% fit_generator(
     train_generator,
     steps_per_epoch=steps_per_epoch, callbacks=callbacks,
     validation_data = valid_generator,
     validation_steps = 1, epochs = epochs, verbose = 2)

# train with generator, validate with data
model <- build()
#> Set session seed to 1 (disabled GPU, CPU parallelism)
gd <- model %>% fit_generator(
     train_generator,
     steps_per_epoch=steps_per_epoch, callbacks=callbacks,
     validation_data = list(validation.X, validation.Y),
     validation_steps = 1, epochs = epochs, verbose = 2)

stopifnot(all.equal(gg$metrics, gd$metrics))
#> Error in eval(expr, envir, enclos): gg$metrics and gd$metrics are not equal:
#>   Component "val_mean_absolute_error": Mean relative difference: 5.893696e-08
#>   Component "val_loss": Mean relative difference: 5.589292e-08

# P.S.: training and validating with data gives result identical to gd:
model <- build()
#> Set session seed to 1 (disabled GPU, CPU parallelism)
dd <- model %>% fit(
     X, Y, shuffle=FALSE,    callbacks=callbacks,
     validation_data = list(validation.X, validation.Y),
     epochs = epochs, verbose = 2)

Created on 2019-04-05 by the reprex package (v0.2.1.9000)
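
To see the wrap-around behaviour of the generator in isolation, here is a minimal base-R sketch. The 5-row toy array and the batch size of 2 are made-up values for illustration and are not part of the reprex.

```r
# Same closure-based generator as in the reprex
generator.x.y <- function(X, Y, batch_size) {
   i_ <- 0
   function() {
      this.batch <- seq(i_, len = batch_size) %% nrow(X) + 1 # wrap around
      i_ <<- i_ + batch_size
      list(X[this.batch, , , drop = FALSE], Y[this.batch])
   }
}

# Toy data (hypothetical sizes): 5 samples, 3 time steps, 1 feature
X_toy <- array(seq_len(15), c(5, 3, 1))
Y_toy <- 1:5

gen <- generator.x.y(X_toy, Y_toy, batch_size = 2)
b1 <- gen()  # rows 1, 2
b2 <- gen()  # rows 3, 4
b3 <- gen()  # rows 5, 1 (wrapped around)
```

Each call advances the internal counter, so the closure is stateful: calling it a fourth time would continue with rows 2 and 3 rather than start over.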

R numerics are doubles (float64), whereas the Python side of keras defaults to single precision (float32). The discrepancy is therefore around the float32 machine epsilon and might be explained by floating-point noise. But I'd like to see exactly where it happens.
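
As a rough sanity check of the "around the machine epsilon" claim, the reported relative differences can be compared with the float32 machine epsilon, 2^-23. This is plain arithmetic in base R; the only inputs are the two numbers printed by all.equal in the reprex.

```r
# Machine epsilon of IEEE 754 single precision (23 explicit mantissa bits)
eps32 <- 2^-23

# Mean relative differences reported by all.equal() in the reprex
rel_diff <- c(val_mean_absolute_error = 5.893696e-08,
              val_loss                = 5.589292e-08)

# Express the differences in units of the float32 epsilon
ratio <- rel_diff / eps32
print(signif(ratio, 3))
```

Both ratios come out a little under 0.5, which is consistent with rounding noise in single precision rather than a genuinely different computation.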

The function mtrace from the MVB debugger lets me set breakpoints in R and inspect variables. That gets me as far as reticulate:::py_call, but gives little insight beyond that point.

It helps to stick some code at the beginning of the function fit or fit_generator in r-tensorflow/lib/python2.7/site-packages/keras/engine/ to invoke the Python debugger, like this:

import pdb; pdb.set_trace()

The call chain seems to be

  • keras/engine/
  • keras/engine/
  • keras/backend/
  • keras/backend/
  • tensorflow/python/client/
  • _pywrap_tensorflow_internal.TF_SessionRunCallable(session, handle, feed_values, out_status, run_metadata)

PDB won't step into the function TF_SessionRunCallable; it behaves like an atomic call. Guessing from the name, it might ultimately call the function _ZN10tensorflow7Session11RunCallableExRKSt6vectorINS_6TensorESaIS2_EEPS4_PNS_11RunMetadataE in the shared object, which is the C++-mangled name for

tensorflow::Session::RunCallable(
      long long,
      std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> > const&,
      std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*,
      tensorflow::RunMetadata*)

Tensorboard shows the graph for my model. The subgraph metrics has a subgraph mean_absolute_error, with this chain of nodes: sub, Abs, Mean, Mean1. It's just a hunch, but if the arguments to Mean arrive in a different order, that would be enough to give a different outcome, because floating-point addition is not associative. Is there a way to see the data? I know that in tensorflow you would add summary nodes to the graph, but how can you even access the graph from keras in R?
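
The hunch about argument order can be illustrated without keras: summing the same numbers with a different grouping may change the last bits of the result. The sketch below uses R doubles and explicit `+` (base R's sum() may accumulate in extended precision, which can hide the effect); the same mechanism applies, with larger errors, in float32.

```r
a <- (0.1 + 0.2) + 0.3   # add left to right
b <- 0.1 + (0.2 + 0.3)   # same numbers, different grouping

print(a == b)            # FALSE
print(abs(a - b))        # a difference in the last bit of the double
```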

It's probably related to floating-point arithmetic. If you switch to float64 by running:

backend()$set_floatx("float64")

you will get identical results.

My guess is that when using generators, data gets converted to float32 by Keras itself while when passing the data directly it is converted by numpy.

In fit_generator there's this line:

if (is.list(validation_data))
    validation_data <- keras_array(validation_data)

It will in turn use keras_array that will do something like this:

if (is.null(dtype) && is.double(x))
      dtype <- backend()$floatx()

I still need to take a closer look at generators. They go through the as_generator.function method, and it seems that we do not change the dtype there.
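
If the generator path really skips the dtype conversion, one possible workaround is to wrap the generator so that each batch is converted explicitly before it is handed on. The sketch below is plain R; `convert_batches` and `toy_generator` are hypothetical names, not keras functions, and `round` merely stands in for a real converter so that the effect is visible.

```r
# Hypothetical helper: wrap a generator so that `convert` is applied to
# every element (inputs and targets) of each batch it yields.
convert_batches <- function(generator, convert) {
   function() lapply(generator(), convert)
}

# Stand-in generator yielding one fixed batch, for demonstration only
toy_generator <- function() {
   list(array(c(1.2, 2.7), c(2, 1, 1)), c(0.4, 1.6))
}

# round() is a placeholder converter; it keeps the array dimensions
wrapped <- convert_batches(toy_generator, round)
batch <- wrapped()
str(batch)
```

With keras loaded, one could try convert_batches(valid_generator, keras_array) as the validation_data argument, on the assumption that keras_array casts R doubles to the backend's floatx as in the snippet quoted above. Whether that actually removes the discrepancy is untested.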


Thank you! After invoking backend()$set_floatx("float64") indeed I get identical results from fitting with generator or data.

