You can supply training and validation data to keras either as arrays or as generator functions. I would have expected identical results when I supply the same data either way. For training that is true: the weights come out identical. For validation, however, I get slightly different results. Andrea Panizza helped me immensely (debugging keras) on the way to formulating a reprex. So here goes:
# Infinite batch generator: yields successive (X, Y) slices, wrapping
# around at the end of the data.
generator.x.y <- function(X, Y, batch_size) {
  i_ <- 0
  function() {
    this.batch <- seq(i_, len = batch_size) %% nrow(X) + 1  # wrap around
    i_ <<- i_ + batch_size
    list(X[this.batch, , , drop = FALSE], Y[this.batch])
  }
}
# Deterministic, random-looking data in [-1, 1), so the reprex needs no RNG
fake <- function(dim, bias) array((seq(prod(dim)) * (sqrt(5) - 1) + bias) %% 2 - 1, dim)

batch_size <- 32
steps_per_epoch <- 3
n_samp <- batch_size * steps_per_epoch
X <- fake(c(n_samp, 21, 1), .1)
Y <- seq(-1, 1, len = n_samp)

valid_len <- 1024
validation.X <- fake(c(valid_len, 21, 1), .3)
validation.Y <- seq(-1, 1, len = valid_len)

train_generator <- generator.x.y(X, Y, batch_size)
valid_generator <- generator.x.y(validation.X, validation.Y, valid_len)
library(keras)
use_virtualenv("~/.virtualenvs/r-tensorflow")

build <- function() {
  tensorflow::use_session_with_seed(1)
  # Build and compile a Keras model
  model <- keras_model_sequential() %>%
    layer_lstm(units = 2, dropout = 0.5,
               recurrent_dropout = 0.5,
               input_shape = dim(X)[-1], name = "my_lstm") %>%
    layer_dense(units = 1, name = "my_dense")
  model %>% compile(optimizer = optimizer_rmsprop(),
                    loss = "mse",
                    metrics = c("mae"))
}
callbacks <- list(
  keras::callback_early_stopping(monitor = "val_loss",
                                 patience = 75),
  keras::callback_reduce_lr_on_plateau(monitor = "val_loss",
                                       factor = 0.5,
                                       patience = 5))
epochs <- 20

# train and validate with generator
model <- build()
#> Set session seed to 1 (disabled GPU, CPU parallelism)
gg <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = steps_per_epoch, callbacks = callbacks,
  validation_data = valid_generator,
  validation_steps = 1, epochs = epochs, verbose = 2)
# train with generator, validate with data
model <- build()
#> Set session seed to 1 (disabled GPU, CPU parallelism)
gd <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = steps_per_epoch, callbacks = callbacks,
  validation_data = list(validation.X, validation.Y),
  validation_steps = 1, epochs = epochs, verbose = 2)
stopifnot(all.equal(gg$metrics, gd$metrics))
#> Error in eval(expr, envir, enclos): gg$metrics and gd$metrics are not equal:
#> Component "val_mean_absolute_error": Mean relative difference: 5.893696e-08
#> Component "val_loss": Mean relative difference: 5.589292e-08
# P.S.: training and validating with data gives results identical to gd:
model <- build()
#> Set session seed to 1 (disabled GPU, CPU parallelism)
dd <- model %>% fit(
  X, Y, shuffle = FALSE, callbacks = callbacks,
  validation_data = list(validation.X, validation.Y),
  epochs = epochs, verbose = 2)
Created on 2019-04-05 by the reprex package (v0.2.1.9000)
While R only supports double precision (float64), the Python side of keras computes in single precision (float32). So the discrepancy is on the order of the float32 machine epsilon and might be explained by floating point noise. But I'd like to see exactly where it happens.
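To put those numbers in perspective, here is a quick back-of-the-envelope check in plain R (nothing keras-specific), comparing the reported relative differences with the float32 machine epsilon:

# float32 machine epsilon is 2^-23; both observed relative differences are
# roughly half of it, i.e. consistent with a single rounding difference.
eps32 <- 2^-23                       # ~1.19e-07
c(val_mae  = 5.893696e-08 / eps32,   # ~0.49
  val_loss = 5.589292e-08 / eps32)   # ~0.47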
The function mtrace from the MVB debug package lets me set breakpoints in R and inspect variables. That gets me as far as reticulate:::py_call, but doesn't give much insight.
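For reference, this is roughly how I drive it (a sketch; the local copy is there so that mtrace has a function it can patch):

library(debug)                          # the MVB debugger
fit_generator <- keras::fit_generator   # local copy that mtrace can patch
mtrace(fit_generator)                   # break on entry, then run the reprex
# ... step until the call disappears into reticulate:::py_call ...
mtrace.off()                            # clear all breakpoints afterwards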
It helps to stick some code at the beginning of the function fit or fit_generator in r-tensorflow/lib/python2.7/site-packages/keras/engine/training.py to invoke the Python debugger, like this:
import pdb; pdb.set_trace()
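To make sure that line goes into the copy of training.py that reticulate actually loads (the path above is just how my virtualenv happens to be laid out), something along these lines should print the file in use:

library(reticulate)
training <- import("keras.engine.training")   # the module that defines fit()
py_get_attr(training, "__file__")             # path of the loaded training.py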
The call chain seems to be:
- keras/engine/training.py(1041)fit()
- keras/engine/training_arrays.py(154)fit_loop()
- keras/backend/tensorflow_backend.py(2715)__call__()
- keras/backend/tensorflow_backend.py(2675)_call()
- tensorflow/python/client/session.py(1439)__call__()
- _pywrap_tensorflow_internal.TF_SessionRunCallable(session, handle, feed_values, out_status, run_metadata)
PDB won't step into the function TF_SessionRunCallable; it behaves as an atomic call. Guessing from the name, it might ultimately call the function _ZN10tensorflow7Session11RunCallableExRKSt6vectorINS_6TensorESaIS2_EEPS4_PNS_11RunMetadataE in the shared object libtensorflow_framework.so, which is the C++-mangled name for

tensorflow::Session::RunCallable(
    long long,
    std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> > const&,
    std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*,
    tensorflow::RunMetadata*)
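That guess about the mangled name can be checked against the library's symbol table; a rough sketch from R (it assumes binutils' nm and c++filt are on the PATH, and the path to the .so is only a guess for my virtualenv layout):

# list dynamic symbols, keep the RunCallable ones, demangle them
so <- "~/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/tensorflow/libtensorflow_framework.so"
system(paste("nm -D", shQuote(path.expand(so)), "| grep RunCallable | c++filt"))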
Tensorboard shows the graph for my model. The subgraph metrics has a subgraph mean_absolute_error, with this chain of nodes: sub, Abs, Mean, Mean1. It's just a hunch, but if the arguments to Mean arrive in a different order, that would be enough to give a different outcome, thanks to the peculiarities of floating point arithmetic. Is there a way to see the data? I know in tensorflow you would add summary nodes to the graph, but how can you even access the graph from keras in R?
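One avenue I might try (just a sketch, untested; the node name is read off the TensorBoard graph and may not match exactly): the R keras package exposes the backend session via k_get_session(), the graph hangs off that, and individual nodes could in principle be looked up and evaluated on a fixed feed:

library(keras)
library(tensorflow)

sess  <- k_get_session()   # the tf.Session that keras is using
graph <- sess$graph        # the underlying tf.Graph
# look up the metric op by the name shown in TensorBoard (guessed name):
# mae_op <- graph$get_operation_by_name("metrics/mean_absolute_error/Mean")
# then inspect its inputs via sess$run() on a fixed feed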