Fatal Error when running Keras/Tensorflow with GPU

I get a fatal error when trying to run Keras/TensorFlow with a GPU (GTX 1080) in R. Here is my session info:

```
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] keras_2.1.6

loaded via a namespace (and not attached):
 [1] compiler_3.5.0        magrittr_1.5          R6_2.2.2             
 [4] Matrix_1.2-14         tools_3.5.0           whisker_0.3-2        
 [7] base64enc_0.1-3       Rcpp_0.12.17          reticulate_1.7       
[10] tensorflow_1.5.0.9001 grid_3.5.0            zeallot_0.1.0        
[13] jsonlite_1.5          tfruns_1.3            lattice_0.20-35
```

Here is the code I am using:

detach("package:keras", unload=TRUE)
install.packages('keras')
library(keras)
install_keras(tensorflow = "gpu")

# Training parameters
batch_size <- 128
num_classes <- 10
epochs <- 12

# Input image dimensions
img_rows <- 28
img_cols <- 28

# Load the MNIST data and pull out the train/test arrays
mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y

# Reshape to 4-D arrays (samples, rows, cols, channels)
input_shape <- c(img_rows, img_cols, 1)
x_train <- array(as.numeric(x_train), dim = c(dim(x_train)[[1]], input_shape))
x_test <- array(as.numeric(x_test), dim = c(dim(x_test)[[1]], input_shape))

# Rescale pixel values from [0, 255] to [0, 1]
x_train <- x_train / 255
x_test <- x_test / 255

cat('x_train_shape:', dim(x_train), '\n')
cat(dim(x_train)[[1]], 'train samples\n')
cat(dim(x_test)[[1]], 'test samples\n')

# One-hot encode the class labels
y_train <- to_categorical(y_train, num_classes)
y_test <- to_categorical(y_test, num_classes)

# Define the model architecture
model <- keras_model_sequential()
model %>%
  layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu',
                input_shape = input_shape) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = num_classes, activation = 'softmax')

# Compile the model
model %>%
  compile(
    loss = loss_categorical_crossentropy,
    optimizer = optimizer_adadelta(),
    metrics = c('accuracy')
  )

# Train the model
model %>% fit(x_train, y_train,
              batch_size = batch_size,
              epochs = epochs,
              verbose = 1,
              validation_data = list(x_test, y_test)
)
```

And here is the last of the output I see right before R terminates the session with a fatal error:

```
2018-06-08 12:46:12.265333: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-06-08 12:46:12.537038: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1344] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.59GiB
2018-06-08 12:46:12.537378: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1423] Adding visible gpu devices: 0
2018-06-08 12:46:13.017506: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-08 12:46:13.017637: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:917]      0 
2018-06-08 12:46:13.017729: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:930] 0:   N 
2018-06-08 12:46:13.017929: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6372 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
```

I had the exact same problem. Check that your cuDNN version is compatible with your TensorFlow build; I downgraded from 7.1 to 7.0 and it worked.

Hey, thank you for your response! How would I go about checking? And yes, I have cuDNN v7.1 for CUDA 9.0 installed. So I should just remove the v7.1 files I copied in and replace them with the cuDNN v7.0 ones?

Yes, just download v7.0 from the NVIDIA archives and swap out the three files.
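
In case it helps, here is a quick sketch of the swap in R. The paths below are hypothetical, so point them at your own CUDA 9.0 install and wherever you unzipped the cuDNN v7.0 archive:

```
# Hypothetical paths -- adjust to your CUDA install and the unzipped
# cuDNN v7.0 archive before running.
cuda_dir  <- "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.0"
cudnn_dir <- "C:/Users/me/Downloads/cudnn-9.0-windows10-x64-v7/cuda"

# The cuDNN zip ships three files; copy each one over the v7.1
# version already sitting in the CUDA toolkit directory.
file.copy(file.path(cudnn_dir, "bin/cudnn64_7.dll"),
          file.path(cuda_dir, "bin"), overwrite = TRUE)
file.copy(file.path(cudnn_dir, "include/cudnn.h"),
          file.path(cuda_dir, "include"), overwrite = TRUE)
file.copy(file.path(cudnn_dir, "lib/x64/cudnn.lib"),
          file.path(cuda_dir, "lib/x64"), overwrite = TRUE)
```

Restart your R session afterwards so the new DLL is the one that gets loaded.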

Hi, it looks like your code was not formatted correctly to make it easy to read for people trying to help you. Formatting code allows people to more easily identify where issues may be occurring, and makes it easier to read in general. I have edited your post to format the code properly.

In the future, please put inline code (such as a function name, like `mutate` or `filter`) inside single backticks (`mutate`), and place chunks of code between sets of three backticks:

```
example <- foo %>%
  filter(a == 1)
```

This will help keep our community tidy and help you get the help you are looking for!

For more information, please take a look at the community's FAQ on formatting code.

@tbradley thank you! I wasn't sure how to do that, appreciate it!

Thank you so much! That was it; the cuDNN downgrade fixed it!

Glad it worked.
For a more complete answer, the Python error that R doesn't show is:
"Loaded runtime CuDNN library: 7104 (compatibility version 7100) but source was compiled with 7003 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration."
