Hi,
I'm trying to run this Keras example on CloudML: https://tensorflow.rstudio.com/blog/keras-fraud-autoencoder.html
The job fails. These are the logs for this job from Google:
I master-replica-0 Running setup.py bdist_wheel for cloudml: started master-replica-0
I master-replica-0 Building wheels for collected packages: cloudml master-replica-0
I master-replica-0 Processing ./cloudml-1.0.0.0.zip master-replica-0
I master-replica-0 Running command: pip install --user --upgrade --force-reinstall --no-deps cloudml-1.0.0.0.zip master-replica-0
I master-replica-0 Installing the package: gs://mapagamaduo/r-cloudml/staging/packages/a40bd4a4a5150d09ffbb922248a16150f4b16d6fb606d032338baae98c51d2c3/cloudml-1.0.0.0.zip master-replica-0
I master-replica-0 Running command: gsutil -q cp gs://mapagamaduo/r-cloudml/staging/packages/a40bd4a4a5150d09ffbb922248a16150f4b16d6fb606d032338baae98c51d2c3/cloudml-1.0.0.0.zip cloudml-1.0.0.0.zip master-replica-0
I master-replica-0 Downloading the package: gs://mapagamaduo/r-cloudml/staging/packages/a40bd4a4a5150d09ffbb922248a16150f4b16d6fb606d032338baae98c51d2c3/cloudml-1.0.0.0.zip master-replica-0
I master-replica-0 Running module cloudml-model.cloudml.deploy. master-replica-0
I Job failed.
I master-replica-0 Running task with arguments: --cluster={"master": ["cmle-training-master-2435f17993-0:2222"]} --task={"type": "master", "index": 0, "trial": "4"} --job={
"scale_tier": "CUSTOM",
"master_type": "standard_gpu",
"package_uris": ["gs://mapagamaduo/r-cloudml/staging/packages/a40bd4a4a5150d09ffbb922248a16150f4b16d6fb606d032338baae98c51d2c3/cloudml-1.0.0.0.zip"],
"python_module": "cloudml-model.cloudml.deploy",
"args": ["Rscript"],
"hyperparameters": {
"goal": "MINIMIZE",
"params": [{
"parameter_name": "normalization",
"type": "CATEGORICAL",
"categorical_values": ["zscore", "minmax"]
}, {
"parameter_name": "activation",
"type": "CATEGORICAL",
"categorical_values": ["relu", "selu", "tanh", "sigmoid"]
}, {
"parameter_name": "learning_rate",
"min_value": 1.0E-6,
"max_value": 0.1,
"type": "DOUBLE",
"scale_type": "UNIT_LOG_SCALE"
}, {
"parameter_name": "hidden_size",
"min_value": 5.0,
"max_value": 50.0,
"type": "INTEGER",
"scale_type": "UNIT_LINEAR_SCALE"
}],
"max_trials": 10,
"max_parallel_trials": 5,
"hyperparameter_metric_tag": "val_loss"
},
"region": "us-central1",
"runtime_version": "1.6",
"job_dir": "gs://mapagamaduo/r-cloudml/staging"
} --hyperparams={"activation":"relu","hidden_size":"5","learning_rate":"2.1140109783759685e-06","normalization":"minmax"} master-replica-0
I Finished tearing down TensorFlow.
I master-replica-0 Running setup.py bdist_wheel for cloudml: started master-replica-0
I master-replica-0 Installing the package: gs://mapagamaduo/r-cloudml/staging/packages/a40bd4a4a5150d09ffbb922248a16150f4b16d6fb606d032338baae98c51d2c3/cloudml-1.0.0.0.zip master-replica-0
I master-replica-0 Running command: gsutil -q cp gs://mapagamaduo/r-cloudml/staging/packages/a40bd4a4a5150d09ffbb922248a16150f4b16d6fb606d032338baae98c51d2c3/cloudml-1.0.0.0.zip cloudml-1.0.0.0.zip master-replica-0
I master-replica-0 Downloading the package: gs://mapagamaduo/r-cloudml/staging/packages/a40bd4a4a5150d09ffbb922248a16150f4b16d6fb606d032338baae98c51d2c3/cloudml-1.0.0.0.zip master-replica-0
I master-replica-0 Running module cloudml-model.cloudml.deploy. master-replica-0
I Finished tearing down TensorFlow.
E The replica master 0 exited with a non-zero status of 1. Termination reason: Error. To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=616618339671&resource=ml_job%2Fjob_id%2Fcloudml_2018_06_04_153534814&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22cloudml_2018_06_04_153534814%22
I master-replica-0 Command ['Rscript', '/root/.local/lib/python2.7/site-packages/cloudml-model/cloudml/deploy.R', '--activation', 'selu', '--hidden_size', '50', '--learning_rate', '0.095436491804553159', '--normalization', 'minmax'] failed: exit code 1 master-replica-0
I master-replica-0 Execution halted master-replica-0
I master-replica-0 Error: object 'job_config' not found master-replica-0
I master-replica-0 Using TensorFlow backend. master-replica-0
I master-replica-0 > setwd("D:/Proyectos/CloudML/R_fraud") master-replica-0
I master-replica-0 > rm(list = ls()) master-replica-0
I master-replica-0 > library(purrr) master-replica-0
I master-replica-0 intersect, setdiff, setequal, union master-replica-0
I master-replica-0 The following objects are masked from 'package:base': master-replica-0
I master-replica-0 filter, lag master-replica-0
I master-replica-0 The following objects are masked from 'package:stats': master-replica-0
I master-replica-0 Attaching package: 'dplyr' master-replica-0
I master-replica-0 > library(dplyr) master-replica-0
I master-replica-0 > library(keras) master-replica-0
I master-replica-0 > library(readr) master-replica-0
I master-replica-0 Using run directory runs/cloudml_2018_06_04_153534814 master-replica-0
I master-replica-0 Clean up finished. master-replica-0
I master-replica-0 [1/1 files][408.0 MiB/408.0 MiB] 100% Done master-replica-0
I master-replica-0 so slow that gsutil disables downloads of composite objects. master-replica-0
I master-replica-0 without a compiled crcmod, computing checksums on composite objects is master-replica-0
I master-replica-0 compiled crcmod installed (see "gsutil help crcmod"). This is because master-replica-0
I master-replica-0 means that any user who downloads such objects will need to have a master-replica-0
I master-replica-0 <https://cloud.google.com/storage/docs/composite-objects>`_,which master-replica-0
I master-replica-0 be uploaded as `composite objects master-replica-0
E master-replica-0 Command '['python', '-m', u'cloudml-model.cloudml.deploy', u'Rscript', u'--activation', u'selu', u'--hidden_size', u'50', u'--learning_rate', u'0.095436491804553159', u'--normalization', u'minmax', '--job-dir', 'gs://mapagamaduo/r-cloudml/staging/2']' returned non-zero exit status 1 master-replica-0
I master-replica-0 configuration file. However, note that if you do this large files will master-replica-0
I think the problem is related to this Python command, which returns an error for every script I submit. More examples:
master-replica-0 Command '['python', '-m', u'cloudml-model.cloudml.deploy', u'Rscript', '--job-dir', u'gs://mapagamaduo/r-cloudml/staging']' returned non-zero exit status 1
master-replica-0 Command '['python', '-m', u'cloudml-model.cloudml.deploy', u'Rscript', '--job-dir', u'gs://mapagamaduo/r-cloudml/staging']' returned non-zero exit status 1
My session info:
Session info ------------------------------------------------------------
 setting  value
 version  R version 3.4.4 (2018-03-15)
 system   x86_64, mingw32
 ui       RStudio (1.1.442)
 language (EN)
 collate  Spanish_Spain.1252
 tz       Europe/Berlin
 date     2018-06-04
Packages ----------------------------------------------------------------
 package    * version date       source
 assertthat   0.2.0   2017-04-11 CRAN (R 3.4.4)
 backports    1.1.2   2017-12-13 CRAN (R 3.4.3)
 base       * 3.4.4   2018-03-15 local
 base64enc    0.1-3   2015-07-28 CRAN (R 3.4.1)
 cloudml    * 0.5     2018-05-24 Github (rstudio/cloudml@4ce808c)
 compiler     3.4.4   2018-03-15 local
 crayon       1.3.4   2017-09-16 CRAN (R 3.4.4)
 datasets   * 3.4.4   2018-03-15 local
 debugme      1.1.0   2017-10-22 CRAN (R 3.4.4)
 devtools     1.13.5  2018-02-18 CRAN (R 3.4.3)
 digest       0.6.15  2018-01-28 CRAN (R 3.4.3)
 graphics   * 3.4.4   2018-03-15 local
 grDevices  * 3.4.4   2018-03-15 local
 here         0.1     2017-05-28 CRAN (R 3.4.4)
 jsonlite     1.5     2017-06-01 CRAN (R 3.4.4)
 magrittr     1.5     2014-11-22 CRAN (R 3.4.4)
 memoise      1.1.0   2017-04-21 CRAN (R 3.4.4)
 methods    * 3.4.4   2018-03-15 local
 packrat      0.4.9-2 2018-04-20 CRAN (R 3.4.4)
 processx     3.1.0   2018-05-15 CRAN (R 3.4.4)
 R6           2.2.2   2017-06-17 CRAN (R 3.4.4)
 rprojroot    1.3-2   2018-01-03 CRAN (R 3.4.4)
 rstudioapi   0.7     2017-09-07 CRAN (R 3.4.4)
 stats      * 3.4.4   2018-03-15 local
 tfruns     * 1.3     2018-05-24 Github (rstudio/tfruns@03fb652)
 tools        3.4.4   2018-03-15 local
 utils      * 3.4.4   2018-03-15 local
 whisker      0.3-2   2013-04-28 CRAN (R 3.4.4)
 withr        2.1.2   2018-03-15 CRAN (R 3.4.4)
 yaml         2.1.19  2018-05-01 CRAN (R 3.4.4)
I have tried CloudML with other scripts as well, and they always fail. Could you please give me some advice on what to do?
Thanks.