Keras not working with CloudML

Hi,
I'm trying to run this Keras example with CloudML: https://tensorflow.rstudio.com/blog/keras-fraud-autoencoder.html

It doesn't work. These are the job logs from Google:

I  master-replica-0   Running setup.py bdist_wheel for cloudml: started master-replica-0 
I  master-replica-0 Building wheels for collected packages: cloudml master-replica-0 
I  master-replica-0 Processing ./cloudml-1.0.0.0.zip master-replica-0 
I  master-replica-0 Running command: pip install --user --upgrade --force-reinstall --no-deps cloudml-1.0.0.0.zip master-replica-0 
I  master-replica-0 Installing the package: gs://mapagamaduo/r-cloudml/staging/packages/a40bd4a4a5150d09ffbb922248a16150f4b16d6fb606d032338baae98c51d2c3/cloudml-1.0.0.0.zip master-replica-0 
I  master-replica-0 Running command: gsutil -q cp gs://mapagamaduo/r-cloudml/staging/packages/a40bd4a4a5150d09ffbb922248a16150f4b16d6fb606d032338baae98c51d2c3/cloudml-1.0.0.0.zip cloudml-1.0.0.0.zip master-replica-0 
I  master-replica-0 Downloading the package: gs://mapagamaduo/r-cloudml/staging/packages/a40bd4a4a5150d09ffbb922248a16150f4b16d6fb606d032338baae98c51d2c3/cloudml-1.0.0.0.zip master-replica-0 
I  master-replica-0 Running module cloudml-model.cloudml.deploy. master-replica-0 
I  Job failed. 
I  master-replica-0 Running task with arguments: --cluster={"master": ["cmle-training-master-2435f17993-0:2222"]} --task={"type": "master", "index": 0, "trial": "4"} --job={
  "scale_tier": "CUSTOM",
  "master_type": "standard_gpu",
  "package_uris": ["gs://mapagamaduo/r-cloudml/staging/packages/a40bd4a4a5150d09ffbb922248a16150f4b16d6fb606d032338baae98c51d2c3/cloudml-1.0.0.0.zip"],
  "python_module": "cloudml-model.cloudml.deploy",
  "args": ["Rscript"],
  "hyperparameters": {
    "goal": "MINIMIZE",
    "params": [{
      "parameter_name": "normalization",
      "type": "CATEGORICAL",
      "categorical_values": ["zscore", "minmax"]
    }, {
      "parameter_name": "activation",
      "type": "CATEGORICAL",
      "categorical_values": ["relu", "selu", "tanh", "sigmoid"]
    }, {
      "parameter_name": "learning_rate",
      "min_value": 1.0E-6,
      "max_value": 0.1,
      "type": "DOUBLE",
      "scale_type": "UNIT_LOG_SCALE"
    }, {
      "parameter_name": "hidden_size",
      "min_value": 5.0,
      "max_value": 50.0,
      "type": "INTEGER",
      "scale_type": "UNIT_LINEAR_SCALE"
    }],
    "max_trials": 10,
    "max_parallel_trials": 5,
    "hyperparameter_metric_tag": "val_loss"
  },
  "region": "us-central1",
  "runtime_version": "1.6",
  "job_dir": "gs://mapagamaduo/r-cloudml/staging"
} --hyperparams={"activation":"relu","hidden_size":"5","learning_rate":"2.1140109783759685e-06","normalization":"minmax"} master-replica-0 
I  Finished tearing down TensorFlow. 
I  master-replica-0   Running setup.py bdist_wheel for cloudml: started master-replica-0 
I  master-replica-0 Installing the package: gs://mapagamaduo/r-cloudml/staging/packages/a40bd4a4a5150d09ffbb922248a16150f4b16d6fb606d032338baae98c51d2c3/cloudml-1.0.0.0.zip master-replica-0 
I  master-replica-0 Running command: gsutil -q cp gs://mapagamaduo/r-cloudml/staging/packages/a40bd4a4a5150d09ffbb922248a16150f4b16d6fb606d032338baae98c51d2c3/cloudml-1.0.0.0.zip cloudml-1.0.0.0.zip master-replica-0 
I  master-replica-0 Downloading the package: gs://mapagamaduo/r-cloudml/staging/packages/a40bd4a4a5150d09ffbb922248a16150f4b16d6fb606d032338baae98c51d2c3/cloudml-1.0.0.0.zip master-replica-0 
I  master-replica-0 Running module cloudml-model.cloudml.deploy. master-replica-0 
I  Finished tearing down TensorFlow. 
E  The replica master 0 exited with a non-zero status of 1. Termination reason: Error. To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=616618339671&resource=ml_job%2Fjob_id%2Fcloudml_2018_06_04_153534814&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22cloudml_2018_06_04_153534814%22 
I  master-replica-0 Command ['Rscript', '/root/.local/lib/python2.7/site-packages/cloudml-model/cloudml/deploy.R', '--activation', 'selu', '--hidden_size', '50', '--learning_rate', '0.095436491804553159', '--normalization', 'minmax'] failed: exit code 1 master-replica-0 
I  master-replica-0 Execution halted master-replica-0 
I  master-replica-0 Error: object 'job_config' not found master-replica-0 
I  master-replica-0 Using TensorFlow backend. master-replica-0 
I  master-replica-0 > setwd("D:/Proyectos/CloudML/R_fraud") master-replica-0 
I  master-replica-0 > rm(list = ls()) master-replica-0 
I  master-replica-0 > library(purrr) master-replica-0 
I  master-replica-0     intersect, setdiff, setequal, union master-replica-0 
I  master-replica-0 The following objects are masked from 'package:base': master-replica-0 
I  master-replica-0     filter, lag master-replica-0 
I  master-replica-0 The following objects are masked from 'package:stats': master-replica-0 
I  master-replica-0 Attaching package: 'dplyr' master-replica-0 
I  master-replica-0 > library(dplyr) master-replica-0 
I  master-replica-0 > library(keras) master-replica-0 
I  master-replica-0 > library(readr) master-replica-0 
I  master-replica-0 Using run directory runs/cloudml_2018_06_04_153534814 master-replica-0 
I  master-replica-0 Clean up finished. master-replica-0 
I  master-replica-0 / [0/1 files][    0.0 B/408.0 MiB]   0% Done
- [0/1 files][ 82.8 MiB/408.0 MiB]  20% Done
| [0/1 files][173.5 MiB/408.0 MiB]  42% Done
/ [0/1 files][256.3 MiB/408.0 MiB]  62% Done
\ [0/1 files][350.1 MiB/408.0 MiB]  85% Done
| [1/1 files][408.0 MiB/408.0 MiB] 100% Done master-replica-0
I  master-replica-0 so slow that gsutil disables downloads of composite objects. master-replica-0 
I  master-replica-0 without a compiled crcmod, computing checksums on composite objects is master-replica-0 
I  master-replica-0 compiled crcmod installed (see "gsutil help crcmod"). This is because master-replica-0 
I  master-replica-0 means that any user who downloads such objects will need to have a master-replica-0 
I  master-replica-0 <https://cloud.google.com/storage/docs/composite-objects>`_,which master-replica-0 
I  master-replica-0 be uploaded as `composite objects master-replica-0 
E  master-replica-0 Command '['python', '-m', u'cloudml-model.cloudml.deploy', u'Rscript', u'--activation', u'selu', u'--hidden_size', u'50', u'--learning_rate', u'0.095436491804553159', u'--normalization', u'minmax', '--job-dir', 'gs://mapagamaduo/r-cloudml/staging/2']' returned non-zero exit status 1 master-replica-0 
I  master-replica-0 configuration file. However, note that if you do this large files will master-replica-0 

I think the problem is related to this Python command, which fails with an error for every script. More examples:
master-replica-0 Command '['python', '-m', u'cloudml-model.cloudml.deploy', u'Rscript', '--job-dir', u'gs://mapagamaduo/r-cloudml/staging']' returned non-zero exit status 1
master-replica-0 Command '['python', '-m', u'cloudml-model.cloudml.deploy', u'Rscript', '--job-dir', u'gs://mapagamaduo/r-cloudml/staging']' returned non-zero exit status 1

My session info:

Session info --------------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.4 (2018-03-15)
 system   x86_64, mingw32             
 ui       RStudio (1.1.442)           
 language (EN)                        
 collate  Spanish_Spain.1252          
 tz       Europe/Berlin               
 date     2018-06-04                  

Packages ------------------------------------------------------------------------------------------------------------------------------------------------
 package    * version date       source                          
 assertthat   0.2.0   2017-04-11 CRAN (R 3.4.4)                  
 backports    1.1.2   2017-12-13 CRAN (R 3.4.3)                  
 base       * 3.4.4   2018-03-15 local                           
 base64enc    0.1-3   2015-07-28 CRAN (R 3.4.1)                  
 cloudml    * 0.5     2018-05-24 Github (rstudio/cloudml@4ce808c)
 compiler     3.4.4   2018-03-15 local                           
 crayon       1.3.4   2017-09-16 CRAN (R 3.4.4)                  
 datasets   * 3.4.4   2018-03-15 local                           
 debugme      1.1.0   2017-10-22 CRAN (R 3.4.4)                  
 devtools     1.13.5  2018-02-18 CRAN (R 3.4.3)                  
 digest       0.6.15  2018-01-28 CRAN (R 3.4.3)                  
 graphics   * 3.4.4   2018-03-15 local                           
 grDevices  * 3.4.4   2018-03-15 local                           
 here         0.1     2017-05-28 CRAN (R 3.4.4)                  
 jsonlite     1.5     2017-06-01 CRAN (R 3.4.4)                  
 magrittr     1.5     2014-11-22 CRAN (R 3.4.4)                  
 memoise      1.1.0   2017-04-21 CRAN (R 3.4.4)                  
 methods    * 3.4.4   2018-03-15 local                           
 packrat      0.4.9-2 2018-04-20 CRAN (R 3.4.4)                  
 processx     3.1.0   2018-05-15 CRAN (R 3.4.4)                  
 R6           2.2.2   2017-06-17 CRAN (R 3.4.4)                  
 rprojroot    1.3-2   2018-01-03 CRAN (R 3.4.4)                  
 rstudioapi   0.7     2017-09-07 CRAN (R 3.4.4)                  
 stats      * 3.4.4   2018-03-15 local                           
 tfruns     * 1.3     2018-05-24 Github (rstudio/tfruns@03fb652) 
 tools        3.4.4   2018-03-15 local                           
 utils      * 3.4.4   2018-03-15 local                           
 whisker      0.3-2   2013-04-28 CRAN (R 3.4.4)                  
 withr        2.1.2   2018-03-15 CRAN (R 3.4.4)                  
 yaml         2.1.19  2018-05-01 CRAN (R 3.4.4)                  

I have tried CloudML with other scripts as well, and they always fail. Could you please give me some advice on what to do?

Thanks.

Hi,

From my (basic) experience, running Keras from RStudio through CloudML causes a lot of problems if your code reads from or writes to the console, or if the files it uses are outside the working directory. For example, I had a script that gave the same "Error: object 'job_config' not found" because I was using:

evaluate(m, test.x, test.y)

This was fixed by changing it to:

eva <- m %>% evaluate(test.x, test.y, verbose = 0)

cat('Test loss:', eva[[1]], '\n')
cat('Test accuracy:', eva[[2]], '\n')

Try, for instance, the mnist_mlp.R example code; it worked for me.
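
The log in the question also shows the script calling setwd("D:/Proyectos/CloudML/R_fraud") and rm(list = ls()). That path exists only on the local Windows machine, and clearing the workspace might remove objects that the CloudML wrapper created before running the script. I have not confirmed that either one is the cause here, so treat this as a guess. A minimal sketch of a script header without those two calls, assuming the data file sits in the same directory as the script (the file name creditcard.csv is hypothetical):

```r
# Sketch only: no setwd() to a local absolute path and no rm(list = ls()).
# "creditcard.csv" is a hypothetical file name; use whatever your script reads.
data_file <- "creditcard.csv"  # relative to the directory the job runs in

if (!file.exists(data_file)) {
  # Report where the script looked, so the job log shows the actual directory.
  message("Data file not found in ", getwd(), ": ", data_file)
} else {
  transactions <- read.csv(data_file)
  str(transactions)
}
```

If the message shows up in the job log, the file was probably not uploaded together with the script.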

Hope it helps