Trouble getting H2O to work with Sparklyr

I am trying to get H2O working with sparklyr on my Spark cluster (YARN).

My Spark cluster is running version 2.4.4 (spark_version(sc) returns 2.4.4).

According to this page, the Sparkling Water version compatible with my Spark is 2.4.5, and the matching H2O release is rel-xu patch version 3. However, when I install this version I am prompted to update my H2O install to the next release (rel-zorn). Between the H2O guides and the sparklyr guides, the instructions are confusing and at times contradictory.

Since this is a YARN deployment and not a local one, unfortunately I can't provide a reprex to help with troubleshooting.

url <- "http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.4/5/sparkling-water-2.4.5.zip"

download.file(url = url, destfile = "sparkling-water-2.4.5.zip")

unzip("sparkling-water-2.4.5.zip")

# RUN THESE CMDs FROM THE TERMINAL
cd sparkling-water-2.4.5
bin/sparkling-shell --conf "spark.executor.memory=1g"

# RUN THESE FROM WITHIN RSTUDIO
install.packages("sparklyr")
library(sparklyr)

# REMOVE PRIOR INSTALLS OF H2O
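# Guard each step so it does not error when the package is not attached or installed
if ("package:rsparkling" %in% search()) { detach("package:rsparkling", unload = TRUE) }
if ("package:h2o" %in% search()) { detach("package:h2o", unload = TRUE) }
if (isNamespaceLoaded("h2o")) { unloadNamespace("h2o") }
if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }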

# INSTALLING REL-ZORN (3.36.0.3) WHICH IS REQUIRED FOR SPARKLING WATER 3.36.0.3
install.packages("h2o", type = "source", repos = "https://h2o-release.s3.amazonaws.com/h2o/rel-zorn/3/R")

# INSTALLING FROM S3 SINCE THE CRAN VERSION IS NO LONGER SUPPORTED
install.packages("rsparkling", type = "source", repos = "http://h2o-release.s3.amazonaws.com/sparkling-water/spark-2.4/3.36.0.3-1-2.4/R")

# AS PER THE GUIDE
options(rsparkling.sparklingwater.version = "2.4.5")
library(rsparkling)

# SPECIFY THE CONFIGURATION
config <- sparklyr::spark_config()
config[["spark.yarn.queue"]] <- "my_data_science_queue"
config[["sparklyr.backend.timeout"]] <- 36000
config[["spark.executor.cores"]] <- 32
config[["spark.driver.cores"]] <- 32
config[["spark.executor.memory"]] <- "40g"
config[["spark.executor.instances"]] <- 8
config[["sparklyr.shell.driver-memory"]] <- "16g"
config[["spark.default.parallelism"]] <- "8"
config[["spark.rpc.message.maxSize"]] <- "256"

# MAKE A SPARK CONNECTION
sc <- sparklyr::spark_connect(
  master = "yarn",
  spark_home = "/opt/mapr/spark/spark",
  config = config,
  log = "console",
  version = "2.4.4"
)

When I try to establish an H2O context using the next chunk, I get the following error:

h2o_context(sc)

Error in h2o_context(sc) : could not find function "h2o_context"

Any pointers as to where I'm going wrong would be greatly appreciated.

Hi, I think posting an Issue in the GitHub repository for the package may be the best way to get some help on this specific issue: Issues · h2oai/sparkling-water · GitHub

Hi, judging by the rsparkling package's NAMESPACE file, it looks like that function is spelled H2OContext (see: sparkling-water/NAMESPACE at c5ae8dc2d3932e7069a4018a9d8f7aafb06a57e3 · h2oai/sparkling-water · GitHub). Maybe try that instead?
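
One way to confirm which spelling your installed rsparkling actually exports is to list its exported names directly, rather than relying on a guide written for a different version. Below is a minimal sketch using only base R; find_exports is a hypothetical helper name I made up for illustration, not something provided by rsparkling or sparklyr.

```r
# List the exported names of an installed package that match a pattern.
# `find_exports` is a hypothetical helper, not part of any package.
find_exports <- function(pkg, pattern) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))  # package not installed: nothing to report
  }
  exports <- getNamespaceExports(pkg)
  sort(grep(pattern, exports, value = TRUE, ignore.case = TRUE))
}

# Sanity check against a package that ships with R.
print(find_exports("stats", "^median$"))

# For the case in this thread: which context-related functions does the
# installed rsparkling export? Returns character(0) if it is not installed.
print(find_exports("rsparkling", "context"))
```

If the second call lists something like H2OContext.getOrCreate rather than h2o_context, then the installed rsparkling uses the newer naming, and the older guide's h2o_context(sc) call will not be found. As far as I know, recent Sparkling Water documentation creates the context with H2OContext.getOrCreate(), but please check that against the documentation for the exact release you installed.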
