How to Connect to Kerberized Livy With RStudio

Hello,

We are trying to connect RStudio to our kerberized HDP cluster and are running into errors. I have an R script that connects to Livy on another host, but it fails with: "Failed to connect to Livy service at http://XXX:8999. Livy operation is unauthorized. Try spark_connect with config = livy_config()"

The code I am using is:

library(sparklyr)
library(dplyr)
config <- spark_config()
config[["livy.rsc.rpc.max.size"]] <- 1048576000
config[["spark.driver.memory"]] <- "4G"
config[["spark.executor.memory"]] <- "4G"
liv_conf <- livy_config(config = config, username = "XX@X.ca", password = "P@SS")
sc <- spark_connect(master = "http://server:8999", method = "livy", config = liv_conf)

Any help is much appreciated.

We were able to connect by adding Livy to the Knox gateway, using the following steps:

  1. Add Livy to the Knox gateway in the kerberized cluster (Ambari). Instructions for adding Livy to Knox are here, but instead of the service.xml and rewrite.xml they use, I would use the ones Knox provides on GitHub, found here. Restart Knox after the changes are applied.

  2. Add the Livy service to Ranger to allow access for a given set of users.

  3. In Ambari, add knox to the Livy superusers (the livy.superusers property) to get proxy-user impersonation to work. Essentially, this allows Livy sessions to be created as the proxy user (user123) instead of as "knox".

  4. In R, use the following script to connect:
# install.packages("openssl")
# install.packages("xml2")
# install.packages("sparklyr")
# install.packages("rstudioapi")  # used for the password prompt

library(sparklyr)
library(dplyr)
library(DBI)  # provides dbGetQuery()

s_config <- spark_config()

l_config <- livy_config(
  config = s_config,
  username = "USERNAME",
  password = rstudioapi::askForPassword("Livy password:"),
  negotiate = TRUE,
  proxy_user = "USERNAME"  # append the domain if applicable
)

# sparklyr talks to Knox, which forwards the requests to Livy
sc <- sparklyr::spark_connect(master = "https://knox-url:8443/gateway/default/livy/v1/",
                              version = "2.2.0",
                              method = "livy",
                              config = l_config)

query <- "SELECT 1"
result <- dbGetQuery(sc, query)
print(result)
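If the connection still fails with an "unauthorized" error, a small check can help separate Knox/Ranger problems (steps 1 and 2) from impersonation problems (step 3). The sketch below is not part of the original solution and rests on assumptions: `knox_livy_url()`, `check_livy_via_knox()` and `effective_spark_user()` are hypothetical helper names; the HTTP check assumes the Knox topology authenticates users with HTTP basic auth (e.g. LDAP) and needs the httr package; and the last helper assumes `sparklyr::invoke()` is usable on a Livy connection to call Spark's `SparkContext.sparkUser`.

```r
# Hypothetical helper: builds the Knox URL for the Livy service, following the
# https://<knox-host>:8443/gateway/<topology>/livy/v1/ pattern used above.
knox_livy_url <- function(host, port = 8443L, topology = "default") {
  sprintf("https://%s:%d/gateway/%s/livy/v1/", host, as.integer(port), topology)
}

# Hypothetical check before spark_connect(): lists Livy sessions through Knox.
# Assumes the Knox topology accepts HTTP basic auth. Requires the httr package
# (only when the function is called). Returns the HTTP status code.
check_livy_via_knox <- function(host, username, password,
                                port = 8443L, topology = "default") {
  resp <- httr::GET(
    paste0(knox_livy_url(host, port, topology), "sessions"),
    httr::authenticate(username, password, type = "basic")
  )
  httr::status_code(resp)
}

# Hypothetical check after spark_connect(): returns the user Spark runs as.
# If the livy.superusers change took effect, this should be the proxy user,
# not "knox". Assumes invoke() works over the Livy connection.
effective_spark_user <- function(sc) {
  sparklyr::invoke(sparklyr::spark_context(sc), "sparkUser")
}
```

For example, `check_livy_via_knox("knox-url", "USERNAME", rstudioapi::askForPassword("Livy password:"))` returning 200 suggests Knox accepted the credentials and reached Livy, while a 401 typically points at Knox authentication and a 403 at a Ranger policy. Once connected, `effective_spark_user(sc)` should return the proxy user rather than "knox".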

Hopefully this helps someone in the same situation :slight_smile:
