How to Connect RStudio Server Pro to HDP Hive

Hello,

We are trying to connect RStudio to Hive using the following code:

install.packages("rJava")
install.packages("RJDBC",dep=TRUE)
options( java.parameters = "-Xmx8g" )
library("DBI")
library("rJava")
library("RJDBC")

cp = c("/usr/hdp/current/hive-client/lib/hive-jdbc.jar",
"/usr/hdp/current/hadoop-client/hadoop-common.jar")
.jinit(classpath=cp)

drv <- JDBC("org.apache.hive.jdbc.HiveDriver",
"/usr/hdp/current/hive-client/lib/hive-jdbc.jar",
identifier.quote="`")

conn <- dbConnect(drv, "jdbc:hive2://<SERVER_NAME>", "user", "pass")

show_databases <- dbGetQuery(conn, "show databases")

show_databases

Currently I get "java.lang.NoClassDefFoundError: org/apache/thrift/TException". Previously, after reinstalling the packages, I got "java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback".

This looks like a required library missing from the Java classpath, which I am currently debugging. Is there another way to connect RStudio to Hive, or has anyone run into a similar problem?
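
One way to narrow down a NoClassDefFoundError like the ones above is to check which jar, if any, actually contains the missing class. The following is a minimal sketch in base R; `find_class_in_jars` is a hypothetical helper name, and the directories searched are the HDP paths already used in this thread.

```r
# Hypothetical helper: return the jars under `dirs` that contain `class_name`.
find_class_in_jars <- function(class_name, dirs) {
  # "org.apache.thrift.TException" -> "org/apache/thrift/TException.class"
  entry <- paste0(gsub(".", "/", class_name, fixed = TRUE), ".class")
  dirs <- dirs[dir.exists(dirs)]
  jars <- list.files(dirs, pattern = "\\.jar$", full.names = TRUE, recursive = TRUE)
  hits <- vapply(jars, function(jar) {
    # A jar is a zip archive; unreadable files are treated as "no match"
    listing <- tryCatch(
      suppressWarnings(utils::unzip(jar, list = TRUE)$Name),
      error = function(e) character(0)
    )
    entry %in% listing
  }, logical(1))
  unname(jars[hits])
}

# Look for the Thrift class in the directories from the original post
find_class_in_jars(
  "org.apache.thrift.TException",
  c("/usr/hdp/current/hive-client/lib", "/usr/hdp/current/hadoop-client")
)
```

If the result is empty, none of the searched jars provide the class. Otherwise, any jar it lists needs to be on the classpath passed to .jinit().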

Thanks!

Another option would be to use ODBC instead of JDBC. Here are a couple of links that may be of help on how to set that up:
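
As a rough sketch of what the ODBC route can look like from R: this assumes the odbc package is installed, a Hive ODBC driver is set up on the server, and a DSN has been configured for it. The DSN name "Hive" and the helper name `connect_hive_odbc` are placeholders, not something from this thread.

```r
# Assumption: a DSN (hypothetically named "Hive") exists in the ODBC configuration
# and points at a Hive ODBC driver.
connect_hive_odbc <- function(dsn = "Hive") {
  DBI::dbConnect(odbc::odbc(), dsn = dsn)
}

# Usage, once the driver and DSN are in place:
# conn <- connect_hive_odbc()
# DBI::dbGetQuery(conn, "show databases")
# DBI::dbDisconnect(conn)
```

With ODBC the Java classpath is not involved at all, which sidesteps the NoClassDefFoundError problems entirely.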

Thanks edgaruiz, I'll try that out.

This was fixed by having the user first run kinit (we're using Kerberos) and then executing the following code:

# install.packages("DBI")
# install.packages("rJava")
# install.packages("RJDBC",dep=TRUE)
# install.packages("odbc")

library(DBI)
library(rJava)
library(RJDBC)

print("Attempting Hive Connection...")

# Collect every jar from the HDP client directories
hadoop.class.path <- list.files(path = "/usr/hdp/current/hadoop-client", pattern = "\\.jar$", full.names = TRUE)
hive.class.path <- list.files(path = "/usr/hdp/current/hive-client/lib", pattern = "\\.jar$", full.names = TRUE)
hadoop.lib.path <- list.files(path = "/usr/hdp/current/hadoop/lib", pattern = "\\.jar$", full.names = TRUE)
mapred.class.path <- list.files(path = "/usr/hdp/current/hadoop-mapreduce-client/lib", pattern = "\\.jar$", full.names = TRUE)

# hadoop-common.jar lives in hadoop-client, so hadoop.class.path already covers it
cp <- c(hive.class.path, hadoop.lib.path, mapred.class.path, hadoop.class.path)
.jinit(classpath=cp)

drv <- JDBC("org.apache.hive.jdbc.HiveDriver","/usr/hdp/current/hive-client/lib/hive-jdbc.jar",identifier.quote="`")

conn <- dbConnect(drv, "jdbc:hive2://JDBC-provided-by-ambari-server")

show_databases <- dbGetQuery(conn, "show databases")

print("Connected.")
print(show_databases)
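
Since the fix depends on kinit having been run first, it may help to check for a ticket before attempting the connection. This is a minimal sketch, assuming an MIT Kerberos `klist` is on the PATH (its `-s` flag exits with status 0 only when the credentials cache is readable and unexpired). The helper name `has_kerberos_ticket` is made up for illustration.

```r
# Returns TRUE only if `klist -s` exits with status 0.
has_kerberos_ticket <- function() {
  # No klist on the PATH: treat as "no ticket"
  if (!nzchar(Sys.which("klist"))) return(FALSE)
  status <- suppressWarnings(
    system2("klist", "-s", stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

if (!has_kerberos_ticket()) {
  message("No valid Kerberos ticket found; run kinit before connecting.")
}
```

Running this ahead of .jinit() and dbConnect() gives a clearer message than the Java stack trace that otherwise shows up when the ticket is missing or expired.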

Hopefully this helps someone else.
