Writing dataframe to Hive Data Base

We're going to be getting access to our own hive/impala database here soon. We currently use Talend to export csv's of the output of our R scripts into Hive, but we were hoping to cut out some of the middleware here and just write straight into Hive, so we decided to experiment with writing dataframes in R straight to Hive once we procured a development environment to play in. After overcoming some security hurdles, I was able to verify that I had the correct permissions to create a table in our Hive database using Hue (sql connector for Hive we use).

Then when I tried to write a simple dataframe out to Hive from R, I encountered errors every time I tried. I primarily was testing using the ODBC and DBI packages, and was using dbWriteTable (I tried using both the ODBC and DBI versions of dbWriteTable). From the time I spent looking online, I couldn't seem to find another package that would do what I wanted besides using dbWriteTable.

I tried two different exercises, inputting data into a Hive table I had already created, and creating the Hive table in R using dbWriteTable and then having the function insert the data into the table it created. I got different errors depending on which I was doing.

First I tried to write to a table that was already built in Hive, using the overwrite = TRUE option in dbWriteTable.

Code

test1 <- DBI::dbWriteTable(db_connect, "aa_poc.test4", test, overwrite = TRUE)

Error:

Error: 'CREATE TABLE aa_poc.test4 (
market STRING)'
nanodbc/nanodbc.cpp:1587: HY000: [Cloudera][Hardy] (35) Error from server: error code: '1' error message: 'Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Table test4 already exists)'.

So then I decided to try and have dbWriteTable create the table and then insert the data from the dataframe into the newly created table, but then I got this error:

Code

test1 <- DBI::dbWriteTable(db_connect, "aa_poc.test33", test)

Error

Error: Error in new_result(connection@ptr, statement) :
nanodbc/nanodbc.cpp:1344: 42S02: [Cloudera][SQLEngine] (31740) Table or view not found: HIVE..aa_poc.test33

Based on that error, I gather it didn't create the table, yet when I would go into Hive itself and check I would find the table.

Has anyone else tried to write dataframes from R straight to Hive? If you have, I'd love to hear how you did it!

Hi, you may want to try out the solution in this article: https://db.rstudio.com/best-practices/schema/#write-non-temporary-tables