I am using the DBI/ODBC packages to connect to our database (Hadoop/Hive).
To write an R data.frame to a sandbox in the Hadoop/Hive database, I am using dbWriteTable(), but it is very slow.
Sending a file of 1.3 GB takes 18 hours.
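For scale, here is a rough back-of-the-envelope calculation of the effective transfer rate. It assumes that 1.3 GB means 1.3 × 10^9 bytes; the variable names are only for illustration.

```r
# Effective throughput of the transfer described above.
# Assumption: the 1.3 GB figure is taken as 1.3e9 bytes (decimal gigabytes).
size_bytes <- 1.3e9
duration_seconds <- 18 * 60 * 60  # 18 hours expressed in seconds

# Average rate in kilobytes per second (1 kB = 1000 bytes).
rate_kb_per_s <- size_bytes / duration_seconds / 1000
print(round(rate_kb_per_s, 1))  # about 20.1 kB/s
```

A rate this low is well under ordinary network speeds, so the bottleneck may lie in how the rows are inserted rather than in raw bandwidth.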
Our setup is as follows:
Linux server (R / RStudio Server Pro) -> Linux server (Hadoop / Hive).
Do you have any advice on best practices, or is there another function I should use instead?
Is it possible that I have an infrastructure problem between the two servers?
Or should I switch from the Hive connection to an Impala connection for the dbWriteTable() step, to get better write performance to Hadoop?