Hi dear experts
Our system landscape is that we are using RStudio Server 128 GB RAM R4.4xLarge AWS EC2 instance running ubuntu. We also have a SAP HANA Database holding tables of 12,000,000 rows by 100 Columns. We are trying to use RODBC and sqlQuery To load the data from this table into a dataframe on our RStudio Client. Could you tell me if this is a normal situation and how to optimize it? Currently we are experiencing a weird situation where the command starts to run and never finishes it. When we kill the command with the stop button, and then dim(table) and view(table) we get the correct results.
By the way, It seems like 6 rows are missing from the sqlQuery. Our Hana Table has 23,581,917 rows, but our View(table) has 23,581,911 rows (after stopping the sqlQuery command running for 3 hours).
This is the full reproduction instructions:
1)The Materialize Table "MT001_LABORATORY_RESULT" (DiskSize=574,070,784, Rows 23,581,917) on SID=C01 HANA database 2.0 SPS02 with 64 GB RAM.
2) The C01 table data is accessed via Smart Data Access from USA HANA and stored as virtual table in USA (VT001_LABORATORY_RESULT)
3) a view of USA virtual table is made in the same schema (V001_LABORATORY_RESULT)
4) a calculation view in package "p8118 is created. The view put into a projection, then propagated to aggregation, then propagated to semantics. (CV001_LABORATORY_RESULT)
5) the calculation view is used as a external facing access point to the data by RStudio server on ubuntu(122 GB ram, 16 CPUs, 10Gbps Network)
6)RStudio server web-based client uses the following code to obtain the data into a dataframe:
options(encoding = "UTF-8")
con<-odbcConnect("USA", uid="D000001" ,pwd='', DBMSencoding = "UTF-8")
lab<- sqlQuery(con,'select * from "_SYS_BIC"."p8118/CV001_LABORATORY_RESULT"')
7a) This sqlQuery Statement will not finish in Rstudio and the command is force stopped.
7b) the sqlQuery for smaller tables 1.8M records, using the same networking route, will succeed in about 3 to 5 minutes.
8a) when we run dim(lab) after stopping the lab<-sqlQuery command, we can get the almost the correct rows and columns 23,581,911 columns x 43 rows.
8b) The rows in Rstudio are 6 less than the actual table rows on C01.
9) when we run view(lab) after stopping the lab<-sqlQuery command , we can preview all the rows of data.