Hello all, I'm attempting to connect to a SQL table from RStudio and modify the data as it is read into R. I know it is possible to do this in SAS, so below I wrote something similar to what I want to be able to do.
FYI, I am using SparkR.
read.jdbc(url = "jdbcurl",
          tableName = "(
            SELECT VARIABLE,
                   CASE
                     WHEN VARIABLE1 = 0 THEN 10
                     ELSE VARIABLE1
                   END AS VARIABLE1,   /* LABEL = 'Var1' in SAS */
                   VARIABLE2 AS VAR2   /* LABEL = 'Var2' in SAS */
            FROM SQLTABLE
            WHERE VARIABLE3 IN ('17', '27', '33', '39', '40', '41')
              AND VARIABLE1 >= 0
          )",
          driver = 'drivername',
          user = 'username',
          password = 'password')
I'm not sure how to do this in SparkR, but in sparklyr you would do something like the following. Notice that you need a JDBC driver properly installed on your cluster; this example uses the MySQL connector, but there are many others that might suit your needs.
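A minimal sketch of that sparklyr setup, assuming a local Spark install, a MySQL Connector/J jar at a placeholder path, and placeholder host, database, and credential values:

```r
library(sparklyr)

# Point the Spark driver at the JDBC jar (placeholder path).
config <- spark_config()
config$`sparklyr.shell.driver-class-path` <- "/path/to/mysql-connector-java.jar"

sc <- spark_connect(master = "local", config = config)

# Register the table as a Spark DataFrame via JDBC.
df <- spark_read_jdbc(
  sc,
  name = "sqltable",
  options = list(
    url      = "jdbc:mysql://localhost:3306/mydb",  # placeholder URL
    driver   = "com.mysql.jdbc.Driver",
    user     = "username",                          # placeholder credentials
    password = "password",
    dbtable  = "SQLTABLE"
  )
)
```

Once registered, `df` behaves like any other sparklyr table reference, so the filtering and recoding can be done with dplyr verbs inside Spark rather than in the SQL string.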
Instead of passing a SQL statement as the dbtable, I would rather register each table you need to access in Spark individually and then join them as needed.
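For instance, the transformation from the question could be expressed against the registered table with dplyr verbs instead of a raw SQL subquery. This is a sketch, assuming a sparklyr connection `sc` and the placeholder column names from the question:

```r
library(sparklyr)
library(dplyr)

# Assumes `sc` is an existing sparklyr connection and the JDBC options
# below use placeholder values, as in the previous example.
sqltable <- spark_read_jdbc(
  sc,
  name = "sqltable",
  options = list(
    url      = "jdbc:mysql://localhost:3306/mydb",
    driver   = "com.mysql.jdbc.Driver",
    user     = "username",
    password = "password",
    dbtable  = "SQLTABLE"
  )
)

df <- sqltable %>%
  # WHERE clause from the question
  filter(VARIABLE3 %in% c("17", "27", "33", "39", "40", "41"),
         VARIABLE1 >= 0) %>%
  # CASE WHEN VARIABLE1 = 0 THEN 10 ELSE VARIABLE1 END
  mutate(VARIABLE1 = ifelse(VARIABLE1 == 0, 10, VARIABLE1)) %>%
  # VARIABLE2 AS VAR2
  rename(VAR2 = VARIABLE2)
```

These verbs are translated to Spark SQL and executed in the cluster, so joins against other registered tables compose the same way.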
Additionally, you can connect to data sources directly from R using the odbc package, without using Spark at all. If you are an RStudio Pro customer, you can also consider the RStudio Professional Drivers.
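If Spark isn't required, the whole query from the question can be pushed to the database through DBI and odbc. A sketch, assuming a SQL Server driver and placeholder server, database, and credentials:

```r
library(DBI)

# Hypothetical connection details; substitute your own driver/DSN.
con <- dbConnect(odbc::odbc(),
                 Driver   = "SQL Server",
                 Server   = "myserver",
                 Database = "mydb",
                 UID      = "username",
                 PWD      = "password")

# The database does the filtering and recoding; R receives the result.
df <- dbGetQuery(con, "
  SELECT VARIABLE,
         CASE WHEN VARIABLE1 = 0 THEN 10 ELSE VARIABLE1 END AS VARIABLE1,
         VARIABLE2 AS VAR2
  FROM SQLTABLE
  WHERE VARIABLE3 IN ('17', '27', '33', '39', '40', '41')
    AND VARIABLE1 >= 0
")

dbDisconnect(con)
```

Column labels like SAS's LABEL= don't exist in plain data frames, but you could attach them afterwards with, e.g., `attr(df$VARIABLE1, "label") <- "Var1"`.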