Modify SQL tables when reading into RStudio

Hello all, I'm attempting to connect a SQL table to RStudio and modify it as it is read into R. I know it is possible to do this in SAS, so below I wrote something similar to what I want to be able to do.
FYI I am using SparkR.

# Note: the original attempt mixed SAS syntax (CREATE TABLE ... AS, LABEL =)
# into the query. In SparkR, read.jdbc() takes a JDBC URL (not a jar file name)
# and accepts a parenthesized subquery as tableName; SAS column labels have no
# direct Spark SQL equivalent, so they are dropped here.
df <- read.jdbc(url = "jdbc:mysql://hostname:3306/dbname",
                tableName = "(SELECT VARIABLE,
                                     CASE WHEN VARIABLE1 = 0 THEN 10
                                          ELSE VARIABLE1
                                     END AS VARIABLE1,
                                     VARIABLE2 AS VAR2
                              FROM SQLTABLE
                              WHERE VARIABLE3 IN ('17', '27', '33', '39', '40', '41')
                                AND VARIABLE1 >= 0) AS tmp",
                driver = "drivername",
                user = "username",
                password = "password")

Thanks in advance :slight_smile:

I'm not sure how to do this in SparkR, but in sparklyr you would do something like the following. Note that you need a JDBC driver properly installed in your cluster; this example uses the MySQL connector, but there are many more that might suit your needs.

library(sparklyr)

# Point the Spark driver at the JDBC connector jar
sc <- spark_connect(
  master = "local",
  config = list("sparklyr.shell.driver-class-path" = "~/Downloads/mysql-connector-java-5.1.41/mysql-connector-java-5.1.41-bin.jar")
)

# Register the MySQL table "person" as a Spark DataFrame named "person_jdbc"
spark_read_jdbc(sc, "person_jdbc",  options = list(
  url = "jdbc:mysql://localhost:3306/sparklyr",
  user = "root", password = "<password>",
  dbtable = "person"))

Instead of passing a SQL statement as the dbtable, I would rather register the tables you need to access in Spark individually and then join them as needed.
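As a sketch of that approach (the second table name "orders" and the join key "person_id" are made-up placeholders for illustration, not columns from your schema):

```r
library(sparklyr)
library(dplyr)

# Register each source table as its own Spark DataFrame
person <- spark_read_jdbc(sc, "person_jdbc", options = list(
  url = "jdbc:mysql://localhost:3306/sparklyr",
  user = "root", password = "<password>",
  dbtable = "person"))

orders <- spark_read_jdbc(sc, "orders_jdbc", options = list(
  url = "jdbc:mysql://localhost:3306/sparklyr",
  user = "root", password = "<password>",
  dbtable = "orders"))

# dplyr verbs are translated to Spark SQL and executed in the cluster,
# so the join and filter happen in Spark, not in local R memory
person %>%
  inner_join(orders, by = "person_id") %>%
  filter(amount >= 0)
```

This keeps each table reusable for other queries instead of baking one transformation into the read step.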

Additionally, you can connect to data sources directly from R using the odbc package without having to use Spark. For RStudio Pro customers, you can also consider using the RStudio Professional Drivers.
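For example, here is a minimal sketch with DBI and odbc that pushes the same transformation down to the database; the driver name, server, and credentials are placeholders you would replace with your own:

```r
library(DBI)

# Hypothetical connection details; substitute your installed ODBC driver,
# server, database, and credentials
con <- dbConnect(odbc::odbc(),
                 Driver   = "MySQL ODBC Driver",
                 Server   = "localhost",
                 Database = "mydb",
                 UID      = "username",
                 PWD      = "<password>",
                 Port     = 3306)

# The CASE/WHERE logic runs in the database; only the result comes back to R
df <- dbGetQuery(con, "
  SELECT VARIABLE,
         CASE WHEN VARIABLE1 = 0 THEN 10 ELSE VARIABLE1 END AS VARIABLE1,
         VARIABLE2 AS VAR2
  FROM SQLTABLE
  WHERE VARIABLE3 IN ('17', '27', '33', '39', '40', '41')
    AND VARIABLE1 >= 0")

dbDisconnect(con)
```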

