Modify SQL tables when reading into RStudio

Hello all, I'm attempting to connect a SQL table to RStudio while modifying it as it reads into R. I know it is possible to do this in SAS, so below I wrote something similar to what I want to be able to do.
FYI I am using SparkR.

    tableName = "(
        CREATE TABLE DF AS
        SELECT VARIABLE,
               CASE WHEN VARIABLE1 = 0 THEN 10
                    ELSE VARIABLE1
               END AS VARIABLE1 LABEL = 'Var1',
               VARIABLE2 AS VAR2 LABEL = 'Var2'
        FROM SQLTABLE
        WHERE VARIABLE3 IN ('17', '27', '33', '39', '40', '41') AND VARIABLE1 >= 0
        )",
    driver = 'drivername',
    user = 'username',
    password = 'password')

Thanks in advance :slight_smile:

Not sure how to do this in SparkR, but in sparklyr you would do something like the following. Notice that you need a JDBC driver properly installed in your cluster; this example uses the MySQL connector, but there are many more that might suit your needs.

    sc <- spark_connect(
      master = "local",
      config = list(
        "sparklyr.shell.driver-class-path" =
          "~/Downloads/mysql-connector-java-5.1.41/mysql-connector-java-5.1.41-bin.jar")
    )

    spark_read_jdbc(sc, "person_jdbc", options = list(
      url = "jdbc:mysql://localhost:3306/sparklyr",
      user = "root", password = "<password>",
      dbtable = "person"))

Instead of passing a SQL statement as the dbtable, I would rather register each table you need to access in Spark individually and then join them as needed.
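As a minimal sketch of that approach (the second table `orders`, the join key `person_id`, and the connection details are hypothetical, not from the original question): register each table with `spark_read_jdbc()`, then join the resulting Spark DataFrames with dplyr so the join runs inside Spark.

```r
library(sparklyr)
library(dplyr)

# Assumes `sc` is an existing spark_connect() connection with the JDBC
# driver on the classpath, as shown above. Table names are illustrative.
person <- spark_read_jdbc(sc, "person_jdbc", options = list(
  url = "jdbc:mysql://localhost:3306/sparklyr",
  user = "root", password = "<password>",
  dbtable = "person"))

orders <- spark_read_jdbc(sc, "orders_jdbc", options = list(
  url = "jdbc:mysql://localhost:3306/sparklyr",
  user = "root", password = "<password>",
  dbtable = "orders"))

# The join and filter execute inside Spark; nothing is pulled into R
# until collect() is called.
person %>%
  inner_join(orders, by = "person_id") %>%
  filter(amount >= 0) %>%
  collect()
```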

Additionally, you can connect to data sources directly from R using the odbc package, without having to use Spark. RStudio Pro customers can also consider using the RStudio Professional Drivers.
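For the odbc route, a sketch of how the recode-on-read from the original question could be pushed into the query itself; the driver, server, and database names below are placeholders, and the column names are taken from the question, so substitute your own:

```r
library(DBI)

# Hypothetical connection details -- replace with your driver and credentials.
con <- dbConnect(odbc::odbc(),
                 Driver   = "drivername",
                 Server   = "localhost",
                 Database = "mydb",
                 UID      = "username",
                 PWD      = "password")

# The CASE expression does the recoding server-side, so the table arrives
# in R already transformed. (LABEL is a SAS feature; in SQL you simply
# alias the columns with AS.)
df <- dbGetQuery(con, "
  SELECT VARIABLE,
         CASE WHEN VARIABLE1 = 0 THEN 10 ELSE VARIABLE1 END AS VAR1,
         VARIABLE2 AS VAR2
  FROM SQLTABLE
  WHERE VARIABLE3 IN ('17','27','33','39','40','41') AND VARIABLE1 >= 0
")

dbDisconnect(con)
```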


This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.