Connecting sparklyr to MySQL

Hi everyone,

I am having trouble connecting Spark to MySQL. Possible causes include the Java version, the location of the Java files, the location of the connector files, the MySQL version, the environment variable settings, and the choice between JDBC and ODBC. My questions are:

  1. Do we need to install Hadoop and Java before installing sparklyr? I am using base R, not RStudio.

  2. Which versions of each of these packages are stable enough for a successful installation and connection, if anyone has experience with this? (The solutions online may have worked with older versions of these packages, but they no longer seem to work in my case. I'm on a Mac, by the way.)

  3. So far, the only approach that has worked for me is using the sqldf package with SparkR to connect to MySQL, but I am not sure whether Spark was actually doing the work (to speed up processing) when I ran the SQL queries with sqldf. Can I do the same with sparklyr? And how can I tell whether it is Spark or R that is doing the work behind the scenes?
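
For questions 1 and 3, here is a minimal sketch, not a confirmed answer. It assumes that Java is already installed and that sparklyr's own `spark_install()` is used to fetch a Spark build that bundles the Hadoop libraries, so no separate Hadoop install should be needed for local mode. As far as I know, sqldf runs its SQL in a local database engine inside R rather than in Spark, whereas `DBI::dbGetQuery()` on a sparklyr connection sends the SQL to Spark SQL. The helper names `setup_local_spark` and `run_on_spark` are hypothetical.

```r
# Hypothetical helper: check for Java, then download a local Spark build.
setup_local_spark <- function(version = "2.4.3") {
  # Prints the Java version; Spark needs a working Java installation.
  system2("java", "-version")
  # Downloads Spark (prebuilt with Hadoop libraries) if not yet installed.
  sparklyr::spark_install(version = version)
  # Lists the Spark versions sparklyr knows about on this machine.
  sparklyr::spark_installed_versions()
}

# Hypothetical helper: run SQL through Spark SQL and open the Spark UI.
run_on_spark <- function(sc, sql) {
  # On a sparklyr connection, dbGetQuery() is executed by Spark SQL.
  result <- DBI::dbGetQuery(sc, sql)
  # Opens the Spark web UI; a job for the query should be listed there.
  sparklyr::spark_web(sc)
  result
}

# Usage (assumes an open connection `sc` and a registered table `student1`):
# setup_local_spark()
# run_on_spark(sc, "SELECT COUNT(*) AS n FROM student1")
```

If the query shows up as a job in the Spark web UI, Spark executed it; a query run through sqldf would not appear there.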

I hope I have described my questions clearly. Thank you very much for your help.

Best regards,

YA

Hi everyone,

Just to follow up, I have "kind of" solved the problem. It is only "kind of" solved because something weird happens. Could you give me some advice, please? Thank you very much. The code looks like this:

library(sparklyr)  # load the sparklyr package
# Connect sparklyr to Spark; this only works with JDK 8
sc <- spark_connect(master = "local", spark_home = "/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7")
jdbc.config <- spark_config()
# The JDBC connector jar is placed under the spark_home/jars/ folder
jdbc.config$`sparklyr.shell.driver-class-path` <- "/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7/jars/mysql-connector-java-8.0.16.jar"
query1 <- spark_read_jdbc(sc, name = "student1", options = list(
  url = "jdbc:mysql://localhost:3306/learnsql", user = "root", password = "ya", dbtable = "student1"
))

The weird thing is that spark_read_jdbc() has to be run TWICE to get this to work. The first run fails with "Error: java.sql.SQLException: No suitable driver" (among other output), and the second run works. Does anybody know why sparklyr behaves like this?
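
One possible explanation, offered as an assumption rather than a confirmed diagnosis: `jdbc.config` is created after `spark_connect()` and never passed to it, so the `sparklyr.shell.driver-class-path` setting has no effect, and the JDBC options do not name a driver class. A minimal sketch that builds the config first, passes it through the `config` argument, and sets Spark's JDBC `driver` option explicitly is shown below. `com.mysql.cj.jdbc.Driver` is the driver class in MySQL Connector/J 8.x; the helper names `build_jdbc_options` and `connect_and_read` are hypothetical.

```r
# Hypothetical helper: assemble the JDBC options in one place.
build_jdbc_options <- function(db, table, user, password,
                               host = "localhost", port = 3306) {
  list(
    url      = sprintf("jdbc:mysql://%s:%d/%s", host, port, db),
    driver   = "com.mysql.cj.jdbc.Driver",  # Connector/J 8.x driver class
    user     = user,
    password = password,
    dbtable  = table
  )
}

# Hypothetical helper: set the config BEFORE connecting, then read the table.
connect_and_read <- function(spark_home, connector_jar, options, name) {
  config <- sparklyr::spark_config()
  # Put the connector jar on the driver class path at JVM start-up.
  config$`sparklyr.shell.driver-class-path` <- connector_jar
  sc <- sparklyr::spark_connect(master = "local",
                                spark_home = spark_home,
                                config = config)
  tbl <- sparklyr::spark_read_jdbc(sc, name = name, options = options)
  list(sc = sc, tbl = tbl)
}

# Options matching the database and table used in the code above.
opts <- build_jdbc_options(db = "learnsql", table = "student1",
                           user = "root", password = "ya")

# Usage (paths as in the code above):
# res <- connect_and_read(
#   spark_home    = "/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7",
#   connector_jar = "/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7/jars/mysql-connector-java-8.0.16.jar",
#   options       = opts,
#   name          = "student1"
# )
```

Whether this removes the need for the second call would have to be checked on the setup described here; naming the driver class is a commonly suggested remedy for "No suitable driver".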

Thank you very much.

Best regards,

YA
