Spark (sparklyr) running locally with RStudio Cloud


#1

I’m trying to reproduce on RStudio Cloud the same code that I tested in RStudio Desktop. I have a small example to test how to use sparklyr with a local Spark (installed as a package).
It works properly in RStudio Desktop, but it doesn’t work on RStudio Cloud.
When I run the R code, the connection to the local Spark fails and I cannot continue.
This is my simple code:

# Install sparklyr from CRAN and a local Spark distribution
install.packages("sparklyr")
library(sparklyr)
spark_install(version = "2.1.0")

# Also tried the development version from GitHub
devtools::install_github("rstudio/sparklyr")
library(sparklyr)

# Example datasets
install.packages(c("nycflights13", "Lahman"))
library(nycflights13)
library(Lahman)
library(dplyr)

# Connect to the local Spark instance
sc <- spark_connect(master = "local")

# Copy the example data frames into Spark and query them
iris_tbl <- copy_to(sc, iris)
flights_tbl <- copy_to(sc, nycflights13::flights, "flights")
batting_tbl <- copy_to(sc, Lahman::Batting, "batting")
src_tbls(sc)
flights_tbl %>% filter(dep_delay == 2)

Is there a known problem supporting this on RStudio Cloud?
The same code works properly in RStudio Desktop.
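
In case it helps diagnose this, here is a minimal check I could run; spark_installed_versions() confirms the local install, and the tryCatch wrapper is just my addition to capture the connection error instead of stopping the script:

library(sparklyr)

# Confirm which Spark versions are actually installed in this project
spark_installed_versions()

# Capture the connection error instead of stopping the script
sc <- tryCatch(
  spark_connect(master = "local", version = "2.1.0"),
  error = function(e) {
    message("spark_connect failed: ", conditionMessage(e))
    NULL
  }
)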

Thanks
Alfonso Carabantes


#2

Installing sparklyr via devtools is failing because libssh-2 is missing. That should be fixed next week or the week after.

When using the CRAN version of sparklyr, I am able to load iris.
My guess, though not fully confirmed, is that the larger copies (flights and batting) are running into the limits of the available memory.
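
If memory is the issue, one thing worth trying is to cap the driver memory before connecting. Just a sketch; the "512m" value is a guess to fit inside the container, not a tested Cloud setting:

library(sparklyr)

conf <- spark_config()
# Limit the local driver JVM heap (value is a guess, tune as needed)
conf$`sparklyr.shell.driver-memory` <- "512m"

sc <- spark_connect(master = "local", config = conf)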


#3

Thanks Josh, I think it could be a memory limit. Do you know how much memory is assigned to an RStudio Cloud user?
Will you allow running memory-heavy packages like this? Perhaps it is too much for a free cloud account!
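
Is there a way to check the limit from inside the session? Maybe something like this (assuming the session runs in a Linux container limited via cgroup v1, which I have not verified):

# Assumption: sessions run in Linux containers limited via cgroup v1
limit_path <- "/sys/fs/cgroup/memory/memory.limit_in_bytes"
if (file.exists(limit_path)) {
  limit_bytes <- as.numeric(readLines(limit_path))
  cat(sprintf("Container memory limit: %.2f GB\n", limit_bytes / 1024^3))
} else {
  cat("cgroup v1 memory limit file not found on this system\n")
}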

Thanks
Alfonso Carabantes


#4

Alfonso,

Currently we’re limiting users to 1GB of memory. We have plans to add options for more memory, but we’re still in the process of deciding how that will work.
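
In the meantime, if you want to experiment within the 1GB limit, copying a sample of the larger tables instead of the full data might work. This is only a sketch, untested on your project; sample_n(10000) is an arbitrary size:

library(dplyr)
library(sparklyr)

sc <- spark_connect(master = "local")

# Copy a 10,000-row sample of flights (~336k rows in full) instead of everything
flights_sample <- nycflights13::flights %>% sample_n(10000)
flights_tbl <- copy_to(sc, flights_sample, "flights", overwrite = TRUE)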

Best,
-Andy


#5

Ok Andy,
Thanks for your response.
I’ll wait for your new feature for more memory.

Regards
Alfonso