Spark vs. RStudio on AWS, R/S3 package?

Three (3) questions:

When using RStudio on AWS, does the tidyverse support AWS S3 I/O, or are there any R packages available for S3?

Does RStudio Pro on AWS support elastic EC2 for big data workloads?

What are the economics (cost difference) of a distributed (PySpark) workload vs. the same workload on a monolithic elastic EC2 instance?

Thanks

I see the aws.s3 package ...

For Apache Spark you have (at least; there are probably more) sparklyr/dplyr, sparklyr/rquery, or SparkR/rquery. For monolithic applications I suggest getting a big-memory instance and using data.table.
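As a rough illustration of the monolithic route, here is a minimal sketch that assumes the aws.s3 and data.table packages are installed and that AWS credentials are already configured on the instance. The helper name read_s3_csv and the bucket and object names are hypothetical placeholders, not anything from the Marketplace offering. The aggregation runs on small local stand-in data so the snippet works without S3 access.

```r
library(data.table)

# Hypothetical helper: read a CSV object from S3 straight into a data.table.
# Assumes the aws.s3 package is installed and credentials are already set up;
# the bucket and object names passed in are placeholders.
read_s3_csv <- function(object, bucket) {
  aws.s3::s3read_using(FUN = data.table::fread, object = object, bucket = bucket)
}

# Local stand-in data so the aggregation step runs without AWS access.
dt <- data.table(group = c("a", "a", "b"), value = c(1, 2, 3))

# Typical in-memory aggregation on a big-memory instance: sum value per group.
summary_dt <- dt[, .(total = sum(value)), by = group]
print(summary_dt)
```

With real data you would replace the stand-in table with something like dt <- read_s3_csv("path/to/file.csv", "my-bucket"), where both arguments are placeholders for your own object key and bucket.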


Hi!

With regard to the questions about the AWS Marketplace offering: the included libraries are intended to be a starting point, essentially a way to save you some time installing and compiling commonly used libraries. You're certainly welcome to install libraries for working with AWS as you see fit; there's nothing unique about the tidyverse version installed on that offering.

Does RStudio Pro on AWS support elastic EC2 for big data workloads?
This offering is an AMI that you can use to start, stop, and terminate instances at any scale and for any duration. Nothing in the offering scales out workloads for you, so you'll need to handle EC2 scaling yourself based on your needs.

Thanks!