Information for the "Big Data with R" class



Dear class attendee,

First, allow me to thank you for filling out the pre-class survey. It has yielded a lot of great insights, and we plan to share the aggregate results with you during the class.


We plan to provide a personal server to each student for use during the class. The server will contain all of the applications and materials needed, including R and RStudio. All you will need is a laptop with a web browser. If you need to use a work-provided laptop for the class, please make sure its web browser is not blocked from reaching Amazon AWS, which is where the servers will be set up.

Helpful reading

Some have asked for material that would be useful to review prior to the class. The following is a compilation of subjects that it would be great to be familiar with by the time the class begins, but it is not a requirement that you study or review them.

It was great to see that most survey respondents are daily dplyr and ggplot2 users. For those who are not, it would be a good idea to review the following chapters of the R for Data Science book:

For database background, please review the articles in the following links:

For Spark background, please review the following:

For those who are not too experienced with Shiny, please review the articles in the following links:

Thank you for choosing this class, and I look forward to meeting you!



Hello all,

In a recent email, we mentioned that this page contains prerequisite resources. Please treat the links under Helpful reading as suggested reading only. It’s OK if you don’t have the time to review all, or any, of the material in those links prior to the class. We plan to cover most of it during the class.

Thank you.


Howdy all - I think @edgararuiz will have more information to add, but I wanted to share the link to the GitHub repo in case you want to download the materials! You can either use git clone or download a ZIP file directly. Enjoy!