Introduction to Data Science in the Tidyverse Workshop - rstudio::conf 2020

9:00 AM-5:00 PM
2-Day Workshop

Hadley Wickham
Chief Scientist

Amelia McNamara
Assistant Professor of Computer & Information Sciences
University of St Thomas

This is a two-day, hands-on workshop designed for people who are brand new to R & RStudio and who learn best in person.

You will learn the basics of R and data science, and practice using the RStudio IDE (integrated development environment). We'll discuss much of the material from the book R for Data Science, including data visualization (ggplot2), data transformation and tidying (dplyr, tidyr), understanding special data types (stringr, forcats, lubridate), and modeling (broom). Throughout the workshop, we'll work in R Markdown documents and learn best practices for data computing.

If you want to transition from coding in base R to the tidyverse, or just jump into doing data science in the tidyverse without any prior R experience, this is the workshop for you! We will have a team of TAs on hand to show you the ropes, and help you out when you get stuck.

To know whether this workshop is right for you, consider these questions:

  1. You have a dataset of prices of diamonds, as well as their size. Could you make a scatterplot of the two variables using ggplot2?
  2. You have two datasets, one with information on music genres and age ranges, the other with genres and radio station call names. Can you imagine how you would join them together with a dplyr verb?
  3. We want to model the wages of people in the United States, using their height and education as predictors. Then, we would like to plot model predictions for each level of educational attainment. Can you imagine how to do this in R?

If you answered "no" to any or all of those questions... great! This workshop is for you. By the end of the two days, you should be able to accomplish all those tasks. If you answered "yes" to all three questions, you may want to consider taking a different workshop.
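For a sense of scale, here is a rough sketch of what the three tasks above might look like in R. It is an illustration under stated assumptions, not the workshop's actual materials: only the diamonds dataset (which ships with ggplot2) is real. The genre and station tables, the call names, and the simulated wage data are all invented for the example.

```r
library(ggplot2)
library(dplyr)

# 1. Scatterplot of diamond price against size (carat), using the
#    diamonds dataset that ships with ggplot2.
price_plot <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2)

# 2. Join two small tables on their shared genre column.
#    Both tables (and the call names) are invented for this example.
genres <- tibble(
  genre = c("rock", "jazz", "pop"),
  age_range = c("25-44", "45-64", "13-24")
)
stations <- tibble(
  genre = c("rock", "pop", "country"),
  call_name = c("KAAA", "KBBB", "KCCC")
)
# left_join() keeps every row of genres; jazz has no matching station,
# so its call_name is NA.
genre_stations <- left_join(genres, stations, by = "genre")

# 3. Model wages from height and education, then plot the predictions
#    for each education level. The wage data are simulated, not real.
set.seed(2020)
n <- 300
education_levels <- c("high school", "college", "graduate")
wages <- tibble(
  height = rnorm(n, mean = 67, sd = 4),
  education = factor(
    sample(education_levels, n, replace = TRUE),
    levels = education_levels
  ),
  wage = 10000 + 300 * height + 8000 * as.integer(education) +
    rnorm(n, sd = 5000)
)
wage_fit <- lm(wage ~ height + education, data = wages)

# Add the fitted values as a column, then draw one line per education level.
wages <- mutate(wages, predicted = predict(wage_fit))
prediction_plot <- ggplot(
  wages,
  aes(x = height, y = predicted, colour = education)
) +
  geom_line()
```

Printing price_plot or prediction_plot at the console draws the figure. The third step uses base R's predict() to keep the sketch short; broom's augment() is another way to get the fitted values.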

Hi folks!

Hadley and I are looking forward to seeing you next week for our workshop, Introduction to Data Science in the Tidyverse. This is a workshop for people who are brand new to R and RStudio, so we don't expect you to do any preparation before you arrive.

We'll be using RStudio Cloud, a cloud-based version of R and RStudio available through your web browser. So (all going well) on the day of the workshop all you'll need is a laptop that can access the internet and your power cord. Wifi will be available. Please sign up for a free account on RStudio Cloud. You can make an account directly on RStudio Cloud, or use single-sign-on with a service like GitHub or Google.

In the unlikely event that there are problems with the conference internet connection, you may want to have a local installation on your computer as a backup. If you'd like, install the following:

  1. A recent version of R (~3.6), which is available for free at
  2. A recent version of RStudio IDE (~1.2.5033), available for free at
  3. The set of relevant R packages, which you can install by connecting to the internet, opening RStudio, and running:

       install.packages(c("babynames", "fivethirtyeight", "formatR", "gapminder", "hexbin", "mgcv", "maps", "mapproj", "nycflights13", "rmarkdown", "skimr", "tidyverse", "viridis"))

Again, this is not required; it's just a backup in case we have connectivity issues.
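If you do set up the local backup, a quick check along these lines can confirm that everything landed. This is only a sketch: the variable names are ours, and the 3.6.0 threshold is our reading of "a recent version of R (~3.6)" rather than a hard requirement.

```r
# The packages listed in the install step above.
workshop_packages <- c(
  "babynames", "fivethirtyeight", "formatR", "gapminder", "hexbin",
  "mgcv", "maps", "mapproj", "nycflights13", "rmarkdown", "skimr",
  "tidyverse", "viridis"
)

# installed.packages() returns a matrix with one row per installed
# package; the row names are the package names.
not_installed <- setdiff(workshop_packages, rownames(installed.packages()))

if (length(not_installed) == 0) {
  message("All workshop packages are installed.")
} else {
  message("Still missing: ", paste(not_installed, collapse = ", "))
}

# TRUE when the running R is at least version 3.6.0 (assumed threshold).
r_is_recent <- getRversion() >= "3.6.0"
```

Anything reported as missing can be installed by passing it to install.packages() again.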

The workshop materials will be provided to you via RStudio Cloud when you arrive, but if you want to poke around ahead of time, we are hosting everything on GitHub. Materials at that link will continue to change for a few days.

Please let us know if you have questions, and we'll see you at 9 am on Monday, January 27 in Plaza B of the conference hotel!