Greetings, and hoping to learn something more!

Hi! My name is Beto.

I am a geoscientist who loves to teach basic R and EDA with the tidyverse and ggplot2.

I give lectures at my institution (a federal institution for statistics and geography), and I joined this community because I hope to learn more about these tools. The tidyverse is the central tool I use for munging, cleaning, and visualizing data.
I would like to see a centralized repository of exercises for each main tidyverse package, maybe built from our contributions. Do you agree?

Warm regards!


Have you checked out the exercises in R for Data Science? (Available here)

It sounds quite likely that there will soon be a crowd-sourced solutions manual for these exercises as well, thanks to Jesse Maegan's awesome learning community (see an article here).


I second the recommendation to look at R4DS. I ran a two-day tidyverse training session for some colleagues at work, largely lifted from the exercises and chapters in R4DS and from this tutorial from the Harvard Institute for Quantitative Social Science. I put the slides up on my GitHub; I'd recommend cloning the repo in RStudio and running the slides there if you'd like to check them out. Have fun!


Hey @khturner! I've actually been trying to scale up a bunch of side-of-desk trainings I've been doing at work. Thanks for sharing your slides -- this looks like a great resource! I may borrow some of it. Any other tips from your experience?

Totally! :slight_smile:

I've found that the concepts of tidy data are much easier to teach with simple, fun datasets than with datasets that might be more applicable to your domain. For example, even though I set out to learn R to help with my genomics work, it was the dplyr tutorial with nycflights13 that really helped me understand the principles underlying tidy data organization and what that enables you to do. So when I help train people at work, I start off by trying to find fun datasets that get people excited to play around. In particular, I've found Kaggle and data.world to be really great places to find fun and interesting datasets. The slides I linked have a few of those already, like Eurovision voting, UFO sightings, etc. There's an amazing Bigfoot sightings dataset on data.world that I've always wanted to try some tidy text techniques on.

The other thing I've found to be really helpful is to introduce some collaboration and possibly some competition. We're really lucky to have a big team of fun, friendly contributors, so we try to get people whipped up about data science and get them excited to grab the ball and run with it on their own. We did a ggplot2 visualization contest around the Eurovision data (sharing plots on Slack and declaring the winner by :+1:'s), and for more advanced folks, we recently had a big competitive prediction challenge (like an internal Kaggle competition).

I hope this is helpful! I've really found that the time spent promoting the tidyverse and the associated data science philosophy within our group has paid off, both in helping teammates be more productive and in getting us all to speak a common language with our code and analysis techniques. Good luck!


Thanks so much! These are really great tips. My goal is definitely to get to the "common language" state that you mention, so it's amazing to hear such a success story and learn more about your approach.

I completely agree with you about the concept of tidy data. I've written some packages to help users manipulate cash flow statements as tidy data, so that seemed like a natural way to introduce the concept. Ultimately, though, a domain-agnostic example seems a lot better at highlighting the concept without distracting people with context.

(Sidenote -- the dplyr vignettes with the nycflights13 data are some of the cleanest expositions of "what this code does and how to use it" that I have ever seen!)

Thanks to Emily and Keith for the answers and contributions.

I already use some exercises from R4DS, among others, to build exercises for teaching EDA and visualization. I was mostly thinking of our own contributions: the datasets package has a lot of data to explore, as do other open sources.

Many times I have taken the idea of an exercise and reproduced it with different data or different variables.
Anyway, I strongly agree with Keith that you must capture the learner's attention and interest with an attractive graphic, so that they go on to explore the tool more deeply.

I also have a GitHub repository for the lectures at "HumbertoSubiza", but unfortunately the documentation is in Portuguese! (I will translate some of it into English as soon as I have time.)
Thanks again!