I’ve found that the concepts of tidy data are much easier to teach with simple, fun datasets than with datasets that might be more applicable to your domain. For example, even though I learned R to help with my genomics work, it was the dplyr tutorial with nycflights13 that really helped me understand the principles underlying tidy data organization and what it enables you to do. So when I help train people at work, I start by finding fun datasets that people will be excited to play around with. In particular, I’ve found Kaggle and data.world to be great places to find fun and interesting datasets. The slides I linked have a few of those already, like Eurovision voting and UFO sightings. There’s an amazing Bigfoot sightings dataset on data.world that I’ve always wanted to try some tidy text techniques on.
The other thing I’ve found really helpful is to introduce some collaboration and possibly some competition. We’re really lucky to have a big team of fun, friendly contributors, so we try to get people whipped up about data science and excited to grab the ball and run with it on their own. We did a ggplot2 visualization contest around the Eurovision data (sharing plots on Slack and declaring the winner by emoji reactions), and for more advanced folks, we recently had a big competitive prediction challenge (like an internal Kaggle competition).
I hope this is helpful! Spending the time to promote the tidyverse and the associated data science philosophy within our group has really paid off: teammates are more productive, and we all speak a common language in our code and analysis techniques. Good luck!