Hi guys! I recently decided to refresh some of what I learned about statistics at university and have been looking for good books. Unfortunately, those that I was learning from weren’t especially practice-oriented. I’m looking for ones that talk about stats from a more practical, data science point of view and nicely blend statistical concepts with machine learning. Would you have anything good to recommend? Thanks!
Not a book, but I really enjoy watching the Opinionated Lessons On Statistics series of videos. Someday I’d like to make a companion package to reproduce the examples with R code.
“An Introduction to Statistical Learning” is exactly that, an intro, while “Elements” goes a lot deeper on the same concepts.
Personally, I think that Applied Predictive Modeling does a really fantastic job of balancing theory and practice, and gives you a ton of re-workable examples in R using the caret package.
I completely agree: Applied Predictive Modeling is definitely my number 1. I read “An Introduction to Statistical Learning” and use “Elements” more as a go-to reference when I need to check something, because that one is really a biggie. I completely agree that all 3 are really awesome, but they’re more for ML purposes, whereas I’m looking for something slightly more pure-stats oriented, covering things such as estimator theory, hypothesis testing, confidence intervals, power and sample size, etc. Anything else you can recommend?
It’s a little harder to read and pretty theoretical but Statistical Inference by Casella & Berger has been fairly common among graduate programs for years.
Also, it’s ridiculously expensive on Amazon; I got mine for ~$30 on eBay.
I’m a big fan of Richard McElreath’s Statistical Rethinking. It provides a great intro to Bayesian statistical applications, with lots of practice problems. No dedicated section on machine learning, though.
I second Richard McElreath’s Statistical Rethinking. I also like Andrew Gelman and Jennifer Hill’s Data Analysis using Regression and Multilevel/Hierarchical Models. I believe Gelman is working on a second edition that uses Stan/RStan. No machine learning though.
Wasserman: All of Statistics
If you are slogging through a stats class (with Casella and Berger as your textbook, as above), then grab this and it will give you the basics in a clear manner.
+1 for Mostly Harmless Econometrics. IMHO one of the most underappreciated gems in this genre. Econometrics hasn’t reached the same buzzword status as machine learning or data science, but it brings a really valuable perspective: asking “what would be my ideal data and experimental design for this?” as a tool for thinking about how to approach a problem with whatever actual data and information you have.
Highly recommend Statistics by Freedman, Pisani and Purves as a first statistics text. Clearest and easiest to read math book I’ve ever found. Mathematical details are at about a high school level, and it really does wonders for intuition. Also, it’s great to share with family members.
+1 for Statistics by Freedman, Pisani, and Purves as well, especially for undergraduates.
Yeah I had a real hard time with both Casella and Berger and Gelman until I had a semester of probability under my belt.
Richard McElreath’s Statistical Rethinking
I’m reading that now. It’s very good.
Statistics for Experimenters (aka BHH) is one of my favorites. The first edition is a little more concise but is probably more difficult to find.
I just started working my way through Applied Predictive Modeling a couple of weeks ago, so I’m excited to see it recommended so many times!
My recommendation might be a bit off from what you’re asking for, but I loved Naked Statistics by Charles Wheelan. Reading it was the first time I felt like I really would be able to learn statistics, and it helped give me the motivation/courage to crack more intimidating textbooks to get into the details.
In addition to the classics, Introduction to Statistical Learning in R and Elements of Statistical Learning, I also recommend the newer entry from Efron and Hastie, Computer Age Statistical Inference. I haven’t finished CASI (I’ve only read a few random chapters), but I really like how it is laid out, with a focus on not just the math but also the history. It’s a great way to introduce some of the statistics in data science and help explain how the field has grown into what it is today.
If you are not 100% focused on using R and are open to learning through Python, I also highly recommend the Allen Downey books Think Stats 2 and Think Bayes. They are well written and favor teaching through code instead of just math, which was really helpful for me.
Lastly, I thoroughly enjoyed Machine Learning for Hackers and its corresponding GitHub repo. It’s a whirlwind tour of the most common/basic algorithms used in data science (outside of deep learning) and is focused more on making sure you understand the high-level concepts and how to use them than making sure you understand the math. In that regard, it’s a great companion book to ISLR/ESL.
Thank you so much guys! I think everyone in this group will find the best content for him/herself. Personally, I will start with either “All of Statistics: A Concise Course in Statistical Inference” or “Statistical Rethinking: A Bayesian Course with Examples in R and Stan”, just to get all the foundations right. I think “Statistics by Freedman, Pisani and Purves” would be a good choice, but it has twice as many pages, so I’d rather take a shortcut here.
If you’re ever stuck and need something explained in a new way, I really like Introduction to Probability Theory and Statistics by Javier R Movellan.
Among other things, it’s free and online! I often use it as my go-to for explaining something that might require a quick stats refresher.
Just want to give a plug for one of my all time favorite statistics books, Richard Royall’s Statistical Evidence. I read this very early in my career and it had a profound effect on the way that I think about statistics and data analysis. While it’s easy to get sucked into arguments about likelihood vs. Bayesian vs. frequentist, I think the overall message of the book is nevertheless very interesting.
While it’s easy to get sucked into arguments about likelihood vs. Bayesian vs. frequentist
Are you saying that Statistical Evidence avoids the Bayesian vs frequentist debate, or that it indulges it but is a worthwhile read anyway?
Either way I’ll probably check it out, so thanks for the suggestion. Just want to know what I’m getting myself into.