Questions about "Introduction to Statistical Learning" and about Machine Learning Generally

Originally split from this discussion: Books on machine learning


ISLR is the most popular pick here, as well as on virtually every other forum/site.

However, I haven't come across an in-depth review of WHY ISLR is that good.

It's interesting to see someone claim that the word "learning" is just hype with nothing to do with actual learning, and that the whole thing is pure statistics. I'm inclined to agree, because a human reviewing computed results and then improving the workflow is still human learning, not machine learning.

Any details on why this book is a good fit for ML, or why it is better than a statistics classic like Statistics by Freedman, are welcome!

Edit: I'm really trying to understand what ML is, but my impression is that "machine" just means "automation", which has nothing to do with "learning".

Thanks!

1 Like

I have read quite a few books on statistics/ML and this one is probably my favorite.

It’s just well written, has a good mix between math and explanations, and includes lots of well made and helpful illustrations.

Note that even though there is quite a bit of statistics the focus of this book is primarily predictive modelling, not causal inference (unlike many classical books on statistics).

The reason people love it so much probably comes down to practical things; here are a few I can think of (this is just armchair speculation):

  • The book is free, which is rare for quality statistics books (RIP wallet if you want more)
  • The lectures are on YouTube and are also free
  • Although the math looks intimidating, they do a pretty good job breaking down what is happening in the equations--it's just slow going and continues to build upon what was previously covered (Elements of Statistical Learning is a lot more dense--I couldn't really get into it since my math is not as strong)
  • Connected to the above, the book can feel a tad redundant in its explanations if you read in a hurry, but the language is incredibly specific with an effort to not ignore anything
  • It's currently in its 7th printing and has been taught in universities for a while, so people often end up teaching what they learned--especially if they found it helpful
  • Seven printings also mean the text has had time to mature and become clearer as a teaching resource
  • Trevor Hastie and Rob Tibshirani are experts in this subject
  • The book covers a lot--you can be pretty well rounded by going through it, which is pretty nice--one book and you can build the foundation to understand and access more advanced content

This is just my personal take: I think this book is the gateway drug. It lays a good foundation for future exploration of statistical learning, which I think is a good name for the statistical modeling approaches many people use in data science. I would contrast statistical learning with the linear-algebra-based approaches that often get thrown into a blender together when people use jargon like ML. I found it incredibly difficult to get going at first because I didn't understand how different the algorithms people were using really were.

I'm working on getting some of my notebooks in a form that isn't too embarrassing to put online where I'm going through the book and trying to use tidyverse tools to accomplish things, which is a great way to learn as well!

1 Like

I agree with all of the above. It's one of those rare books worth buying as a hard copy.

Stuck on something? Just read the relevant section and you'll get a clear, comprehensive summary (searching a few Stack Overflow questions, etc., won't always give you the full picture).


To me, "learning" is about an algorithm "learning" patterns in data via "training", so that it can "predict" similar patterns in new, unseen data.

1 Like