Book recommendations - statistics in R

bragks · July 12, 2018, 6:28pm

Hi
I'm new to R (and statistics), and I love reading books when I learn new stuff. I've read Hands-On Programming with R and I'm halfway through R for Data Science. I think they're both great and would absolutely recommend them to someone beginning with R, but the main reason I'm learning R at the moment (apart from needing an excuse to learn programming) is statistics for my PhD. Long-term, I'd like to use it for machine learning in medical imaging.

What I've learned so far has already been tremendously helpful; however, I can't seem to find a book that actually covers statistics from A to Z using tidyverse-based methods. I have a feeling ModernDive will at some point evolve into what I'm looking for, but I need it NOW.

In other words, I'm looking for a book that explains statistics and "how to do R things" using a tidyverse-based approach, covering everything from basic stuff like measures of centre and distributions to advanced regression methods. Is there such a book? If not, what are the second-best options?

Thank you!

tbradley · July 12, 2018, 7:01pm

I don't know of one written with tidyverse approaches, but you should definitely check out An Introduction to Statistical Learning, and if you are looking for machine learning, you could check out Applied Predictive Modeling.

jcblum · July 12, 2018, 11:16pm

I don't know if what you're hoping for exists either. Part of what makes it complicated is that there really isn't much agreement on what "covers statistics from A to Z" means between different stats-using disciplines (or between statistics-the-discipline and applied statistics as a whole). Whether machine learning even belongs in the same category as "statistics" has been a matter of debate (an arbitrary example, from the top of my Google results: Machine Learning vs Statistics - KDnuggets).

That said, we have a thread where people in this community shared their favorite "pure statistics" books. It isn't a direct answer, but there's some pretty great stuff in there:

bragks · July 13, 2018, 8:18am

ISLR is on my reading list and it looks great, but it's a bit advanced for my current level.

g-thomson · July 13, 2018, 9:09am

Generally, statistics textbooks are not exactly page-turners, and the good ones tend to be specialised rather than introductory. Instead, I would recommend doing one or two of the introductory MOOCs on sites like Coursera (e.g. https://www.coursera.org/specializations/statistics). Then read blog posts by people applying the techniques to interesting data (check out R-bloggers).

When you feel you have a grasp of what the statistical concept you are studying does, you could then try implementing it in the tidyverse idiom. For this, check out the broom package, which takes the output of models and puts it into tidy data frames, and the twidlr package, which restructures several common model functions to work with pipes.
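As a rough sketch of the broom side of that workflow, assuming the dplyr and broom packages are installed and using R's built-in mtcars data purely for illustration (twidlr is not used here), a model's output can be turned into ordinary data frames and then handled with dplyr verbs:

```r
library(dplyr)  # also provides the %>% pipe
library(broom)

# Base R's lm() takes the data as its second argument, so the magrittr
# placeholder "." is used to pipe the data frame into that argument
fit <- mtcars %>% lm(mpg ~ wt, data = .)

# tidy(): one row per coefficient (term, estimate, std.error, statistic, p.value)
coefs <- tidy(fit)

# glance(): a one-row summary of the whole model (r.squared, AIC, ...)
model_summary <- glance(fit)

# Because both results are plain data frames, dplyr verbs work on them directly
wt_row <- coefs %>%
  filter(term == "wt") %>%
  select(term, estimate, p.value)
wt_row
```

The same tidy()/glance() pattern works for many other model types that broom supports, so the downstream dplyr code does not need to change when the model does.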

Lastly, since statistics can seem quite abstract, plot what you are doing as much as possible to develop an intuition about what is occurring.

mara · July 13, 2018, 1:30pm

The Elements of Statistical Learning is legendary and great, too (and freely available online).
https://web.stanford.edu/~hastie/ElemStatLearn/

The same is true for Computer Age Statistical Inference (another Hastie book).

As a nice resource for finding resources, there's a compendium of books on DataSciGuide with reviews, difficulty levels, etc:
http://www.datasciguide.com/contenttype/book/

Mark6 · July 13, 2018, 2:20pm

I've dipped in and out of Practical Statistics for Data Scientists: 50 Essential Concepts as a relatively useful reference book.

It also contains R code snippets that give programming examples of the concepts. I don't think they are the best examples, though: they aren't an introduction to R, so you need some prior knowledge of R, and they use base R rather than the tidyverse. Nonetheless, Practical Statistics for Data Scientists: 50 Essential Concepts might be useful for you.

I'd also recommend O'Reilly's selection of free data books if you are looking for data (science) books in general.

jystat · July 13, 2018, 8:19pm

I would recommend looking at the books written by @hadley.

If you want to learn R, I highly recommend beginning with "R for Data Science" (http://r4ds.had.co.nz/).

Also, here are some other sources you can check out for further reading.

john.smith · July 14, 2018, 6:42am

Hi,

Just throwing in my two cents in case it's helpful. You could wrap the stats functions in your own function that returns tidy data, but I think broom is probably your best bet at the moment. I have seen several blog posts, but nothing as in-depth as a book. I'm not sure it was mentioned already, but I have seen a talk at one of the R conferences that uses a package called infer, which might be useful.
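A minimal sketch of the wrapper idea, assuming a two-group comparison with base R's t.test() on the built-in mtcars data; tidy_t_test() is a hypothetical helper name, not part of any package. Taking the data as the first argument is what makes such a wrapper easy to use in a pipe:

```r
# Hypothetical wrapper: run a base-R t-test and return the key numbers as a
# one-row data frame instead of the printed "htest" object
tidy_t_test <- function(data, formula) {
  result <- t.test(formula, data = data)  # Welch two-sample test by default
  data.frame(
    statistic = unname(result$statistic),
    df        = unname(result$parameter),
    p.value   = result$p.value,
    conf.low  = result$conf.int[1],
    conf.high = result$conf.int[2]
  )
}

# Compare fuel economy between automatic (am = 0) and manual (am = 1) cars
res <- tidy_t_test(mtcars, mpg ~ am)
res
```

For comparison, broom::tidy(t.test(mpg ~ am, data = mtcars)) already returns a similar one-row tibble, which is why broom is usually the less work-intensive route.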

When I was first looking at statistics to see if I could use it for work, what to learn basically came down to what I wanted to use it for, which mostly depended on the data. However, I found the following books very helpful in general.

Outside of the tidyverse, statistical foundations are covered really well by Discovering Statistics Using R by Andy Field, Jeremy Miles and Zoe Field. The style is very accessible: each chapter really drives home the conceptual understanding before proceeding to the formulae and assumptions of statistical tests. It's written in a lighthearted way and can be read cover to cover. A similar book, which I have not read but which is more general and easier, is An Adventure in Statistics: The Reality Enigma. It is a novel with the statistical lessons woven into the story.

A second book I cannot recommend highly enough is Statistical Modeling: A Fresh Approach. There are two editions from what I can see, but in the linked one Dr Kaplan spends a lot of time setting up why statistical tests are needed. There is a geometry section that draws out why the statistical tests work the way they do. It was the first time I had ever seen anything like this, and it really gave me a conceptual understanding of material that, up to that point, I had just been using like a cookbook.

Other books I can recommend that don't have anything to do with R are Statistics in Plain English, Fourth Edition, which I didn't read fully but dipped in and out of as a reference, and Statistics Done Wrong: The Woefully Complete Guide, which I found very witty and chock-full of examples of the real-world consequences of ignoring the assumptions of statistical tests.

If you are on a budget, I liked OpenIntro Statistics. As far as I remember, the book is free on the OpenIntro website, and I think there is an accompanying Coursera course as well. Finally, another free book I really liked was Learning Statistics with R.

I read these books on and off for a couple of years because I was finding ISLR and ESLR a bit tough going, not having come from a maths background.

I hope this is useful.

Thanks

Leon · July 14, 2018, 8:20am

That's because it doesn't exist (to the best of my knowledge). Statistics A-Z in one book isn't possible.

I recommend taking a look at the following (free) books:

In that case, I would recommend:

Having said that, since you're doing a PhD, I would look into whether your university has some solid courses on these subjects. It is of course possible to learn by yourself, but one should not underestimate the value of being introduced to a subject by someone experienced in the matter and knowledgeable about how to teach it.

Good luck with your endeavour!

Timesaver · July 14, 2018, 11:25am

My favorite statistics book is Statistical Methods in the Atmospheric Sciences:
https://www.amazon.com/Statistical-Atmospheric-Sciences-International-Geophysics/dp/0123850223/ref=mp_s_a_1_1?ie=UTF8&qid=1531572795&sr=8-1&pi=AC_SX236_SY340_FMwebp_QL65&keywords=statistical+methods+in+atmospheric+sciences&dpPl=1&dpID=61yHXsnRkuL&ref=plSrch
It doesn't include R programs, but you can always google the method to find an implementation in R.

Moreover, the following books include R code along with short descriptions of the statistical methods:

R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics (O'Reilly Cookbooks), by Paul Teetor
R in Action: Data Analysis and Graphics with R, Second Edition, by Robert Kabacoff
The Book of R: A First Course in Programming and Statistics, 1st Edition, by Tilman M. Davies

bragks · July 15, 2018, 7:46am

Thank you all - this community is awesome!

Judging by the content of some of the recommended books, I've come to realise that what I consider the Z part of statistics is actually pretty simple stuff for some people...

I totally agree with you, @Leon! The obvious choice would be a solid introductory course at my university; however, they're pretty determined to teach statistics using SPSS or Stata, and I'm dead set on learning R.

I think I'll give the Statistics with R Specialization on Coursera a go; it looks like a good place to start. One question, though: does anyone know if the course is self-paced? I'm doing most of this in my spare time after work (with two kids), so I'm not sure I'll be able to keep up with 5-7 h/week every single week.

On a side note, Practical Statistics for Data Scientists: 50 Essential Concepts, recommended by @Mark6, is pretty damn close to what I was looking for.

Mark6 · July 15, 2018, 9:47am

Hopefully the book is useful to you!

I feel your pain on:

they're pretty determined to teach statistics using SPSS or STATA

Probably a discussion for elsewhere, but I often wonder why universities seem to have a strong preference for Stata, SPSS or EViews, while industry and personal projects use R or Python. I was taught, and used, the former at university but have never used them since, whereas I have had to use R and Python in the workforce.

gueyenono · July 17, 2018, 6:50pm

The tidyverse is a set of packages with an underlying paradigm for analyzing data. It isn't inherently a system to do statistics! That is to say that you should not expect tidyverse functions to perform statistical operations for you. That being said, you can definitely use it in conjunction with statistical packages/functions you learn outside it.

I highly recommend you pick ModernDive back up (or any statistics-with-R book, really) and finish R for Data Science. Understanding the material from both books will let you integrate your statistics into a tidyverse workflow (especially with functional programming using the {purrr} package), and doing so is good practice for consolidating what you have learned.
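As one illustration of that kind of integration, here is a minimal sketch assuming the dplyr, purrr and broom packages are installed and using the built-in mtcars data: the same regression is fitted once per group with purrr, and broom collects the results into a single data frame. Note that map_dfr() still works but is marked as superseded in recent purrr releases.

```r
library(dplyr)  # provides %>% and bind_rows(), which map_dfr() relies on
library(purrr)
library(broom)

# Fit the same regression separately for each number of cylinders and
# collect every model's coefficients in one tidy data frame
per_cyl <- mtcars %>%
  split(.$cyl) %>%                    # list of 3 data frames named "4", "6", "8"
  map(~ lm(mpg ~ wt, data = .x)) %>%  # one lm fit per group
  map_dfr(tidy, .id = "cyl")          # row-bind tidy() output, keeping the group name
per_cyl
```

Because the result is one data frame with a `cyl` column, comparing the groups afterwards (filtering, plotting, joining) uses the same tidyverse tools as any other data.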

This advice is given from personal experience. I am currently reading a book on state-space models and am rewriting the book's R code to align with a tidyverse workflow.

I wish you the best!

dan_fahey · July 19, 2018, 3:53pm

I would really recommend Richard McElreath's Statistical Rethinking: A Bayesian Course with Examples in R and Stan. He has also put lectures covering the book up on YouTube.

brianstamper · July 20, 2018, 4:19pm

I want to second the recommendation of https://www.r-bloggers.com/. I have learned a lot of things I didn't even know I wanted to learn just by browsing the posts there. (Now if only I could find the time to actually start contributing posts...)

brshallo · July 21, 2018, 10:52pm

I'm excited to see what 'tidy modeling' approaches might be on the horizon with Max Kuhn now at RStudio: https://www.rstudio.com/resources/videos/modeling-in-the-tidyverse/ .

imjtrial · July 31, 2018, 4:22am

How is your progress? Any updates from your search?

There is no book about statistics in the tidyverse, so I also started a new topic asking what's missing and how to connect the rest of R to enhance the tidyverse approach to learning statistics.

raydai · July 31, 2018, 4:34am

All of the replies are awesome! I want to recommend the RStudio cheat sheets; they are handy for new R users. Modeling, data manipulation and data cleaning are the most time-consuming work if you are not familiar with these hands-on tools.