Hi
I'm new to R (and statistics), and I love reading books when I learn new stuff. I've read Hands-On Programming with R and I'm halfway through R for Data Science. I think they're both great and would absolutely recommend them to someone beginning with R, but the main reason I'm learning R at the moment (apart from needing an excuse to learn programming) is statistics for my PhD. Long-term I'd like to use it for machine learning in medical imaging.
What I've learned so far has already been tremendously helpful; however, I can't seem to find a book that actually covers statistics from A-Z using tidyverse-based methods. I have a feeling ModernDive at some point will evolve into what I'm looking for, but I need it NOW.
In other words, I'm looking for a book that explains statistics and "how-to-do-R-things" with everything from basic stuff like measures of centre, distributions etc. to advanced regression methods using a tidyverse-based approach. Is there such a book? If not, what are the second-best options?
I don't know if what you're hoping for exists either. Part of what's complicated is that there really isn't much agreement on what "covers statistics from A-Z" means between different stats-using disciplines (or between statistics-the-discipline and applied statistics as a whole). Whether machine learning even belongs in the same category as "statistics" has been a matter of debate (arbitrary example, from the top of my google: Machine Learning vs Statistics - KDnuggets).
That said, we have this thread about people in this community's favorite "pure statistics" books, which isn't a direct answer but there's some pretty great stuff in there:
Generally, statistics textbooks are not exactly page turners and the good ones tend to be more specialised as opposed to introductory. Instead I would recommend doing one or two of the introductory MOOCs on sites like Coursera (e.g. https://www.coursera.org/specializations/statistics). Then read blogs online of people using the techniques on interesting data (check out R-bloggers).
When you feel you have a grasp of what the statistical concept you are studying does, you could then try implementing it in the tidyverse idiom. For this, check out the broom package, which takes the output of models and puts it into tidy data frames, and the twidlr package, which restructures several common model functions to work with pipes.
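To make the broom suggestion concrete, here is a minimal sketch. It assumes the broom package is installed; the built-in mtcars data and the model formula are only illustrative choices, not something from the posts above.

```r
library(broom)

# Fit an ordinary linear model with base R's lm() on the built-in mtcars data
fit <- lm(mpg ~ wt, data = mtcars)

# tidy() returns one row per model term (estimate, std.error, p.value, ...)
coefs <- tidy(fit)

# glance() returns a single-row summary of the whole model (R squared, AIC, ...)
model_summary <- glance(fit)

# augment() returns the observations with fitted values and residuals added
obs <- augment(fit)

print(coefs)
```

Because all three results are ordinary data frames, they can go straight into dplyr or ggplot2 pipelines.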
Lastly, since statistics can seem quite abstract, plot what you are doing as much as possible to develop an intuition about what is occurring.
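As a rough illustration of plotting alongside a model, the sketch below draws the raw data and a fitted regression line together. It assumes ggplot2 is installed; the mtcars data and the variables used are arbitrary examples.

```r
library(ggplot2)

# Scatter plot of the raw data with a least-squares line and its
# confidence band drawn on top, so the model can be judged by eye
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

print(p)
```

Seeing the points scatter around the line tends to make ideas like residuals and uncertainty much less abstract than a table of coefficients.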
It also contains R code snippets to give programming examples of the concepts (though I don't think they are the best examples: they aren't an introduction to R, so you need some knowledge of R, and they are base R, not tidyverse). Nonetheless, Practical Statistics for Data Scientists: 50 Essential Concepts might be useful for you.
Just throwing in my two cents if it's helpful. You could potentially wrap the stats functions in a wrapper function to return tidy data, but I think broom is probably your best bet at the moment. I have seen several blog posts but nothing as in-depth as a book. I'm not sure if it was mentioned already, but I have seen a talk at one of the R conferences that uses a package called infer, which might be useful.
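A hand-rolled wrapper of the kind mentioned above might look like the sketch below. The function name tidy_t_test and its column names are hypothetical, made up for illustration; it uses only base R, and broom already provides the same idea for many tests and models.

```r
# Hypothetical helper: run a two-sample t-test with base R's t.test()
# and return the key results as a one-row data frame
tidy_t_test <- function(x, y, ...) {
  res <- t.test(x, y, ...)
  data.frame(
    mean_x    = unname(res$estimate[1]),
    mean_y    = unname(res$estimate[2]),
    statistic = unname(res$statistic),
    df        = unname(res$parameter),
    p_value   = res$p.value,
    conf_low  = res$conf.int[1],
    conf_high = res$conf.int[2]
  )
}

# Example: compare fuel economy of automatic vs manual cars in mtcars
automatic <- mtcars$mpg[mtcars$am == 0]
manual    <- mtcars$mpg[mtcars$am == 1]
result    <- tidy_t_test(automatic, manual)

print(result)
```

Returning a data frame rather than the usual printed test object means results from several tests can be stacked and compared like any other data.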
When I was first having a look at statistics to see if I could use it for work, it basically came down to what I wanted to use it for; it mostly depended on the data. However, I found the following books very helpful in general.
Outside of the tidyverse, statistical foundations are covered really well by Discovering Statistics Using R by Andy Field, Jeremy Miles, Zoe Field. The style is very accessible, with each chapter really driving home the conceptual understanding before proceeding to the formulae and assumptions of statistical tests. It's written in a light-hearted way and can be read cover to cover. A similar book, which I have not read but which is more general and easier, is An Adventure in Statistics: The Reality Enigma. It interweaves the statistical lessons within a novel.
A second book which I cannot recommend highly enough is Statistical Modeling: A Fresh Approach. There are two editions from what I can see, but based on the linked one, Dr Kaplan spends a lot of time setting up why statistical tests are needed. There is a geometry section which draws out why the statistical tests work the way they do. It was the first time I had ever seen anything like this, and it really gave me a conceptual understanding of the material, which up to this point I was just using as a cookbook type thing.
Other books I can recommend which don't have anything to do with R would be Statistics in Plain English, Fourth Edition, which I didn't read fully but dipped in and out of as a reference, and Statistics Done Wrong: The Woefully Complete Guide, which I found very witty and chock full of examples of what happens when you ignore assumptions in statistical tests with real-world consequences.
If you are on a budget, I liked OpenIntro Statistics. The book is free as far as I remember on the OpenIntro website and there is an accompanying Coursera course as well (I think). Finally, another free book I really liked was Learning Statistics with R.
I read these books on and off for a couple of years because I was finding the ISLR and ESLR books a bit tough going, having not come from a maths background.
Having said that, since you're doing a PhD, I would look into whether your university has some good solid courses on the subjects. It is of course possible to learn by yourself, but one should not underestimate the value of being introduced to a subject by someone experienced in the matter and knowledgeable about how to teach.
I've come to realise that what I consider the Z part of statistics is actually pretty simple stuff for some people considering the content in some of the recommended books...
I totally agree with you, @Leon! The obvious choice would be a solid introductory course at my university; however, they're pretty determined to teach statistics using SPSS or STATA, and I'm dead set on learning R.
I think I'll give the Statistics with R Specialization at Coursera a go; it looks like a good place to start. One question though: does anyone know if the course is self-paced? I'm doing most of this in my spare time after work (with two kids), so I'm not sure I'll be able to keep up with the 5-7 h/week every single week.
they're pretty determined to teach statistics using SPSS or STATA
Probably a discussion for elsewhere, but I often wonder why universities seem to have a strong disposition for Stata, SPSS or EViews, but industry and personal projects use R or Python. I used, and was taught, the former in university but have never used them since - but have had to use R and Python in the workforce.
The tidyverse is a set of packages with an underlying paradigm for analyzing data. It isn't inherently a system to do statistics! That is to say that you should not expect tidyverse functions to perform statistical operations for you. That being said, you can definitely use it in conjunction with statistical packages/functions you learn outside it.
I highly recommend you pick ModernDive back up (or any Statistics with R book really) and finish up R for Data Science. Your understanding of the material from both books will allow you to integrate your stats in a tidyverse workflow (especially with functional programming using the {purrr} package). Doing that is good practice for what you will have learned.
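As one possible sketch of that kind of integration, the snippet below fits the same base R model to several subsets of a data set with {purrr} and collects the results in one data frame. It assumes dplyr, purrr and broom are installed; the mtcars data and the grouping by cylinder count are illustrative choices only.

```r
library(dplyr)
library(purrr)
library(broom)

# Split the data by number of cylinders, fit one lm() per subset,
# tidy each fit, and stack the results with the group name in "cyl"
by_cyl <- mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .x)) %>%
  map(tidy) %>%
  bind_rows(.id = "cyl")

print(by_cyl)
```

The modelling itself is still plain lm(); the tidyverse pieces only handle the splitting, iterating and collecting around it.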
This advice is given from personal experience. I am currently reading a book on state-space models and am rewriting the book's R code to align with a tidyverse workflow.
I want to second the recommendation about https://www.r-bloggers.com/; I have learned a lot of things that I didn't even know I wanted to learn just by browsing posts there. (Now if only I could find the time to actually start contributing posts...)
How is your progress? Any update from your search?
There is no book about statistics in the tidyverse, so I also raised a new topic asking what's missing and how to connect the rest of R to enhance the tidyverse approach to learning statistics.
All of the replies are awesome! I want to recommend the RStudio cheat sheets; they are handy for new R users. Modeling, data manipulation, and data cleaning are the most time-consuming work if you are not familiar with these hands-on tools.