What modeling books should I read next?

I'm looking for recommendations of books to read to increase my practical statistical analysis/statistical modeling abilities. A little background on me:

  • 5 years as an actuary-in-training in health insurance with some computer science background
  • Passed the actuarial predictive analytics exam, and I'm at an ISLR level of theoretical understanding of statistical methods. I had no formal stats training during my math undergrad degree; I was mostly doing graph theory work.
  • Have read R4DS and Advanced R, and I'm working on Text Mining with R: A Tidy Approach. My programming and data munging skills are solid.

The short list for my next book currently includes:

  • Feature Selection and Engineering
  • Applied Predictive Modeling
  • Elements of Statistical Learning
  • Tidy Modeling with R (the tidymodels book), but I would strongly prefer something I can get in print. (Side question: does anyone know if there will ever be a hard copy of this one?)

And I'm curious to hear from those with more experience who have read one or more of these books: Which one should I read next? Is there a book I'm missing that would be better? Any thoughts or recommendations would be greatly appreciated!

We are working with O'Reilly and should have print copies of TMwR out by the end of July.

I'll stop there since I'm biased towards 3/4 of your list :smile:

Applied Predictive Modeling is a great book and it's what got me into machine learning.

Elements of Statistical Learning is more theoretical. A lot of wisdom, but in my opinion, it's not really that necessary to know the finer details of how regularized regression converges or the differences between various tree regression approaches. I didn't take as much from it.

This is a boring recommendation, but the single book I have learned the most from is Applied Linear Statistical Models by Kutner et al. Honestly, data science is an incomplete subject on its own. I don't think the data science literature is going to do a good job of teaching the assumptions made by treating each row of data as an independent sample, establishing causality, least squares and maximum likelihood, autocorrelated data, multicollinearity, ... I've met a lot of data scientists without this stats foundation who get a data set and just start hacking.

Ooooh, how exciting! For some reason I was thinking it would be much longer. I also just bought Mastering Shiny, so maybe I'll just work on that and get TMwR as soon as it comes out.

@arthur.t That's great to know about, thank you! I have some of that stats foundation from my actuarial training, but I would not say it's super strong, and it's definitely something I want to get better at.

I know my math textbooks spanned a broad range of approachability, with some of them even enjoyable to read through. I've also been spoiled by all the books from RStudio people. Is Applied Linear Statistical Models written to be read on its own, or is it more of a traditional textbook that needs a class or instructor to go along with it?

It's a textbook (I assume for first-year grad students), but I find it readable and well-written. I read it on my own about 10 years ago and refer back to it every once in a while. I was already familiar with linear regression and hypothesis tests before I read it, but it was extremely enlightening to be exposed to regression diagnostics, logistic regression, mixed models, time series, and methods for autocorrelated data, all in one resource.

You can probably skip the chapters on Design of Experiments unless you work in the sciences or engineering.

+1 to this. That was the book assigned to my graduate school linear models class.
