Stats and Data Science methods


Relatively new RStudio user, but passionate about learning R!

I'm just getting to the R4DS chapter on modelr and the concept of nested data, with a list column of individually fitted models. I'm wrapping my head around that concept/framework and had a couple questions I thought I'd throw out here:

  1. How are models fitted individually to each nested group different from an lmer random-effects model?
  2. Given those differences, what are the pros/cons of one versus the other?

Excited to learn more, progress in my stats and R skills, and engage with this new community.


PS: Wasn't sure how to start this topic/question, so I'm completely happy to have it renamed/recategorized.

There are two big differences between a single random-effects model and multiple separate models that I can think of:

  1. In a single model, the residual variance is assumed to be the same across all groups unless you explicitly model it separately (this is also true if you use a single fixed-effects model). With separate models, each group gets its own residual variance estimate. If you expect wildly different variances across your groups (and not in a way that could be eliminated through a transformation), that's a pretty good sign that going with independent models is a reasonable path.

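  2. In a random-effects model, partial pooling will affect the coefficient estimates: each group's estimate is pulled toward the overall average, and more strongly so for groups with little or noisy data. This is explained very well by Tristan Mahr, using the tidyverse, in this blog post.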
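
If it helps to see both points side by side, here's a rough sketch (not a definitive recipe). It assumes the `sleepstudy` data that ships with lme4, which is my choice for illustration and not something from your question, plus dplyr, tidyr and purrr for the R4DS-style nested workflow. It fits one `lm()` per subject in a list column, fits a single `lmer()` model with random intercepts and slopes, and then lines up the per-subject slopes and residual SDs.

```r
library(lme4)   # lmer() and the sleepstudy example data (assumed dataset)
library(dplyr)
library(tidyr)
library(purrr)

# Separate models: one lm() per Subject, stored in a list column (R4DS style)
by_subject <- sleepstudy %>%
  group_by(Subject) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    model          = map(data, ~ lm(Reaction ~ Days, data = .x)),
    slope_separate = map_dbl(model, ~ coef(.x)[["Days"]]),
    sigma_separate = map_dbl(model, sigma)  # each subject gets its own residual SD
  )

# Single mixed model: random intercept and random slope for each Subject
mixed <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# coef() on an lmer fit returns per-group coefficients (fixed + random part),
# as a data frame whose row names are the Subject levels
mixed_coefs <- coef(mixed)$Subject

comparison <- by_subject %>%
  transmute(
    Subject,
    slope_separate,
    slope_mixed = mixed_coefs[as.character(Subject), "Days"],
    sigma_separate
  )

comparison     # per-subject slopes from both approaches, plus per-subject SDs
sigma(mixed)   # the single residual SD shared by all subjects in the mixed model
```

The `slope_mixed` column should be less spread out than `slope_separate`, which is the partial pooling from point 2, and `sigma_separate` varies by subject while the mixed model reports just one residual SD, which is point 1.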

As a very general rule, for a given data set, a single larger model is more appropriate than several smaller models, since information from one group can inform the estimates for the others. There are certainly times when this doesn't make sense, though, such as when you are fitting a collection of models to different cross-validation slices, or when you are trying to compare the results from several candidate models.


Hi @nick,

Thanks for a great breakdown; this is exactly the kind of explanation I was looking for.
Thanks also for the blog post link. That is really fantastic, and I'm definitely going to dig deeper into it.