Hi all,
I've recently been reading "Conventions for R Modeling Packages" and "Develop custom modeling tools", so I have been thinking about best practices.
Is there a preferred way of supplying additional metadata to a model (grouping, spatial coordinates, etc.)?
A few examples:
- The most straightforward approach is to supply an extra argument that is validated to be equal in length to the data, as in some spatial regression models (e.g. `spdep::lagsarlm(formula, data, listw, ...)`).
- lme4 and brms supply the varying effects through a novel formula syntax like `y ~ (1 | group) + x1 + x2`, which avoids needing two data frames and two formulas, but I don't know how one would implement this pattern in a package.
- Sometimes an assumption can be made. For example, a mixture of experts uses two design matrices, one for clustering (gating) and one for regression, but these can be assumed to be the same in most cases; see MEteorits on GitHub ("fchamroukhi/MEteorits").
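For the formula-based pattern, one way a package could support it is to walk the formula's expression tree and pull out the parenthesised `|` terms itself. Below is a minimal base-R sketch under stated assumptions: `split_bar_formula()` and its helpers are hypothetical names (not lme4 or brms code), and they only handle bar terms that appear as top-level `+` summands wrapped in parentheses. lme4 exports its own helpers for this job (`findbars()` and `nobars()`), which are worth reading before rolling your own.

```r
# Hypothetical helpers (not from lme4/brms): split a formula such as
# y ~ (1 | group) + x1 + x2 into a fixed-effects formula and its bar terms.

# TRUE if `e` is a call of the form `lhs | rhs`
is_bar <- function(e) is.call(e) && identical(e[[1]], as.name("|"))

# TRUE if `e` is a parenthesised bar term, e.g. `(1 | group)`
is_paren_bar <- function(e) {
  is.call(e) && identical(e[[1]], as.name("(")) && is_bar(e[[2]])
}

# Flatten a chain of binary `+` calls into a list of summands
split_plus <- function(e) {
  if (is.call(e) && identical(e[[1]], as.name("+")) && length(e) == 3) {
    c(split_plus(e[[2]]), split_plus(e[[3]]))
  } else {
    list(e)
  }
}

split_bar_formula <- function(formula) {
  rhs <- formula[[length(formula)]]
  pieces <- split_plus(rhs)
  is_re <- vapply(pieces, function(e) is_bar(e) || is_paren_bar(e), logical(1))

  # Bar terms with the parentheses stripped, e.g. `1 | group`
  bars <- lapply(pieces[is_re], function(e) if (is_bar(e)) e else e[[2]])

  # Rebuild the right-hand side from the remaining (fixed-effect) summands;
  # fall back to an intercept-only model if nothing is left
  fixed <- pieces[!is_re]
  fixed_rhs <- if (length(fixed) == 0) 1 else
    Reduce(function(a, b) call("+", a, b), fixed)
  new_call <- if (length(formula) == 3) {
    call("~", formula[[2]], fixed_rhs)
  } else {
    call("~", fixed_rhs)
  }
  fixed_formula <- eval(new_call)
  environment(fixed_formula) <- environment(formula)

  # Grouping variables are the right-hand sides of the bars
  groups <- vapply(bars, function(b) deparse(b[[3]]), character(1))
  list(fixed = fixed_formula, bars = bars, groups = groups)
}

res <- split_bar_formula(y ~ (1 | group) + x1 + x2)
res$fixed   # y ~ x1 + x2
res$groups  # "group"
```

Because the grouping variable named in the bar is then looked up as a column of `data` (for example when building the model frame), it is subset together with the other columns rather than travelling as a separate object.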
My one hesitation with simply passing an argument is that any row filtering, say via rsample, will have to be applied to that argument as well, or it will no longer line up with the data.
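The alignment concern can be made concrete with a small sketch. It assumes two hypothetical helpers: `check_aligned()`, a length check for the separate-argument style, and `extract_meta()`, which instead reads the metadata from named columns of `data`. The length check only catches the mismatch after filtering; storing the metadata as columns keeps it aligned automatically, since row subsetting (including `rsample::analysis()` on a split, which returns rows of the original data frame) carries every column along.

```r
# Hypothetical validator for a separate metadata argument (e.g. coordinates)
check_aligned <- function(data, meta, arg = "meta") {
  # Vectors are measured by length, matrices/data frames by row count
  n_meta <- if (is.null(dim(meta))) length(meta) else nrow(meta)
  if (n_meta != nrow(data)) {
    stop(sprintf("`%s` has %d rows but `data` has %d.", arg, n_meta, nrow(data)),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Alternative (hypothetical): keep metadata as columns of `data`, pass names
extract_meta <- function(data, cols) {
  absent <- setdiff(cols, names(data))
  if (length(absent) > 0) {
    stop("Columns not found in `data`: ", paste(absent, collapse = ", "),
         call. = FALSE)
  }
  data[, cols, drop = FALSE]
}

df <- data.frame(y = c(2.1, 3.4, 1.9, 4.0, 2.8, 3.3), x = 1:6,
                 lon = c(10, 11, 12, 13, 14, 15),
                 lat = c(50, 51, 52, 53, 54, 55))
coords <- as.matrix(df[, c("lon", "lat")])

check_aligned(df, coords, "coords")  # passes: 6 rows each

# Row filtering, e.g. one analysis set of a resample
filtered <- df[df$x > 2, ]
# check_aligned(filtered, coords, "coords") would now stop: 6 rows vs 4

# Metadata read from the filtered data stays aligned with it
meta <- extract_meta(filtered, c("lon", "lat"))
```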
I'm most likely overthinking this, but I would be interested in any discussion.