Best practice for supplying metadata in a modelling package

Hi all,

I've recently been reading "Conventions for R Modeling Packages" and "Develop custom modeling tools", so I have been thinking about best practices.

Is there a preferred way of supplying additional metadata (grouping, spatial coordinates, etc.)?

A few examples:

  • The most straightforward approach is to supply an extra argument that is
    validated to match the data in length, as in some spatial regression models
    (e.g. spdep::lagsarlm(formula, data, listw, ...)).

  • lme4 and brms supply the varying effects via a novel formula style like
    y ~ (1 | group) + x1 + x2, which avoids needing two data frames and two
    formulas, but I don't know how one would implement this pattern in a package.

  • An assumption can sometimes be made. For example, a mixture of experts uses
    two design matrices, one for clustering (gating) and one for regression, but
    these can be assumed to be the same in most cases;
    see MEteorits on GitHub ("fchamroukhi/MEteorits").
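
For the formula-style pattern in the second bullet, here is a minimal base-R sketch of one way a package might pull (1 | group) terms out of a formula. The helper name find_bar_terms() is hypothetical and this is not how lme4 or brms actually do it; lme4 exports findbars() for this purpose, which is probably the better starting point. The sketch only shows that a formula is a nested call that can be walked recursively.

```r
# Hypothetical helper: collect every `lhs | rhs` call found inside an expression.
find_bar_terms <- function(expr) {
  # Leaves (symbols, constants) contain no bar terms.
  if (!is.call(expr)) return(list())
  # A call to `|` is itself a bar term, e.g. 1 | group.
  if (identical(expr[[1]], as.name("|"))) return(list(expr))
  # Otherwise recurse into the arguments of the call (skipping the function name).
  found <- list()
  for (arg in as.list(expr)[-1]) {
    found <- c(found, find_bar_terms(arg))
  }
  found
}

f <- y ~ (1 | group) + x1 + x2

# f[[3]] is the right-hand side of a two-sided formula.
bars <- find_bar_terms(f[[3]])

# The grouping variable is the right-hand side of each bar term.
grouping_vars <- vapply(bars, function(b) deparse(b[[3]]), character(1))
grouping_vars  # "group"
```

The grouping variable can then be looked up in the single data frame, so the user never passes a second data frame or formula.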

My one hesitation with simply passing an argument is that any row filtering,
say via rsample, will have to be applied to the extra argument as well.
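
One way around this, sketched below under my own assumptions, is to keep the metadata as a column of the data and pass only the column name. The function fit_by_group() and the column site are hypothetical, and plain row indexing stands in for a resampling split; the point is only that any row subset of the data frame carries the metadata with it.

```r
# Hypothetical fitting function: `group` names a column of `data` instead of
# being a separate vector, so subsetting rows of `data` keeps it aligned.
fit_by_group <- function(formula, data, group) {
  stopifnot(is.character(group), length(group) == 1L, group %in% names(data))
  # Split the rows by the grouping column and fit one linear model per group.
  pieces <- split(data, data[[group]], drop = TRUE)
  lapply(pieces, function(d) lm(formula, data = d))
}

set.seed(1)
dat <- data.frame(
  y    = rnorm(20),
  x1   = rnorm(20),
  site = rep(c("a", "b"), each = 10)
)

# Stand-in for a resampling split: keep 15 of the 20 rows.
keep <- sample(nrow(dat), 15)
fits <- fit_by_group(y ~ x1, dat[keep, ], group = "site")
names(fits)  # "a" "b"
```

The trade-off is that metadata which is not naturally one value per row (a neighbour list, for example) does not fit in a column so easily.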

I'm most likely overthinking this, but I would be interested in any discussion.