Mixed-effect models and ANOVA in the Tidyverse

Am I in the wrong stats universe?

I work in agriculture and our bread and butter is designed experiments intended to be analyzed with ANOVA or as mixed-effect models. The most common packages I use for analysis are agricolae and nlme. Sometimes I can just use base stats (lm), but it's often not sufficient.

I use a tidy workflow, but haven't found a great way to mix anything beyond lm into my code. I find ways to do it, but not great ways. In the end, I seldom have a nice table I can share with a non-R colleague. There are also a couple of fringe packages out there that I am excited about (broom.mixed, nls.multstart), but they don't have a ton of support at this time.

When what I'm doing is not 1) incorporated in the Tidyverse, 2) contained in any other well-supported packages, 3) found as popular (or even answered) questions on StackOverflow, I start to feel like maybe it's because everyone else is taking a different approach? Or did the ag stats/ANOVA world just get left behind?

I've worked quite a bit with predictive models, but have recently needed to return to a lot of split-plots, strip-plots, split-split plots etc. and I'm wondering if I should just really dig into mixed-effect models or if there is something out there that I am missing?


Hi,

Do you have a reproducible example (a reprex) of a simpler version of what you are doing now that you'd like to see tidied? It's essential that it include representative data, either your own or one of the built-in data() sets.

No. I'm just looking for discussion from people who may have had similar experiences or have pondered the same question.

I've been thinking about it quite a bit.

parsnip takes the approach of specifying the type of model then the engine that is used to fit it. The latter is usually the name of some R package. The type is defined by the structure of the model. For example, linear_reg() means that we have simple slopes and intercepts that predict some numeric outcome.
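As a minimal sketch of that type/engine split (the mtcars data and the mpg ~ wt formula are stand-ins chosen only for illustration, and this assumes parsnip is installed):

```r
library(parsnip)

# Model type: the structure (slopes and an intercept for a numeric outcome).
spec <- linear_reg()

# Engine: the function that does the estimation; here stats::lm().
spec_lm <- set_engine(spec, "lm")

# Fit with the usual formula interface. mtcars is only a stand-in dataset.
lm_fit <- fit(spec_lm, mpg ~ wt, data = mtcars)

# The underlying engine object is stored in the `fit` element.
class(lm_fit$fit)
coef(lm_fit$fit)
```

Swapping the engine string changes the estimation routine while the model type stays the same, which is the property a mixed-model type would need too.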

For mixed effects models, we'll probably have some generic model type that is a simple pass-through for lmer or lme.

However, there are some specific cases that we could also define. For example, suppose that you have a simple repeated measures experiment where the independent experimental units are measured multiple times (without any specific time ordering). This would define the structural part, and different engines could then be used for different estimation methods. For example:

  • correlated error models via nlme::gls().

  • correlated error models via gee::gee().

  • random intercept models via nlme::lme() or lme4::lmer().

  • Bayesian estimation via rstanarm::stan_glm().

and some things that I haven't thought of.
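To illustrate "one structure, several engines" without parsnip, here is a rough sketch that fits the same repeated-measures structure two ways with nlme. The Orthodont data (which ships with nlme) and the formula are assumptions made for the example; the measurements there are ordered by age, so it only approximates the unordered case described above.

```r
library(nlme)  # provides lme(), gls(), and the Orthodont example data

# Random-intercept model: each Subject gets its own intercept shift.
fit_lme <- lme(distance ~ age + Sex,
               random = ~ 1 | Subject,
               data = Orthodont)

# Correlated-error model: no random effects, but residuals within a
# Subject share a common (compound-symmetric) correlation.
fit_gls <- gls(distance ~ age + Sex,
               correlation = corCompSymm(form = ~ 1 | Subject),
               data = Orthodont)

# Same structural model, two estimation routes: compare the fixed effects.
cbind(lme = fixef(fit_lme), gls = coef(fit_gls))
```

The lme4 version of the first model would be lmer(distance ~ age + Sex + (1 | Subject), data = Orthodont). A parsnip model type for this design would, in effect, hide the differences between these calls behind one interface.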

The types of experiments that (I think) you deal with could be defined in similar ways. If you could define different model formats for split/strip-plot designs, a set of parsnip methods would be fairly easy. These could also have relevant tidy() methods to extract the parts that you use a lot (I always had to go to the ape package to get variance components out of lme objects).
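For reference, one common way to write a split-plot as a mixed model is sketched below using the Oats data that ships with nlme. Treating nitrogen as a factor and nesting Variety within Block are choices made for this example and may not match how your group sets up its designs. nlme::VarCorr() is used here to pull out the variance components.

```r
library(nlme)

# Oats: a split-plot experiment shipped with nlme. Varieties are whole-plot
# treatments within Blocks; nitrogen levels are applied to subplots.
fit_sp <- lme(yield ~ factor(nitro) * Variety,
              random = ~ 1 | Block/Variety,
              data = Oats)

# Variance components: Block, Variety-within-Block (whole-plot error),
# and the residual (subplot error).
VarCorr(fit_sp)

# Tests for the fixed effects.
anova(fit_sp)
```

A dedicated split-plot model type would mostly need to generate the random = ~ 1 | Block/Variety part from a description of the design.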


Thanks, Max!

This is something I will continue to think a lot about while going back and reviewing some of the basics of mixed-effects models. It's nice to know that these models are not forgotten.

I will eventually try to come up with some common model formats. agricolae already does a good job of including some of these, like a split-split plot or strip plot.

There is not always a lot of agreement on the models we set up, and they tend to get twisted as we run through an analysis. The nature of having to plant, grow, and harvest an entire experiment leads to a lot of stacking/nesting of treatments (row space x tillage x varieties x location x year x rep), and unmeasured environmental variability doesn't help matters. We end up with a lot of custom models, and usually a person or two disagrees about which effects are random and what might be nested. These are the kinds of things I am hoping to set some hard rules for in our group. Then we can get some reusable code here.

Either way, the definitive book for these methods is now 20 years old and it would be great if we as a community would decide to bring these methods into contemporary times or declare that we are done with them and moving on to something else.


I'm so old I thought that you meant this one. :grinning:

We should loop in @emitanaka, who does a lot of this and is working on a platform-independent DSL for specifying mixed models.


I'd like to offer a slightly different perspective. tidymodels essentially solves two modeling problems: providing a consistent user interface to common machine learning estimators, and providing a consistent predict() method. These are rarely the modeling problems faced by researchers using mixed effects models.

If you want to use tibbles to manipulate estimates from mixed effect models, broom.mixed is definitely the way to go. broom.mixed will only help you manipulate estimates, though; it won't help with model specification, checking, or inference.
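A minimal sketch of what that looks like, assuming lme4 and broom.mixed are installed (the sleepstudy data ships with lme4 and is used here only as a stand-in):

```r
library(lme4)
library(broom.mixed)

# Random-intercept model on a stand-in dataset.
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# One row per fixed-effect term (estimate, std.error, statistic).
fixed_tbl <- tidy(fit, effects = "fixed")

# Random-effect parameters: the Subject SD and the residual SD.
vc_tbl <- tidy(fit, effects = "ran_pars")

fixed_tbl
vc_tbl
```

Both results are ordinary tibbles, so they can go through dplyr or be written out with write.csv() for a colleague who doesn't use R.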

IMO there are two major developments in mixed models for R at the moment. The first is the Stan ecosystem, where the Stan group is taking a Bayesian approach to mixed effects models. The brms and rstanarm vignettes are well written and present a good entry point to this universe. Keep an eye out for the forthcoming book Advanced Regression and Multilevel Models by Gelman et al., which will feature a number of cool recent methodological developments. The Stan ecosystem is not strictly tidy, but it is by and large well-written software that is easy to use. brms is currently my go-to for mixed effects modeling due to the immense variety of tools for working with fitted models after estimation (note that this is something that tidymodels hasn't focused on so far). The Stan universe also has a number of tools like tidybayes that are well worth investigating.

In terms of frequentist mixed effect modeling, there is much less active development, especially since Doug Bates (who wrote lme4) started working primarily in Julia. Nonetheless, the second set of exciting developments is recent work on mixed modeling by Emi Tanaka. You may enjoy this recent paper from her describing how to specify various mixed effects models in lme4 and asreml. Additionally, Dr. Tanaka is working on tidy experimental design, although my understanding is that this work is largely at the brainstorming stage. If you are looking for a slightly more recent reference with a more applied focus, I personally am a fan of Oehlert.


I shouldn't have used the word "definitive", but that S and S+ book is still where I get all my code! And it always feels like I should be doing something else, not to mention the sinking feeling I get when a new R user shows me their non-linear data and I close my eyes and slide Pinheiro and Bates over to them. . .

But the book you posted is exactly the kind of basics I need to get back to over the next few months. Those reviews though - " it looked good until I touched it. it was so old and dried out that the cover fell off in the first 5 minutes we had it." :joy:


My copy is buried in the basement. If you will be at RStudio conf I can lend it to you. I won't be needing it immediately.

