I am trying to figure out how to run an ANCOVA in R. I am testing whether habitat type affects metabolic rate. I also have multiple sites, so I need a nested design, site[habitat]. Lastly, I need to include mass as a covariate.

Thank you.

Hi Justin,

What have you tried so far? It's always helpful to post the work you've done already (including the literal R code), so we can know how we can help.

Here's an answer on Cross Validated that might give you some insight. Frankly, I found that answer just by googling "ancova in R", so I highly recommend doing the same.


I apologize, I am new to this and have zero coding experience. I have tried this...

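```
# read.csv() expects comma-separated values; read.table() splits on whitespace by default
dat <- read.csv("C:/Users/Justin Mukhalian/Desktop/R_folder/ANCOVA.csv", header = TRUE)

dat

# data = dat tells lm() where to find the columns
results <- lm(mass ~ log.WMR. + habitat, data = dat)

anova(results)
```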

However, I cannot figure out how to nest habitat and site. Also, I am not sure if my syntax is completely correct for the question that I am trying to answer.

@jmukhalian Justin, it's extremely helpful to have some representative data to go with your question; see the FAQ: What's a reproducible example (`reprex`) and how do I do one?

Without knowing the data types of `habitat` and `site`, it's hard even to pose the question of what is meant by *nesting*.

With some clarification, you'll find more people able to pitch in with helpful answers.

Aside: Don't think of `R` as *coding*; think of it as **algebra**. Although R has some programming-like features, most of the time you are just passing arguments to functions: f(x) = y in `R` is `y <- someFunction(one or more arguments)`.
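A minimal illustration of that idea, using a few arbitrary mass values (the variable names are made up for this example):

```r
# Each line applies a function (f) to arguments (x) and stores the result (y): y <- f(x)
masses <- c(2.3, 3.1, 3.0)   # c() combines values into a vector
avg_mass <- mean(masses)     # mean() is f, masses is x, avg_mass is y
log_mass <- log(masses)      # log() is applied to every element of the vector
avg_mass
```

Fitting a model works the same way: `lm()` is the function, and the formula and data are its arguments.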

Are you sure that you want them nested? Usually with ANCOVA, you would have an interaction between a qualitative grouping variable and a quantitative covariate. **If** `habitat` is the former and `log.WMR.` is the latter, you could use

```
results <- lm(mass ~ (log.WMR. + habitat)^2, data = dat)
```

to create a model with the main effects and the two-way interaction (from the `^2` part).
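One way to check what a formula like this expands to, without fitting anything, is base R's `terms()`. This sketch reuses the variable names above; no data is needed because only the formula is parsed:

```r
# terms() parses a formula; the "term.labels" attribute lists the terms R will build
f <- mass ~ (log.WMR. + habitat)^2
attr(terms(f), "term.labels")
# returns c("log.WMR.", "habitat", "log.WMR.:habitat")

# The same model can be written with *, meaning main effects plus their interaction
f_star <- mass ~ log.WMR. * habitat
identical(attr(terms(f), "term.labels"), attr(terms(f_star), "term.labels"))
```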

Nesting implies two qualitative variables that have a hierarchical structure (say, students within a class).
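In R's formula syntax, nesting is written with `/`: `a/b` expands to `a + a:b`, i.e. `b` within `a`. A sketch with this thread's variables, where `metabolic_rate` is an assumed column name for the response:

```r
# habitat/site means "habitat, plus site within habitat" (habitat + habitat:site);
# mass enters as the covariate
f_nested <- metabolic_rate ~ mass + habitat/site
attr(terms(f_nested), "term.labels")
# returns c("mass", "habitat", "habitat:site")
```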

edit: mixed up variable names


Habitat and site are both categorical variables; mass and metabolic rate are both continuous.

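Example:

| Habitat | Site | Mass | Metabolic rate |
|---------|------|------|----------------|
| LLP     | 1    | 2.3  | 0.17           |
| LLP     | 2    | 3.1  | 0.2            |
| LLP     | 3    | 3.0  | 0.23           |
| FSC     | 1    | 2.7  | 0.18           |
| FSC     | 2    | 2.3  | 0.17           |
| FSC     | 3    | 2.0  | 0.19           |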

So site needs to be nested within habitat, and metabolic rate varies with mass. I need a way to nest site within habitat while including mass as a covariate for metabolic rate.
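A sketch of getting the example rows above into R with the right data types. The column name `metabolic_rate` is an assumption; the real file may name it differently. Site labels 1 to 3 are reused in both habitats, so a physical site is only identified by its habitat *and* its number:

```r
# The six example rows from the post, typed in by hand.
# "metabolic_rate" is an assumed column name.
dat_example <- data.frame(
  habitat        = factor(c("LLP", "LLP", "LLP", "FSC", "FSC", "FSC")),
  # site must be a factor: stored as a number, lm() would fit it as a slope
  site           = factor(c(1, 2, 3, 1, 2, 3)),
  mass           = c(2.3, 3.1, 3.0, 2.7, 2.3, 2.0),
  metabolic_rate = c(0.17, 0.20, 0.23, 0.18, 0.17, 0.19)
)
str(dat_example)

# Site numbers repeat across habitats, so "site 1" alone is ambiguous.
# interaction() gives each habitat-site combination its own level.
dat_example$site_id <- interaction(dat_example$habitat, dat_example$site, drop = TRUE)
nlevels(dat_example$site)     # 3 labels
nlevels(dat_example$site_id)  # 6 distinct sites

# If a file stores these as numbers or text, convert them after reading, e.g.:
# dat$site <- factor(dat$site); dat$habitat <- factor(dat$habitat)
```

With only one animal per site, these six rows cannot support a site-within-habitat model (it would need more parameters than there are observations); the full data set needs several animals per site.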


OK, that's helpful. I now understand the question to be: *when constructing a model relating mass to a transformation of metabolic rate plus habitat, is any explanatory power added by also considering the site within each habitat?*

In that case, I'd think of the base model as being included in, or *nested* within, the explanatory model.
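A minimal sketch of that base-versus-extended comparison. The data are *simulated* (2 habitats x 3 sites x 5 animals, invented purely for illustration, not real measurements), and `metabolic_rate`, `mass`, `habitat`, `site` are assumed column names. Calling `anova()` on two fitted `lm` objects gives an F-test of whether the extra site-within-habitat terms explain additional variance:

```r
set.seed(42)
# SIMULATED data for illustration only: 2 habitats x 3 sites x 5 animals
sim <- expand.grid(animal  = 1:5,
                   site    = factor(1:3),
                   habitat = factor(c("LLP", "FSC")))
sim$mass <- runif(nrow(sim), min = 2, max = 3.2)
# Each habitat-site combination gets its own random offset
site_id <- interaction(sim$habitat, sim$site)
site_offset <- rnorm(nlevels(site_id), sd = 0.01)[as.integer(site_id)]
sim$metabolic_rate <- 0.05 + 0.05 * sim$mass + site_offset +
  rnorm(nrow(sim), sd = 0.01)

# Base model: habitat effect with mass as the covariate
base_model <- lm(metabolic_rate ~ mass + habitat, data = sim)
# Extended model: adds site nested within habitat (habitat/site = habitat + habitat:site)
full_model <- lm(metabolic_rate ~ mass + habitat/site, data = sim)

# F-test: do the site-within-habitat terms add explanatory power?
cmp <- anova(base_model, full_model)
cmp
```

This treats site as a fixed effect. In many nested field designs, site is instead treated as a random effect (for example with `nlme::lme()` or `lme4::lmer()`), so that the habitat effect is judged against site-to-site variation rather than animal-to-animal variation; which is appropriate depends on the study design.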

There's a good thread at https://stats.stackexchange.com/questions/4717/what-is-the-difference-between-a-nested-and-a-non-nested-model on the subtle differences between nested and non-nested models.

> This scenario is actually equivalent to fitting the simple model first and removing its predicted variance from the data, and then fitting the additional component of the more complex model to the residuals from the first fit (at least with least squares estimation).
>
> Ruben van Bergen
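One caveat on the residual approach, shown here as a sketch on simulated data: with least squares, the two-stage fit reproduces the full model's estimates exactly only if the added predictors are *also* residualized on the base model (the Frisch-Waugh-Lovell theorem). Regressing the first-stage residuals on the raw added predictors generally gives different estimates when the predictors are correlated. The names `x1`, `x2` and `y` are hypothetical:

```r
set.seed(1)
# Hypothetical simulated data: x1 = predictor in the base model, x2 = added predictor
n  <- 50
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)              # x2 is deliberately correlated with x1
y  <- 1 + 2 * x1 + 0.5 * x2 + rnorm(n)

# Full least-squares model with both predictors
full <- lm(y ~ x1 + x2)

# Two-stage version: residualize BOTH y and the added predictor on the base model
y_res  <- residuals(lm(y ~ x1))
x2_res <- residuals(lm(x2 ~ x1))
two_stage <- lm(y_res ~ x2_res)

# Naive two-stage version: residualize y only
naive <- lm(y_res ~ x2)

coef(full)["x2"]
coef(two_stage)["x2_res"]   # matches the full-model estimate
coef(naive)["x2"]           # generally differs when x1 and x2 are correlated
```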

Your first decision is whether you want to work with the F-statistic output of `lm` or the likelihood-ratio (deviance) output of `glm`. Both can accommodate categorical variables, but you don't always get $0 \le P \le 1$ when you use `lm` with categorical variables unless you create dummies (for `habitat`, in your case), and with `glm` you end up working with a maximum likelihood estimator.
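For what it's worth, R's `lm()` builds dummy variables for a factor automatically, and `glm()` with the default `gaussian` family fits the same least-squares model, so for a continuous response the two give the same estimates. A sketch using the six example rows posted earlier, with `metabolic_rate` as an assumed name:

```r
habitat <- factor(c("LLP", "LLP", "LLP", "FSC", "FSC", "FSC"))
mass <- c(2.3, 3.1, 3.0, 2.7, 2.3, 2.0)
metabolic_rate <- c(0.17, 0.20, 0.23, 0.18, 0.17, 0.19)

# model.matrix() shows the design matrix lm() builds: FSC (first level) is the
# baseline, and habitatLLP is a 0/1 dummy column created automatically
mm <- model.matrix(~ mass + habitat)
colnames(mm)

fit_lm  <- lm(metabolic_rate ~ mass + habitat)
fit_glm <- glm(metabolic_rate ~ mass + habitat, family = gaussian)
all.equal(coef(fit_lm), coef(fit_glm))   # same estimates

anova(fit_lm)                  # sequential F-tests
anova(fit_glm, test = "F")     # analysis of deviance with F-tests
```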

Neither approach is *right*. Assuming you want to use ordinary least squares, you would create two models:

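```
# hab1, hab2: habitat dummy (0/1) columns you would create in dat yourself
without_sites <- lm(mass ~ log.WMR. + hab1 + hab2, data = dat)
mass_ex_res <- residuals(without_sites)   # mass with the base model's fit removed
with_sites <- lm(mass_ex_res ~ log.WMR. + hab1 + hab2 + site, data = dat)
```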

This follows the method described by Ruben van Bergen.
