New to R - need any help that I can get

rstudio

#1

I am trying to figure out how to run an ANCOVA in R. I am testing whether habitat type affects metabolic rate. I also have multiple sites, so I need a nested design, site[habitat]. Lastly, I need to include mass as a covariate.

Thank you.


#2

Hi Justin,

What have you tried so far? It's always helpful to post the work you've done already (including the literal R code), so we can know how we can help.

Here's an answer on Cross Validated that might give you some insight. And, frankly, I just found that answer by googling "ancova in R", so I highly recommend doing the same.


#3

I apologize, I am new to this and have zero coding experience. I have tried this...

dat = read.table("C:/Users/Justin Mukhalian/Desktop/R_folder/ANCOVA.csv",header=TRUE)
dat
results = lm(mass ~ log.WMR. + habitat)
anova(results)

However, I cannot figure out how to nest habitat and site. Also, I am not sure if my syntax is completely correct for the question that I am trying to answer.


#4

@jmukhalian Justin, it's extremely helpful to have some representative data to go with the question. See the FAQ: What's a reproducible example (`reprex`) and how do I do one?

Without knowing the data types of habitat and site, it's hard even to pose the question of what is meant by nesting.

With some clarification, you'll find more people able to pitch in with helpful answers.

Aside: Don't think of R as coding, think of it as algebra. Although there are some programming-like features, most of the time you are just passing arguments to functions: f(x) = y in R is `y <- someFunction(one or more arguments)`.


#5

Are you sure that you want them nested? Usually with ANCOVA, you would have an interaction between a qualitative grouping variable and a quantitative covariate. If habitat is the former and log.WMR is the latter, you could use

results <- lm(mass ~ (log.WMR. + habitat)^2, data = dat)

to create a model with the main effects and the two-way interaction (from the ^2 part).
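To see what the `^2` adds, you can ask R to expand the formula without fitting anything. A quick sketch using the same variable names as above (no data needed):

```r
# Expand the formula into its individual terms; nothing is fitted here
f <- mass ~ (log.WMR. + habitat)^2
term_labels <- attr(terms(f), "term.labels")
print(term_labels)
```

The labels should be the two main effects plus the `log.WMR.:habitat` interaction.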

Nesting implies two qualitative variables that have a hierarchical structure (say student within a class).
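For the hierarchical case, R's formula syntax has the `/` operator: `habitat/site` is shorthand for `habitat + habitat:site`, i.e. site within habitat. Below is a minimal sketch on made-up data. The `toy` data frame, its column names and all of its values are invented for illustration, and using log metabolic rate as the response with mass as the covariate is only my reading of the original question.

```r
# Made-up data: 2 habitats x 3 sites x 4 animals; all values are simulated
set.seed(1)
toy <- expand.grid(animal = 1:4,
                   site = factor(1:3),
                   habitat = factor(c("LLP", "FSC")))
toy$mass <- runif(nrow(toy), min = 2, max = 3.5)
toy$log_mr <- log(0.07 * toy$mass) + rnorm(nrow(toy), sd = 0.05)

# habitat/site expands to habitat + habitat:site (site nested within habitat)
fit <- lm(log_mr ~ mass + habitat/site, data = toy)
nested_terms <- attr(terms(fit), "term.labels")
print(nested_terms)

# Sequential ANOVA table: mass first, then habitat, then site within habitat
print(anova(fit))
```

Note that `site` has to be a factor for this to work as intended; if it is stored as the numbers 1, 2, 3, R will treat it as a continuous variable.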

edit: mixed up variable names


#6

Habitat is a categorical variable, site is also categorical, mass and metabolic rate are both continuous.

Example

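| Habitat | Site | Mass | Metabolic rate |
|---------|------|------|----------------|
| LLP     | 1    | 2.3  | 0.17           |
| LLP     | 2    | 3.1  | 0.2            |
| LLP     | 3    | 3.0  | 0.23           |
| FSC     | 1    | 2.7  | 0.18           |
| FSC     | 2    | 2.3  | 0.17           |
| FSC     | 3    | 2.0  | 0.19           |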

So Habitat and site need to be nested, and metabolic rate varies with mass. So I need to figure out a way to nest habitat and site and have mass as my covariate for metabolic rate.


#7

OK, that's helpful. I now understand the question to be: when constructing a model relating mass to a transformation of metabolic rate plus habitat, is any explanatory power added by considering the site with which the habitat is associated?

In that case I'd think of the base model as being included in, or nested within, the explanatory model.

There's a good thread at https://stats.stackexchange.com/questions/4717/what-is-the-difference-between-a-nested-and-a-non-nested-model on the subtle differences among experimental designs.

"This scenario is actually equivalent to fitting the simple model first and removing its predicted variance from the data, and then fitting the additional component of the more complex model to the residuals from the first fit (at least with least squares estimation)." (Ruben van Bergen)

Your first decision is whether you want to work with the F-statistic output of lm or the log-odds ratio of glm. Although both are able to accommodate categorical variables, you don't always get $0 \le P \le 1$ when you use lm with categorical variables unless you create dummies (for habitat, in your case), and you end up working with a maximum likelihood estimator in glm.

Neither approach is inherently right. Assuming you want to use ordinary least squares, you would create two models

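# Assumes dat already has logWMR and the habitat dummy columns hab1, hab2
without_sites <- lm(mass ~ logWMR + hab1 + hab2, data = dat)  # base model, no site
mass_ex_res <- residuals(without_sites)  # what the base model leaves unexplained
with_sites <- lm(mass_ex_res ~ logWMR + hab1 + hab2 + site, data = dat)  # residuals on predictors plus site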

according to the method described by Ruben van Bergen.
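If the goal is simply to test whether site adds explanatory power, a commonly used alternative to the residual approach is to fit both models to the same response and compare them with `anova()`, which gives a partial F-test for nested models. This is a different technique from the one above, sketched here on simulated data. The `sim` data frame and its values are invented, and `habitat/site` assumes sites are nested within habitat.

```r
# Simulated data: 2 habitats x 3 sites x 4 animals; values are not real
set.seed(42)
sim <- expand.grid(animal = 1:4,
                   site = factor(1:3),
                   habitat = factor(c("LLP", "FSC")))
sim$logWMR <- rnorm(nrow(sim), mean = -1.7, sd = 0.1)
sim$mass <- 2.5 + 0.5 * sim$logWMR + rnorm(nrow(sim), sd = 0.3)

# Smaller model: no site information
without_sites <- lm(mass ~ logWMR + habitat, data = sim)
# Larger model: adds site nested within habitat
with_sites <- lm(mass ~ logWMR + habitat/site, data = sim)

# Partial F-test: does the larger model fit significantly better?
comparison <- anova(without_sites, with_sites)
print(comparison)
```

A small p-value in the second row of the table would suggest that site explains variation beyond what log metabolic rate and habitat already account for.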