Finding Full- and Reduced-Model Residuals

I'm trying to illustrate how a simple ANOVA works.

I have mercury levels (Mercury) for three types of lakes (Lake_Type): Eutropic, Mesotropic, and Oligotropic. The ANOVA compares residuals from a full model and from a reduced model. The reduced model assumes the mean mercury level is the same for all lakes; the full model allows a different mean for each lake type. I need to find the residuals for each model.
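For reference, the comparison described above can be written as a standard one-way ANOVA. The notation is mine, not the poster's: $y_{ij}$ is observation $j$ in lake type $i$, $\bar{y}$ is the grand mean, $\bar{y}_i$ is the mean of lake type $i$, $n_i$ is the number of lakes of type $i$, $n$ is the total number of observations, and $k = 3$ is the number of lake types.

$$
\begin{aligned}
\text{Reduced model:}\quad & e^{R}_{ij} = y_{ij} - \bar{y}, & \mathrm{SSE}_R &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \bigl(e^{R}_{ij}\bigr)^2, & df_R &= n - 1,\\
\text{Full model:}\quad & e^{F}_{ij} = y_{ij} - \bar{y}_{i}, & \mathrm{SSE}_F &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \bigl(e^{F}_{ij}\bigr)^2, & df_F &= n - k,
\end{aligned}
$$

$$
F = \frac{(\mathrm{SSE}_R - \mathrm{SSE}_F)/(df_R - df_F)}{\mathrm{SSE}_F/df_F},
$$

which is compared against an F distribution with $df_R - df_F = k - 1$ and $df_F = n - k$ degrees of freedom.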

A column containing the reduced-model residuals is not hard to calculate:

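# Grand mean: the single mean assumed by the reduced model
YY <- mean(dt$Mercury)
# Reduced-model residual: each observation minus the grand mean
dt$Reduced <- dt$Mercury - YY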
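Since the thread has no example data, here is a minimal sketch on a small made-up data set (the numbers are hypothetical, not the poster's mercury values). It computes the reduced-model residuals and checks them against an intercept-only linear model, whose fitted value is also the grand mean.

```r
# Toy data: made-up mercury values, NOT the poster's data
dt <- data.frame(
  Lake_Type = factor(c("Eutropic", "Eutropic", "Mesotropic",
                       "Mesotropic", "Oligotropic", "Oligotropic")),
  Mercury   = c(0.6, 0.5, 0.4, 0.5, 0.3, 0.4)
)

# Reduced model: one grand mean shared by every lake
YY <- mean(dt$Mercury)
dt$Reduced <- dt$Mercury - YY

# Residuals around a mean sum to zero (up to floating-point error)
print(sum(dt$Reduced))

# Cross-check: an intercept-only model fits exactly the grand mean
fit_reduced <- lm(Mercury ~ 1, data = dt)
print(all.equal(unname(residuals(fit_reduced)), dt$Reduced))
```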

I had a little trouble figuring out how to calculate the mean for each lake type. I can do

YG <- with(dt, tapply(Mercury, Lake_Type, mean))

to get the group means, labeled by lake type:

   Eutropic  Mesotropic Oligotropic 
  0.5527551   0.4523488   0.3826316 

which lets me access the means as YG["Eutropic"] if I want.
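The object returned by tapply() here is a one-dimensional array whose names are the lake types. A minimal sketch on the same hypothetical toy data (not the poster's) shows the lookup by name and how to drop the labels when plain numbers are wanted.

```r
# Toy data: made-up mercury values, NOT the poster's data
dt <- data.frame(
  Lake_Type = factor(c("Eutropic", "Eutropic", "Mesotropic",
                       "Mesotropic", "Oligotropic", "Oligotropic")),
  Mercury   = c(0.6, 0.5, 0.4, 0.5, 0.3, 0.4)
)

# One mean per lake type, labeled by the factor levels
YG <- with(dt, tapply(Mercury, Lake_Type, mean))
print(YG)

print(YG["Eutropic"])   # a single group mean, looked up by name
print(names(YG))        # the lake-type labels carried by the result
print(as.vector(YG))    # plain numeric vector with the labels dropped
```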

Alternatively, I can use aggregate() with Lake_Type as the grouping variable

YG <- aggregate(dt$Mercury, list(LT = dt$Lake_Type), mean)

which gives me a data frame

           LT         x
1    Eutropic 0.5527551
2  Mesotropic 0.4523488
3 Oligotropic 0.3826316
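With this form of aggregate(), the means land in a column called x and the groups in whatever name the list gave them (LT here). A short sketch on hypothetical toy data also shows the formula interface, which keeps the original column names; which one reads better is a matter of taste.

```r
# Toy data: made-up mercury values, NOT the poster's data
dt <- data.frame(
  Lake_Type = factor(c("Eutropic", "Eutropic", "Mesotropic",
                       "Mesotropic", "Oligotropic", "Oligotropic")),
  Mercury   = c(0.6, 0.5, 0.4, 0.5, 0.3, 0.4)
)

# List interface: group column "LT", value column "x"
YG_df <- aggregate(dt$Mercury, list(LT = dt$Lake_Type), mean)
print(YG_df)
print(YG_df$x[YG_df$LT == "Eutropic"])   # one group's mean

# Formula interface: columns keep the names Lake_Type and Mercury
YG_df2 <- aggregate(Mercury ~ Lake_Type, data = dt, FUN = mean)
print(YG_df2)
```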

You may have already guessed the punch line here. I want to compute the residuals for the full model; that is, for each row in the data table, find the difference between dt$Mercury and one of the three means, chosen by that row's dt$Lake_Type. Said another way, for the rows in dt where Lake_Type == "Eutropic", I want to subtract the Eutropic mean found using the with() or aggregate() approaches above.

I'm just blocked and can't figure out how to construct this column. My only restriction is that I can't use pipes, but this shouldn't be driving me this crazy. Can anyone suggest something?

Wait: Can I use

dt$Mercury - YG[dt$Lake_Type]

where YG is the with() version? Each residual comes back labeled with its row's lake type, which looks odd, but it may be what I'm looking for.

 Mesotropic     Eutropic     Eutropic     Eutropic     Eutropic     Eutropic 
 0.627651163 -0.527755102  0.017244898  0.217244898  0.237244898  0.197244898 
  Mesotropic     Eutropic     Eutropic     Eutropic  Oligotropic     Eutropic 
-0.182348837 -0.372755102  0.497244898 -0.242755102  0.427368421  0.027244898 
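The labels in that output are just names carried over from YG; the numbers are the full-model residuals. One subtlety: when Lake_Type is a factor, `[` indexes by the factor's integer codes rather than its labels. That still lines up because tapply() orders its result by the factor's levels, but indexing with as.character() makes the lookup by name explicit. The sketch below (hypothetical toy data) does that, drops the names with as.vector(), and cross-checks against two other base-R routes: ave(), which returns each row's group mean directly, and the residuals of lm(Mercury ~ Lake_Type).

```r
# Toy data: made-up mercury values, NOT the poster's data
dt <- data.frame(
  Lake_Type = factor(c("Eutropic", "Eutropic", "Mesotropic",
                       "Mesotropic", "Oligotropic", "Oligotropic")),
  Mercury   = c(0.6, 0.5, 0.4, 0.5, 0.3, 0.4)
)
YG <- with(dt, tapply(Mercury, Lake_Type, mean))

# Look up each row's group mean by name, then drop the labels
dt$Full <- dt$Mercury - as.vector(YG[as.character(dt$Lake_Type)])

# Alternative: ave() repeats each group's mean on every row of that group
full_ave <- dt$Mercury - ave(dt$Mercury, dt$Lake_Type)

# Cross-check: a one-way linear model fits exactly the group means
fit_full <- lm(Mercury ~ Lake_Type, data = dt)

print(dt$Full)
print(all.equal(dt$Full, full_ave))
print(all.equal(dt$Full, unname(residuals(fit_full))))
```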

Edit: That's indeed what I wanted. Sorry for taking up the Community's time. Can I delete this?

I would merge the YG you get from aggregate() into dt like this:

# Adds column x (the lake-type mean) to every row; merge() sorts rows by Lake_Type
dt2 <- merge(dt, YG, by.x = "Lake_Type", by.y = "LT")

or use inner_join() from dplyr:

library(dplyr)
dt2 <- inner_join(dt, YG, by = c("Lake_Type" = "LT"))
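After either join, the full-model residual is one subtraction against the x column. Because merge() sorts by the key by default, this sketch adds a hypothetical row_id column first so the original row order can be restored. It uses base merge() only, in keeping with the no-pipes restriction, and runs on made-up toy data rather than the poster's.

```r
# Toy data: made-up mercury values, NOT the poster's data
dt <- data.frame(
  Lake_Type = factor(c("Eutropic", "Eutropic", "Mesotropic",
                       "Mesotropic", "Oligotropic", "Oligotropic")),
  Mercury   = c(0.6, 0.5, 0.4, 0.5, 0.3, 0.4)
)
dt$row_id <- seq_len(nrow(dt))   # hypothetical helper column to remember order

YG <- aggregate(dt$Mercury, list(LT = dt$Lake_Type), mean)
dt2 <- merge(dt, YG, by.x = "Lake_Type", by.y = "LT")

# Restore the original row order, then subtract each row's group mean
dt2 <- dt2[order(dt2$row_id), ]
dt2$Full <- dt2$Mercury - dt2$x
print(dt2)
```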

I hope I didn't make any mistakes working without example data.
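To close the loop on the original goal of illustrating the ANOVA, the two residual columns are all that is needed for the F test: square and sum each one, then compare the drop in error against the full model's error per degree of freedom. A minimal sketch on hypothetical toy data, checked against R's own model comparison via anova() on the intercept-only and one-way lm() fits:

```r
# Toy data: made-up mercury values, NOT the poster's data
dt <- data.frame(
  Lake_Type = factor(c("Eutropic", "Eutropic", "Mesotropic",
                       "Mesotropic", "Oligotropic", "Oligotropic")),
  Mercury   = c(0.6, 0.5, 0.4, 0.5, 0.3, 0.4)
)

# Residuals from the reduced (grand mean) and full (group means) models
YY <- mean(dt$Mercury)
YG <- with(dt, tapply(Mercury, Lake_Type, mean))
reduced <- dt$Mercury - YY
full    <- dt$Mercury - as.vector(YG[as.character(dt$Lake_Type)])

# Error sums of squares and their degrees of freedom
n <- nrow(dt)
k <- nlevels(dt$Lake_Type)
sse_r <- sum(reduced^2)
sse_f <- sum(full^2)
df_r  <- n - 1
df_f  <- n - k

# F statistic and upper-tail p-value
F_stat  <- ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
p_value <- pf(F_stat, df_r - df_f, df_f, lower.tail = FALSE)
print(c(F = F_stat, p = p_value))

# R's built-in comparison of the two nested models
cmp <- anova(lm(Mercury ~ 1, data = dt), lm(Mercury ~ Lake_Type, data = dt))
print(cmp)
```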
