I'm trying to illustrate how a simple ANOVA works.
I have mercury levels (Mercury) for three types of lakes (Lake_Type): Eutropic, Mesotropic, and Oligotropic. The ANOVA compares residuals from a full model and from a reduced model. The reduced model assumes the mean mercury level is the same for all lakes; the full model assumes a different mean for each lake type. I need to find the residuals for each type.
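In symbols, as I understand the setup (the notation is my own, not from any particular textbook: y_{ij} is the j-th mercury reading in lake type i, k = 3 is the number of lake types, n_i is the number of rows for type i, and N is the total number of rows), the two models, their residuals, and the usual F statistic would look roughly like this:

```latex
\begin{align*}
\text{Reduced model:}\quad & y_{ij} = \mu + \varepsilon_{ij},
  & e^{R}_{ij} &= y_{ij} - \bar{y} \\
\text{Full model:}\quad & y_{ij} = \mu_i + \varepsilon_{ij},
  & e^{F}_{ij} &= y_{ij} - \bar{y}_i \\[4pt]
\mathrm{SSE}_R &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \bigl(e^{R}_{ij}\bigr)^2,
  & \mathrm{SSE}_F &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \bigl(e^{F}_{ij}\bigr)^2 \\[4pt]
F &= \frac{(\mathrm{SSE}_R - \mathrm{SSE}_F)/(k-1)}{\mathrm{SSE}_F/(N-k)}
\end{align*}
```

Here \bar{y} is the overall mean (YY in my code) and \bar{y}_i is the mean for lake type i.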
A column containing reduced model residuals is not hard to calculate:
YY <- mean(dt$Mercury)
dt <- cbind (dt, "Reduced" = dt$Mercury - YY)
I had a little trouble figuring out how to calculate the mean for each lake type. I can do
YG <- with(dt, tapply(Mercury, Lake_Type, mean))
to get a named vector:
   Eutropic  Mesotropic Oligotropic
  0.5527551   0.4523488   0.3826316
which lets me access the means as YG["Eutropic"] if I want.
Alternatively, I can aggregate() over the list of names:
YG = aggregate(dt$Mercury, list(LT = dt$Lake_Type), mean)
which gives me a data frame:
           LT         x
1    Eutropic 0.5527551
2  Mesotropic 0.4523488
3 Oligotropic 0.3826316
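To make the two lookups concrete, here is a minimal sketch on a made-up toy table. The names toy, YG_vec, and YG_df are invented for illustration, and the values are not my real data:

```r
# Hypothetical toy data standing in for my real table (values are made up)
toy <- data.frame(
  Lake_Type = c("Eutropic", "Eutropic", "Mesotropic", "Mesotropic", "Oligotropic"),
  Mercury   = c(0.6, 0.4, 0.5, 0.3, 0.2)
)

# tapply() result: a named 1-d array, so a single mean is looked up by name
YG_vec <- with(toy, tapply(Mercury, Lake_Type, mean))
YG_vec["Eutropic"]

# aggregate() result: a data frame, so a single mean is looked up by
# filtering on the LT column
YG_df <- aggregate(toy$Mercury, list(LT = toy$Lake_Type), mean)
YG_df$x[YG_df$LT == "Eutropic"]
```

Both lookups return the same single group mean (0.5 for this toy data); the only difference is the shape of the object holding the means.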
You may have already guessed the punch line here. I want to compute the residuals for the full model; that is, for each row in the data table, find the difference between dt$Mercury and one of the three means, chosen according to that row's value in the column dt$Lake_Type. Said another way, for the rows in dt where Lake_Type=="Eutropic", I want to subtract the Eutropic mean found using the tapply() or aggregate() methods above.
I'm just blocked and can't seem to figure out how to construct this column. My only restriction is that I can't use pipes, but this shouldn't be driving me this crazy. Anyone want to suggest something?