I started to learn R/RStudio and run into a dead end with nested ANOVA. With an example dataset provided for a textbook (3 columns, 24 lines: soil type clay/sandy, pot 1-8, seedweight 3 times) I run the analysis to get the described outcome.
chap11 # A tibble: 24 × 3 Soil Pot Seedweight <chr> <chr> <dbl> 1 sandy pot1 6.15 2 sandy pot1 6.87 3 sandy pot1 6.23 4 sandy pot2 5.46 5 sandy pot2 5.9 6 sandy pot2 5.31 7 sandy pot3 6.85 8 sandy pot3 6.99 9 sandy pot3 6.05 10 sandy pot4 5.34 # … with 14 more rows # ℹ Use `print(n = ...)` to see more rows summary(aov(Seedweight ~ Soil + Error(Pot), data = chap11) Error: Pot Df Sum Sq Mean Sq F value Pr(>F) Soil 1 11.371 11.371 8.993 0.024 * Residuals 6 7.587 1.264 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Error: Within Df Sum Sq Mean Sq F value Pr(>F) Residuals 16 2.826 0.1766
The outcome is identical to the one described in the textbook.
Now I took that as a template for my own data: 6 cell types (Bunky, i. e. soil type), 3 biological replicates (Opakovani, i. e. Pots; HOWEVER, while Pots are labeled as Pot1-Pot8, in my case I used Op1-Op3 for each cell type [cell_type1:rep1-rep3; cell_type2:rep1-rep3 etc.]), 10,000 measurements each replicate (ROS, i e. seedweight); in total, I have 3 columns and 180,000 lines.
Q: Would the repeated labelling (Rep1-Rep2-Rep3; Rep1-Rep2-Rep3; ...) lead to problems further down the text?
With these data I get this result.
vysledek <- aov(ROS ~ Bunky + Error(Opakovani), data = reactive_ox_sp) > vysledek Call: aov(formula = ROS ~ Bunky + Error(Opakovani), data = reactive_ox_sp) Grand Mean: 11957.78 Stratum 1: Opakovani Terms: Residuals Sum of Squares 144768444519 Deg. of Freedom 2 Residual standard error: 269043.2 Stratum 2: Within Terms: Bunky Residuals Sum of Squares 2.314279e+11 8.642780e+12 Deg. of Freedom 5 179992 Residual standard error: 6929.472 Estimated effects may be unbalanced > summary(vysledek) Error: Opakovani Df Sum Sq Mean Sq F value Pr(>F) Residuals 2 1.448e+11 7.238e+10 Error: Within Df Sum Sq Mean Sq F value Pr(>F) Bunky 5 2.314e+11 4.629e+10 963.9 <2e-16 *** Residuals 179992 8.643e+12 4.802e+07 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Q1: Why is there not anything about the factor Bunky on the first line? Why are there only Residuals mentioned and not any F-test and P-value?
Q2: DFs puzzles me. They look like n-1 (cell types [Bunky] 6-1, biological replikates [Opakovani] 3-1)...
Moreover, if I quit RStudio and run it again, from time to time I get results like this:
Error: fopakovani Df Sum Sq Mean Sq F value Pr(>F) fbunky 1 2.064e+10 2.064e+10 0.166 0.754 Residuals 1 1.242e+11 1.242e+11 Error: Within Df Sum Sq Mean Sq F value Pr(>F) fbunky 5 2.315e+11 4.630e+10 964.4 <2e-16 *** Residuals 179989 8.642e+12 4.801e+07
This time the DF = 1 looks like there is problem with the factor Bunky - like for some reason there are only two groups instead of 6...? I tried to scale down the data taking only 1,000 observations instead of 10,000 and resulted in a new "error" message: 3 out of 5 effects not estimable. Now this would explain DF = 1.
Q3: Could this happen even with 10,000 obsevations but for some reason is not reported?
PS: Later on, I calculated the nested ANOVA with lmer. However, the mystery of aov() not "working" remains...
Edit: I revised the example data...and find a mistake in number of pots. Edited the text accordingly.
Edit2: Found one more difference in data labeling (Pots1 to 8 vs. six times Replicate1 to 3). Edited the text accordingly.