# Is there a limit for aov() in nested design as how many data it can handle?

I started to learn R/RStudio and run into a dead end with nested ANOVA. With an example dataset provided for a textbook (3 columns, 24 lines: soil type clay/sandy, pot 1-8, seedweight 3 times) I run the analysis to get the described outcome.

``````chap11
# A tibble: 24 × 3
Soil  Pot   Seedweight
<chr> <chr>      <dbl>
1 sandy pot1        6.15
2 sandy pot1        6.87
3 sandy pot1        6.23
4 sandy pot2        5.46
5 sandy pot2        5.9
6 sandy pot2        5.31
7 sandy pot3        6.85
8 sandy pot3        6.99
9 sandy pot3        6.05
10 sandy pot4        5.34
# … with 14 more rows
# ℹ Use `print(n = ...)` to see more rows

summary(aov(Seedweight ~ Soil + Error(Pot), data = chap11)

Error: Pot
Df Sum Sq Mean Sq F value Pr(>F)
Soil       1 11.371  11.371   8.993  0.024 *
Residuals  6  7.587   1.264
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 16  2.826  0.1766
``````

The outcome is identical to the one described in the textbook.

Now I took that as a template for my own data: 6 cell types (Bunky, i. e. soil type), 3 biological replicates (Opakovani, i. e. Pots; HOWEVER, while Pots are labeled as Pot1-Pot8, in my case I used Op1-Op3 for each cell type [cell_type1:rep1-rep3; cell_type2:rep1-rep3 etc.]), 10,000 measurements each replicate (ROS, i e. seedweight); in total, I have 3 columns and 180,000 lines.

Q: Would the repeated labelling (Rep1-Rep2-Rep3; Rep1-Rep2-Rep3; ...) lead to problems further down the text?

With these data I get this result.

``````vysledek <- aov(ROS ~ Bunky + Error(Opakovani), data = reactive_ox_sp)
> vysledek

Call:
aov(formula = ROS ~ Bunky + Error(Opakovani), data = reactive_ox_sp)

Grand Mean: 11957.78

Stratum 1: Opakovani

Terms:
Residuals
Sum of Squares  144768444519
Deg. of Freedom            2

Residual standard error: 269043.2

Stratum 2: Within

Terms:
Bunky    Residuals
Sum of Squares  2.314279e+11 8.642780e+12
Deg. of Freedom            5       179992

Residual standard error: 6929.472
Estimated effects may be unbalanced
> summary(vysledek)

Error: Opakovani
Df    Sum Sq   Mean Sq F value Pr(>F)
Residuals  2 1.448e+11 7.238e+10

Error: Within
Df    Sum Sq   Mean Sq F value Pr(>F)
Bunky          5 2.314e+11 4.629e+10   963.9 <2e-16 ***
Residuals 179992 8.643e+12 4.802e+07
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
``````

Q1: Why is there not anything about the factor Bunky on the first line? Why are there only Residuals mentioned and not any F-test and P-value?

Q2: DFs puzzles me. They look like n-1 (cell types [Bunky] 6-1, biological replikates [Opakovani] 3-1)...

Moreover, if I quit RStudio and run it again, from time to time I get results like this:

``````Error: fopakovani
Df    Sum Sq   Mean Sq F value Pr(>F)
fbunky     1 2.064e+10 2.064e+10   0.166  0.754
Residuals  1 1.242e+11 1.242e+11

Error: Within
Df    Sum Sq   Mean Sq F value Pr(>F)
fbunky         5 2.315e+11 4.630e+10   964.4 <2e-16 ***
Residuals 179989 8.642e+12 4.801e+07
``````

This time the DF = 1 looks like there is problem with the factor Bunky - like for some reason there are only two groups instead of 6...? I tried to scale down the data taking only 1,000 observations instead of 10,000 and resulted in a new "error" message: 3 out of 5 effects not estimable. Now this would explain DF = 1.

Q3: Could this happen even with 10,000 obsevations but for some reason is not reported?

PS: Later on, I calculated the nested ANOVA with lmer. However, the mystery of aov() not "working" remains...

Edit: I revised the example data...and find a mistake in number of pots. Edited the text accordingly.
Edit2: Found one more difference in data labeling (Pots1 to 8 vs. six times Replicate1 to 3). Edited the text accordingly.

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.