I started to learn R/RStudio and run into a dead end with nested ANOVA. With an example dataset provided for a textbook (3 columns, 24 lines: soil type clay/sandy, pot 1-8, seedweight 3 times) I run the analysis to get the described outcome.
chap11
# A tibble: 24 × 3
Soil Pot Seedweight
<chr> <chr> <dbl>
1 sandy pot1 6.15
2 sandy pot1 6.87
3 sandy pot1 6.23
4 sandy pot2 5.46
5 sandy pot2 5.9
6 sandy pot2 5.31
7 sandy pot3 6.85
8 sandy pot3 6.99
9 sandy pot3 6.05
10 sandy pot4 5.34
# … with 14 more rows
# ℹ Use `print(n = ...)` to see more rows
summary(aov(Seedweight ~ Soil + Error(Pot), data = chap11)
Error: Pot
Df Sum Sq Mean Sq F value Pr(>F)
Soil 1 11.371 11.371 8.993 0.024 *
Residuals 6 7.587 1.264
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 16 2.826 0.1766
The outcome is identical to the one described in the textbook.
Now I took that as a template for my own data: 6 cell types (Bunky, i. e. soil type), 3 biological replicates (Opakovani, i. e. Pots; HOWEVER, while Pots are labeled as Pot1-Pot8, in my case I used Op1-Op3 for each cell type [cell_type1:rep1-rep3; cell_type2:rep1-rep3 etc.]), 10,000 measurements each replicate (ROS, i e. seedweight); in total, I have 3 columns and 180,000 lines.
Q: Would the repeated labelling (Rep1-Rep2-Rep3; Rep1-Rep2-Rep3; ...) lead to problems further down the text?
With these data I get this result.
vysledek <- aov(ROS ~ Bunky + Error(Opakovani), data = reactive_ox_sp)
> vysledek
Call:
aov(formula = ROS ~ Bunky + Error(Opakovani), data = reactive_ox_sp)
Grand Mean: 11957.78
Stratum 1: Opakovani
Terms:
Residuals
Sum of Squares 144768444519
Deg. of Freedom 2
Residual standard error: 269043.2
Stratum 2: Within
Terms:
Bunky Residuals
Sum of Squares 2.314279e+11 8.642780e+12
Deg. of Freedom 5 179992
Residual standard error: 6929.472
Estimated effects may be unbalanced
> summary(vysledek)
Error: Opakovani
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 2 1.448e+11 7.238e+10
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Bunky 5 2.314e+11 4.629e+10 963.9 <2e-16 ***
Residuals 179992 8.643e+12 4.802e+07
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Q1: Why is there not anything about the factor Bunky on the first line? Why are there only Residuals mentioned and not any F-test and P-value?
Q2: DFs puzzles me. They look like n-1 (cell types [Bunky] 6-1, biological replikates [Opakovani] 3-1)...
Moreover, if I quit RStudio and run it again, from time to time I get results like this:
Error: fopakovani
Df Sum Sq Mean Sq F value Pr(>F)
fbunky 1 2.064e+10 2.064e+10 0.166 0.754
Residuals 1 1.242e+11 1.242e+11
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
fbunky 5 2.315e+11 4.630e+10 964.4 <2e-16 ***
Residuals 179989 8.642e+12 4.801e+07
This time the DF = 1 looks like there is problem with the factor Bunky - like for some reason there are only two groups instead of 6...? I tried to scale down the data taking only 1,000 observations instead of 10,000 and resulted in a new "error" message: 3 out of 5 effects not estimable. Now this would explain DF = 1.
Q3: Could this happen even with 10,000 obsevations but for some reason is not reported?
PS: Later on, I calculated the nested ANOVA with lmer. However, the mystery of aov() not "working" remains...
Edit: I revised the example data...and find a mistake in number of pots. Edited the text accordingly.
Edit2: Found one more difference in data labeling (Pots1 to 8 vs. six times Replicate1 to 3). Edited the text accordingly.