How to analyze a data set with two independent variables and one dependent variable

I would like to establish whether there are differences in phenolic content of five plant cultivars. The samples were collected in two planting seasons and in each season, plant samples were collected in each cultivar six times.

Year Cultivar Phenolics
Y2022 1 65.533
Y2022 1 52.269
Y2022 1 48.616
Y2022 1 55.404
Y2022 1 43.104
Y2022 1 36.694
Y2022 2 60.468
Y2022 2 46.753
Y2022 2 49.881
Y2022 2 40.262
Y2022 2 44.688
Y2022 2 42.292
Y2022 3 39.302
Y2022 3 32.900
Y2022 3 30.273
Y2022 3 36.005
Y2022 3 34.315
Y2022 3 36.894
Y2022 4 55.374
Y2022 4 34.909
Y2022 4 28.268
Y2022 4 40.477
Y2022 4 45.707
Y2022 4 54.754
Y2022 5 37.587
Y2022 5 31.414
Y2022 5 29.257
Y2022 5 34.593
Y2022 5 39.557
Y2022 5 39.343
Y2023 1 68.801
Y2023 1 63.467
Y2023 1 57.869
Y2023 1 75.237
Y2023 1 55.920
Y2023 1 98.612
Y2023 2 41.481
Y2023 2 42.853
Y2023 2 30.908
Y2023 2 36.045
Y2023 2 44.290
Y2023 2 54.380
Y2023 3 43.779
Y2023 3 32.472
Y2023 3 27.789
Y2023 3 36.979
Y2023 3 44.387
Y2023 3 58.902
Y2023 4 32.748
Y2023 4 34.483
Y2023 4 33.973
Y2023 4 40.161
Y2023 4 52.383
Y2023 4 39.919
Y2023 5 34.297
Y2023 5 28.642
Y2023 5 40.669
Y2023 5 57.669
Y2023 5 36.214
Y2023 5 54.025

Above if a sample dataset.
I will appreciate if I can get an r-code or be guided to write it.

Thanks

ANOVA (analysis of variance) is a standard test for this type of problem. You should run the code below and check sources on introductory statistics to understand the result.

d <- data.frame(
  Year = as.factor(c(
    "Y2022", "Y2022", "Y2022",
    "Y2022", "Y2022", "Y2022", "Y2022", "Y2022", "Y2022", "Y2022",
    "Y2022", "Y2022", "Y2022", "Y2022", "Y2022", "Y2022",
    "Y2022", "Y2022", "Y2022", "Y2022", "Y2022", "Y2022",
    "Y2022", "Y2022", "Y2022", "Y2022", "Y2022", "Y2022", "Y2022",
    "Y2022", "Y2023", "Y2023", "Y2023", "Y2023", "Y2023",
    "Y2023", "Y2023", "Y2023", "Y2023", "Y2023", "Y2023",
    "Y2023", "Y2023", "Y2023", "Y2023", "Y2023", "Y2023", "Y2023",
    "Y2023", "Y2023", "Y2023", "Y2023", "Y2023", "Y2023",
    "Y2023", "Y2023", "Y2023", "Y2023", "Y2023", "Y2023"
  )),
  Cultivar = as.factor(c(
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
    2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
    4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L,
    2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
    4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L
  )),
  Phenolics = c(
    65.533, 52.269, 48.616, 55.404,
    43.104, 36.694, 60.468, 46.753, 49.881, 40.262, 44.688,
    42.292, 39.302, 32.9, 30.273, 36.005, 34.315, 36.894,
    55.374, 34.909, 28.268, 40.477, 45.707, 54.754, 37.587,
    31.414, 29.257, 34.593, 39.557, 39.343, 68.801, 63.467,
    57.869, 75.237, 55.92, 98.612, 41.481, 42.853, 30.908, 36.045,
    44.29, 54.38, 43.779, 32.472, 27.789, 36.979, 44.387,
    58.902, 32.748, 34.483, 33.973, 40.161, 52.383, 39.919,
    34.297, 28.642, 40.669, 57.669, 36.214, 54.025
  )
)

result <- aov(Phenolics ~ Year*Cultivar, d)
summary(result)
#>               Df Sum Sq Mean Sq F value   Pr(>F)    
#> Year           1    292   292.4   3.194   0.0800 .  
#> Cultivar       4   4020  1004.9  10.977 1.82e-06 ***
#> Year:Cultivar  4   1259   314.8   3.439   0.0147 *  
#> Residuals     50   4577    91.5                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Created on 2023-11-13 with reprex v2.0.2

Thank you for your guidance

1 Like