It seems like you're hazy on your intentions.
You want to see whether there are statistically significant differences in the mean quantities measured across various sites for some set of phyla?
Here is an example where I do that, and I think you could adapt it to cover your case.
I encourage you to run through this line by line (press Ctrl+Enter to send the line under your cursor in the editor window to the console).
library(tidyverse) # for data manipulation; run install.packages("tidyverse") in the console if you don't have it yet
library(ggpubr) # for visualisation; install.packages("ggpubr")
set.seed(42) # makes the random data below, and hence the analysis, reproducible
made_up_df <- tibble(
test_site_codes = letters[1:20],
phylum_1 = rnorm(20,mean=10,sd=2),
phylum_2 = rnorm(20,mean=8 ,sd=4)
)
# have a look at the data in table form
made_up_df
# this wide format (one column per phylum) is awkward for the analysis, so reshape it to long format
df2 <- pivot_longer(made_up_df,
cols = starts_with("phylum_"),
names_to="phylum",
values_to="sampled_amount")
# the long format we want: one row per site and phylum
df2
# make phylum a grouping variable by converting it to a factor
df2$phylum <- as.factor(df2$phylum)
# inspect the levels; they should be phylum_1 and phylum_2
levels(df2$phylum)
#visualise what we are investigating
ggboxplot(df2, x = "phylum", y = "sampled_amount",
color = "phylum", palette = c("#00AFBB", "#E7B800"),
ylab = "sampled amount", xlab = "phylum")
# Mean plot
# ++++++++++++++++++++
# Plot mean sampled_amount by phylum, with the raw points jittered on top
# Add error bars: mean_se (mean +/- standard error)
# (other options include mean_sd, mean_ci, median_iqr, ...)
ggline(df2, x = "phylum", y = "sampled_amount",
add = c("mean_se", "jitter"))
# Compute a one-way analysis of variance (does mean sampled_amount differ by phylum?)
res.aov <- aov(sampled_amount ~ phylum, data = df2)
# Summary of the analysis
summary(res.aov)
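Two things worth knowing when you adapt this. First, with only two phyla a one-way ANOVA is equivalent to a two-sample t-test with pooled variance (the F statistic equals t squared and the p-values match). Second, if every site was sampled for every phylum, as in the made-up data, you could treat site as a blocking factor; with two phyla that is equivalent to a paired t-test. The sketch below rebuilds the same made-up values in base R so it runs on its own; the names `df_long`, `aov_tab`, `block_tab`, `t_pooled` and `t_paired` are just illustrative. Note that in the made-up data the two values at each site are independent random draws, so blocking only helps with real data where sites genuinely differ.

```r
# Follow-up checks on the made-up data (base R only, no tidyverse needed)
set.seed(42)
phylum_1 <- rnorm(20, mean = 10, sd = 2)
phylum_2 <- rnorm(20, mean = 8, sd = 4)

# Long format: one row per site/phylum combination, sites in the same order for both phyla
df_long <- data.frame(
  test_site_codes = factor(rep(letters[1:20], times = 2)),
  phylum = factor(rep(c("phylum_1", "phylum_2"), each = 20)),
  sampled_amount = c(phylum_1, phylum_2)
)

# 1. One-way ANOVA vs. two-sample t-test with pooled variance
aov_tab <- summary(aov(sampled_amount ~ phylum, data = df_long))[[1]]
t_pooled <- t.test(sampled_amount ~ phylum, data = df_long, var.equal = TRUE)
cat("one-way ANOVA F =", aov_tab[["F value"]][1],
    "  pooled t^2 =", unname(t_pooled$statistic)^2, "\n")

# 2. Site as a blocking factor vs. paired t-test (values paired by site)
block_tab <- summary(aov(sampled_amount ~ phylum + test_site_codes,
                         data = df_long))[[1]]
t_paired <- t.test(phylum_1, phylum_2, paired = TRUE)
cat("blocked ANOVA F =", block_tab[["F value"]][1],
    "  paired t^2 =", unname(t_paired$statistic)^2, "\n")
```

If you end up with more than two phyla, the ANOVA only tells you that some means differ; one common follow-up is `TukeyHSD(res.aov)` from base R's `stats` package, which gives pairwise differences with adjusted p-values.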