statistical test for 16S amplicon sequencing data

Hi All,

I have identified the top 50 genera based on the relative abundance in 16S amplicon sequencing data. Now I want to perform a statistical test to compare the two conditions; salt and non-salt. But not sure how to do a normality check and then how to do statistical tests for these multiple genera in R. One more issue in this data; there are lots of zeros.

data = read.table("statisticalTest.salt_vs_non_salt.txt",header=T,sep='\t',check.names=F)
head(data)

                          salt        salt        salt    non_salt    non_salt
1     Burkholderia 0.097731239 0.244541624 0.001426243 0.116990216 0.013678457
2   Bradyrhizobium 0.193819936 0.135586384 0.094290501 0.048097711 0.020956642
3 Anaeromyxobacter 0.103890771 0.087270334 0.139477497 0.117391401 0.219336751
4        Geobacter 0.008452247 0.018362646 0.016118808 0.045958054 0.051796890
5   Microbacterium 0.000513294 0.003237345 0.019378792 0.000156017 0.000538076
6    Methylocystis 0.008794443 0.010794689 0.004957892 0.001760760 0.002265583
    non_salt
1 0.27859036 
2 0.09660323 
3 0.20067454 
4 0.03892350 
5 0.00000000 
6 0.01459201


Many thanks

If the genera are independent, I think you can do a Wilcoxon test (wilcox.test()) for each one, and then a Bonferroni or Holm correction for multiple testing (p.adjust()). I'm not sure a normality test can give you any meaningful result with n= 3, it would fail to reject the null even for a very non-normal distribution.

Dear @AlexisW,

Thanks for your response. I got more than 20 replicate for each condition (salt_vs_nonsalt). Here I have just shown a part of my data.

and I got like 100 genera so won't be able to test genera individually? is there any way to make loop to test (wilcox.test) each of these genera and saved output in result file?

many thanks,

OK, with 20 replicates, the normality assumption might hold, but I'm not fully sure how to test it: the best way to decide is usually by qq plots and the like, that's not feasible for 100 genera. You could use a normality test, but, first, that's not great (failure to reject the null that it's normal doesn't necessarily mean that it's normal), and second you would end up doing different tests for each genera, that doesn't sound great. So I would still err on the side of a Wilcoxon, but I might be wrong.

As for looping, yes, it's doable. You can use a for loop, it would look something like:

p_values <- numeric(length = nrow(input_df))

for(i in seq_len(nrow(input_df)){
    p_values[[i]] <- wilcox.test(input_df[i, 1:20], input_df[i, 21:40])$p.val
}

p_adj <- p.adjust(p_values)

(not tested, needs to be adapted to your data, this is just to give a general structure).

Dear @AlexisW,
Thank you so much for the R loop but it is showing some errors;
I am running below script;


input_df = read.table("16S.soil.Phyla.18.11.21.txt",header=T,sep='\t',check.names=F)
head(input_df)

p_values <- numeric(length = nrow(input_df))

for(i in seq_len(nrow(input_df)){p_values[[i]] <- wilcox.test(input_df[i, 1:27], input_df[i, 28:54])$p.val}

p_adj <- p.adjust(p_values)


write.table(p_adj,"16S.phylum.wilcox.txt",sep="\t")

Error;

Error: unexpected '{' in "for(i in seq_len(nrow(input_df)){"
Execution halted





Thanks again

Check the parentheses, there are three ( but only two )