I have identified the top 50 genera based on relative abundance in 16S amplicon sequencing data. Now I want to perform a statistical test to compare two conditions: salt and non-salt. But I am not sure how to do a normality check, or how to run statistical tests for these multiple genera in R. One more issue with this data: there are lots of zeros.
If the genera are independent, I think you can do a Wilcoxon test (wilcox.test()) for each one, and then apply a Bonferroni or Holm correction for multiple testing (p.adjust()). I'm not sure a normality test can give you any meaningful result with n = 3: it would probably fail to reject the null even for a very non-normal distribution.
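To make that concrete, here is a minimal sketch of one Wilcoxon test followed by a multiple-testing correction. The abundance values and the raw p-values are made up for illustration (they are not from the poster's data), and `salt`, `non_salt`, and `raw_p` are hypothetical names.

```r
# Made-up relative abundances for one hypothetical genus (not real data)
salt     <- c(0, 0, 0.02, 0.05, 0.01, 0, 0.03, 0.04)
non_salt <- c(0.10, 0.08, 0, 0.12, 0.09, 0.11, 0.07, 0.15)

# exact = FALSE: with ties (e.g. repeated zeros) an exact p-value
# cannot be computed, so ask for the normal approximation directly
res <- wilcox.test(salt, non_salt, exact = FALSE)
res$p.value

# Illustrative raw p-values, one per genus, corrected for multiple testing
raw_p  <- c(0.001, 0.02, 0.04, 0.30)
p_holm <- p.adjust(raw_p, method = "holm")
p_bonf <- p.adjust(raw_p, method = "bonferroni")
```

Holm is never less powerful than Bonferroni while controlling the same family-wise error rate, so it is usually the safer default of the two.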
Thanks for your response. I have more than 20 replicates for each condition (salt vs. non-salt). Here I have just shown a part of my data.
I also have around 100 genera, so I won't be able to test each genus individually. Is there any way to write a loop that runs the test (wilcox.test) on each of these genera and saves the output to a results file?
OK, with 20 replicates, the normality assumption might hold, but I'm not fully sure how to test it. The best way to decide is usually with QQ plots and the like, but that's not feasible for 100 genera. You could use a normality test, but first, that's not great (failing to reject the null hypothesis of normality doesn't necessarily mean the data are normal), and second, you would end up using different tests for different genera, which doesn't sound great either. So I would still err on the side of a Wilcoxon test, but I might be wrong.
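If you still want to see what a normality test says across many genera, a rough sketch with shapiro.test() could look like this. The data are simulated, zero-inflated abundances for one condition, and `abund` and `shapiro_p` are hypothetical names. With lots of zeros I would expect most genera to be flagged as non-normal, which is another reason to lean towards the Wilcoxon test.

```r
set.seed(42)
n <- 20  # replicates within a single condition

# Simulated data: 5 hypothetical genera, roughly 40% zeros each
abund <- as.data.frame(replicate(5, rbinom(n, 1, 0.6) * rlnorm(n, -3, 1)))
names(abund) <- paste0("genus", 1:5)

# Shapiro-Wilk p-value per genus (per column)
shapiro_p <- sapply(abund, function(x) {
  # shapiro.test() stops with an error if all values are identical
  if (length(unique(x)) == 1) return(NA_real_)
  shapiro.test(x)$p.value
})
shapiro_p
```

A small p-value here is evidence against normality, but a large one is not proof of normality, as noted above.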
As for looping, yes, it's doable. You can use a for loop, it would look something like:
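A minimal sketch of such a loop, under some assumptions: samples are in rows and genera in columns of a data frame (here called `abund`), and a factor `condition` gives the group of each sample. The data are simulated, and all object names and the output file name are hypothetical, so adapt them to your own table.

```r
set.seed(123)
n_per_group <- 20
condition <- factor(rep(c("salt", "non_salt"), each = n_per_group))

# Simulated zero-inflated abundances: rows = samples, columns = genera
genera <- paste0("genus", 1:10)
abund <- as.data.frame(
  sapply(genera, function(g) {
    rbinom(2 * n_per_group, 1, 0.7) * rlnorm(2 * n_per_group, -3, 1)
  })
)

# One row of results per genus, filled in by the loop
results <- data.frame(genus = genera, W = NA_real_, p_value = NA_real_)

for (i in seq_along(genera)) {
  x <- abund[[genera[i]]]
  # exact = FALSE avoids the "cannot compute exact p-value with ties" warning
  test <- wilcox.test(x ~ condition, exact = FALSE)
  results$W[i]       <- test$statistic
  results$p_value[i] <- test$p.value
}

# Correct for testing many genera at once
results$p_holm <- p.adjust(results$p_value, method = "holm")

# Hypothetical output file name
write.csv(results, "wilcoxon_results.csv", row.names = FALSE)
```

With your real data you would replace the simulated `abund` and `condition` with your own table and sample grouping, and loop over `colnames(abund)` instead of the made-up genus names.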