statistical test for 16S amplicon sequencing data

bioinfonext · November 11, 2021, 12:31pm

Hi All,

I have identified the top 50 genera based on the relative abundance in 16S amplicon sequencing data. Now I want to perform a statistical test to compare the two conditions; salt and non-salt. But not sure how to do a normality check and then how to do statistical tests for these multiple genera in R. One more issue in this data; there are lots of zeros.

data = read.table("statisticalTest.salt_vs_non_salt.txt",header=T,sep='\t',check.names=F)
head(data)

                          salt        salt        salt    non_salt    non_salt
1     Burkholderia 0.097731239 0.244541624 0.001426243 0.116990216 0.013678457
2   Bradyrhizobium 0.193819936 0.135586384 0.094290501 0.048097711 0.020956642
3 Anaeromyxobacter 0.103890771 0.087270334 0.139477497 0.117391401 0.219336751
4        Geobacter 0.008452247 0.018362646 0.016118808 0.045958054 0.051796890
5   Microbacterium 0.000513294 0.003237345 0.019378792 0.000156017 0.000538076
6    Methylocystis 0.008794443 0.010794689 0.004957892 0.001760760 0.002265583
    non_salt
1 0.27859036 
2 0.09660323 
3 0.20067454 
4 0.03892350 
5 0.00000000 
6 0.01459201


Many thanks

AlexisW · November 13, 2021, 6:19pm

If the genera are independent, I think you can do a Wilcoxon test (wilcox.test()) for each one, and then a Bonferroni or Holm correction for multiple testing (p.adjust()). I'm not sure a normality test can give you any meaningful result with n= 3, it would fail to reject the null even for a very non-normal distribution.

bioinfonext · November 15, 2021, 5:27pm

Dear @AlexisW,

Thanks for your response. I got more than 20 replicate for each condition (salt_vs_nonsalt). Here I have just shown a part of my data.

and I got like 100 genera so won't be able to test genera individually? is there any way to make loop to test (wilcox.test) each of these genera and saved output in result file?

many thanks,

AlexisW · November 15, 2021, 5:47pm

OK, with 20 replicates, the normality assumption might hold, but I'm not fully sure how to test it: the best way to decide is usually by qq plots and the like, that's not feasible for 100 genera. You could use a normality test, but, first, that's not great (failure to reject the null that it's normal doesn't necessarily mean that it's normal), and second you would end up doing different tests for each genera, that doesn't sound great. So I would still err on the side of a Wilcoxon, but I might be wrong.

As for looping, yes, it's doable. You can use a for loop, it would look something like:

p_values <- numeric(length = nrow(input_df))

for(i in seq_len(nrow(input_df)){
    p_values[[i]] <- wilcox.test(input_df[i, 1:20], input_df[i, 21:40])$p.val
}

p_adj <- p.adjust(p_values)

(not tested, needs to be adapted to your data, this is just to give a general structure).

bioinfonext · November 18, 2021, 5:59pm

Dear @AlexisW,
Thank you so much for the R loop but it is showing some errors;
I am running below script;


input_df = read.table("16S.soil.Phyla.18.11.21.txt",header=T,sep='\t',check.names=F)
head(input_df)

p_values <- numeric(length = nrow(input_df))

for(i in seq_len(nrow(input_df)){p_values[[i]] <- wilcox.test(input_df[i, 1:27], input_df[i, 28:54])$p.val}

p_adj <- p.adjust(p_values)


write.table(p_adj,"16S.phylum.wilcox.txt",sep="\t")

Error;

Error: unexpected '{' in "for(i in seq_len(nrow(input_df)){"
Execution halted





Thanks again

AlexisW · November 18, 2021, 6:49pm

Check the parentheses, there are three ( but only two )

system · December 9, 2021, 6:50pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.