ANOVA analysis and conditions for microbiome analysis

Hi, everyone,
I have microbial data (S4 format created by the phyloseq package), which includes an OTU table, a metadata table, a taxonomy table and reference sequences. I want to run an anova (function adonis, package vegan) on the data. I didn't run into many problems doing so, and the commands worked well. However, I'm not sure if I'm doing it the right way. I was told that I have to scale my data, since I have some variable columns that go from 0 to 7 and, at the same time, a column for altitude that reaches up to 2700 m, for example. In that same table I have Y/N data, names of species and infection load (values from 0.01 to 456), so different kinds of data and information. Should I scale the entire data set, or do I need to separate my numeric data into one table and the rest into another table, and then combine them once the numeric information is scaled? Or is scaling not really necessary in this case?
I also have another question. As I said, I have different species (3, more precisely): one of them has 25 samples and the other two have 157 and 115. Since adonis works with permutations, do I need to adjust the number of samples somehow, or is that already covered by the permutation step?

Thanks in advance!

Scale only the continuous variables, and only if they are measured in different units; if they all share the same units, you don't need to scale. Don't scale binary or categorical data.
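As a rough illustration of that rule of thumb, here is a minimal sketch in base R. The data frame and its values are made up, and the column names (`species`, `infected`, `altitude`, `infect_load`) are assumptions based on the variables described in this thread. Only the columns treated as continuous are z-scored, in place, so nothing has to be split into separate tables and recombined.

```r
# Hypothetical metadata: column names and values are invented for illustration.
meta <- data.frame(
  species     = c("sp_a", "sp_b", "sp_c", "sp_a"),
  infected    = c("Y", "N", "Y", "N"),
  altitude    = c(350, 1200, 2700, 800),   # metres
  infect_load = c(0.01, 12.5, 456, 3.2),   # genome equivalents
  stringsAsFactors = FALSE
)

# Columns treated as continuous here (an assumption, adjust to your data).
continuous <- c("altitude", "infect_load")

# Centre and scale only those columns; categorical columns are left untouched.
meta_scaled <- meta
meta_scaled[continuous] <- lapply(
  meta[continuous],
  function(x) as.numeric(scale(x))  # scale() returns a matrix, so flatten it
)

str(meta_scaled)
```

Whether ordinal scores such as a 0-7 tourism class should be scaled like continuous variables is a judgment call that this sketch deliberately leaves out.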

For the second question, could you post a reproducible example (a reprex)?
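If the real data can't be shared, a reprex built on simulated counts is usually enough. The sketch below is only one possible shape for it: the group sizes match the ones mentioned above, but the species labels, the number of taxa and the Poisson counts are all invented, and it assumes an installed version of vegan that provides `adonis2()` (the interface that superseded `adonis()`).

```r
library(vegan)  # assumption: vegan is installed and provides adonis2()

set.seed(42)

# Unbalanced design with the group sizes from the question; labels are made up.
n_per_species <- c(sp_a = 25, sp_b = 157, sp_c = 115)
meta <- data.frame(
  species = factor(rep(names(n_per_species), times = n_per_species))
)

# Simulated OTU counts (samples in rows, taxa in columns), purely illustrative.
n_taxa <- 10
otu <- matrix(
  rpois(nrow(meta) * n_taxa, lambda = 5),
  nrow = nrow(meta),
  dimnames = list(NULL, paste0("otu", seq_len(n_taxa)))
)

# Permutational test of community composition against species.
res <- adonis2(otu ~ species, data = meta, permutations = 999, method = "bray")
print(res)
```

Swapping the simulated `otu` and `meta` for a small subset of the real tables would show exactly which call and which design the question is about.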

@technocrat thanks for the feedback!
On the computer I'm working on right now, I don't have all my data. I'll get my hands on it next week and then I can send a reprex :slight_smile:
About the scaling, I'm pasting here the structure of my metadata to show my variables:

Here it is clear that I have different units: meters for altitude; lat-long; classification from 0-7 for tourism and from 0-4 for invasive spp and livestock; genome equivalents for infection load; and Y/N for infection presence. You said that I should scale only the continuous variables. Does that mean infect_load, tourism, invas_spp, livestock and altitude? Sorry, but it's still not so clear to me. In this case, should I separate all those columns, apply scale and then combine them again in a joint table? And additionally, should I also separate my sample + spp table from my sample + variables table?

Thanks again for the help!!

I jumped the gun, somewhat. My advice about scaling was a general rule of thumb that doesn't necessarily apply within modeling. (Which I'd have noticed had I been more careful.)

There's a thread on normalization of frequency data as part of doing analyses that can be done with the types of approaches in vegan::adonis. The use of the term anova in that domain differs from the vanilla variety.

There is a link to a paper, "Statistical methods for temporal and space–time analysis of community composition data", that you may want to take a look at, since it appears that your data have both temporal and spatial characteristics.

And, wouldn't you know, there's a package to go with that!

Thanks a lot! I'll definitely check all of them!
