The first step with this or any other project is to survey the data. How many observations are there? What is the unit: a single individual or a population? How many variables are being observed? Are they categorical (presence or absence of a type) or quantitative (numbers of organisms)? What are their relative numbers? If quantitative, what are the min, max, mean and median of each, and how do they compare across variables? What does the distribution look like in a histogram or density plot? Are different organisms correlated?

Once you've explored the data and are ready to model it, ask what the response variable is. If it's categorical, you'll probably try logistic regression (or multinomial regression if it has more than two categories); if it's quantitative, ordinary least squares. Decide in advance what significance level (p-value threshold) you are aiming for. If the number of variables is larger than the number of observations, you may need to do dimensionality reduction through methods such as principal component analysis.

All of which is to say, there's no one right way. It would be a good idea to take a look at the literature on microbiomes and see what approaches have been used successfully.