Can I Run ANOVA with 2 Columns of Data?

Terrible_Coder · March 12, 2020, 8:52pm

I know that ANOVA has to have one dependent variable and at least one independent variable in order to function properly. However, my supervisor wants me to use data that only has one variable (see picture). Is it possible to run ANOVA with the limited info? If so, how?

If it can't, how can I make the data work? Thank you in advance!!

joels · March 12, 2020, 9:09pm

This would be a one-way ANOVA where you're looking at whether Shannon Index (the dependent variable) differs by Site.

Here's an example with the built-in iris data frame where we look at how Petal.Width differs by Species:

model1 = aov(Petal.Width ~ Species, data=iris)
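For a two-column layout like the one described (one column for Site, one for Shannon Index), the call looks the same. Here's a minimal sketch; the data frame `diversity`, its column names, and all of its values are made up for illustration and are not the original poster's data:

```r
# Hypothetical data: names and values are invented for illustration only
diversity <- data.frame(
  Site = factor(rep(c("A", "B", "C"), each = 4)),
  Shannon = c(1.8, 2.1, 1.9, 2.0,
              2.6, 2.4, 2.7, 2.5,
              1.2, 1.4, 1.1, 1.3)
)

# One-way ANOVA: does mean Shannon Index differ by Site?
fit <- aov(Shannon ~ Site, data = diversity)
summary(fit)
```

The grouping column should be a factor (or character) so that R treats it as categorical rather than numeric.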

A one-way ANOVA is equivalent to a linear regression model with a single categorical predictor. The code for that is:

model2 = lm(Petal.Width ~ Species, data=iris)
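If you want to convince yourself of the equivalence, one way (a sketch, not the only approach) is to pull the F statistic out of both fits and compare them:

```r
model1 <- aov(Petal.Width ~ Species, data = iris)
model2 <- lm(Petal.Width ~ Species, data = iris)

# ANOVA table from the aov fit (summary.aov returns a list of tables)
aov_table <- summary(model1)[[1]]

# ANOVA table computed from the lm fit
lm_table <- anova(model2)

# First row of each table is the Species term
f_aov <- aov_table[["F value"]][1]
f_lm <- lm_table[["F value"]][1]

all.equal(f_aov, f_lm)
```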

Terrible_Coder · March 12, 2020, 9:58pm

So my data can work for ANOVA, and the examples in model1 and model2 show how to set it up.

Jason.C · March 16, 2020, 5:08pm

Thanks for the response to this question. I was wondering the same thing. However, I have a few additional questions.
My data:

3 Groups
4 continuous variables

Basically I want to test for group mean differences. I dropped one group and used the suggestions above but struggled to identify the following:

df
p-value of the model
p-value of individual predictors
beta weight (second example)

Any help with these questions would be greatly appreciated.

Cheers,
Jason

joels · March 16, 2020, 6:11pm

You don't need to drop any groups from the model. R will take care of creating dummy variables for categorical independent variables. Here's an example, using the built-in iris data frame:

# Null model (model with intercept only)
m1 = lm(Petal.Width ~ 1, data=iris)

# Model with predictor variables
m2 = lm(Petal.Width ~ Species + Sepal.Length + Sepal.Width, data=iris)

There are various ways to summarize the model. For example,

summary(m2)

Call:
lm(formula = Petal.Width ~ Species + Sepal.Length + Sepal.Width, 
    data = iris)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.50805 -0.10042 -0.01221  0.11416  0.46455 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -0.86897    0.16985  -5.116 9.73e-07 ***
Speciesversicolor  1.17375    0.06758  17.367  < 2e-16 ***
Speciesvirginica   1.78487    0.07779  22.944  < 2e-16 ***
Sepal.Length       0.06360    0.03395   1.873    0.063 .  
Sepal.Width        0.23237    0.05145   4.516 1.29e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1797 on 145 degrees of freedom
Multiple R-squared:  0.9459,	Adjusted R-squared:  0.9444 
F-statistic: 634.3 on 4 and 145 DF,  p-value: < 2.2e-16

summary provides the coefficient values (in the Estimate column) and the p-values for the individual predictors (in the Pr(>|t|) column). Also, note that setosa is the excluded category for Species, meaning that you get the model prediction for Species="setosa" by setting the dummies equal to zero for versicolor and virginica.
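To see the dummy coding in action, here's a small sketch that inspects the design matrix and reproduces a setosa prediction by hand. The flower measurements (5.0 and 3.5) are arbitrary values picked for illustration:

```r
m2 <- lm(Petal.Width ~ Species + Sepal.Length + Sepal.Width, data = iris)

# The first factor level is the excluded (reference) category
levels(iris$Species)

# Design matrix: setosa rows have 0 in both Species dummy columns
head(model.matrix(m2))

# Prediction for a setosa flower by hand: both dummies are zero,
# so only the intercept and the continuous terms contribute
b <- coef(m2)
by_hand <- b["(Intercept)"] + b["Sepal.Length"] * 5.0 + b["Sepal.Width"] * 3.5

# The same prediction via predict()
new_flower <- data.frame(
  Species = factor("setosa", levels = levels(iris$Species)),
  Sepal.Length = 5.0,
  Sepal.Width = 3.5
)
from_predict <- predict(m2, newdata = new_flower)

all.equal(unname(by_hand), unname(from_predict))
```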

In addition, the last row of the summary provides all of the F-test information. This is a joint test that all of the coefficients other than the intercept are zero. It gives the F-statistic, the degrees of freedom for the test, and the p-value for the test.
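If you need these numbers as values rather than printed text, they can be pulled out of the summary object. A short sketch (the variable names here are just for illustration):

```r
m2 <- lm(Petal.Width ~ Species + Sepal.Length + Sepal.Width, data = iris)
s <- summary(m2)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(s)
coef_table[, "Pr(>|t|)"]   # p-values of the individual predictors

# Overall F-test: statistic, numerator df, denominator df
fstat <- s$fstatistic

# summary() does not store the overall p-value, so compute it from the F distribution
p_value <- pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE)

# Residual degrees of freedom
df.residual(m2)
```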

Other ways to perform the F-test include:

anova(m1, m2, test="F") # F-test results

# F-test on any combination of independent variables (we include all four here)
car::linearHypothesis(m2, c("Sepal.Length=0",      
                            "Sepal.Width=0", 
                            "Speciesversicolor=0",
                            "Speciesvirginica=0"))

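As a check, the model-comparison F-test should reproduce the F-statistic from the last row of the summary. Also, since Species is spread over two dummy coefficients, you may want a single test for the whole factor. One option in base R (my suggestion, not the only way to do it) is drop1, which tests each term by removing it from the full model:

```r
m1 <- lm(Petal.Width ~ 1, data = iris)
m2 <- lm(Petal.Width ~ Species + Sepal.Length + Sepal.Width, data = iris)

# Null model vs. full model: row 2 holds the F-test
cmp <- anova(m1, m2, test = "F")
f_compare <- cmp$F[2]
f_summary <- unname(summary(m2)$fstatistic["value"])
all.equal(f_compare, f_summary)

# One F-test per term; Species is tested as a whole on 2 df
term_tests <- drop1(m2, test = "F")
term_tests
```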
By "beta weight" do you mean the standardized regression coefficients? If so, you can scale the continuous variables so that the regression coefficients will be in units of standard deviations. For example:

m2s = lm(scale(Petal.Width) ~ Species + scale(Sepal.Length) + scale(Sepal.Width), data=iris)

where the scale function takes the original data values, subtracts the mean, and divides by the standard deviation. For example:

x = 0:10
scale(x)

# Compare with
(x - mean(x))/sd(x)
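To connect the scaled and unscaled models: the standardized coefficient for a continuous predictor should equal the raw coefficient multiplied by sd(predictor)/sd(outcome). A quick sketch of that check:

```r
m2 <- lm(Petal.Width ~ Species + Sepal.Length + Sepal.Width, data = iris)
m2s <- lm(scale(Petal.Width) ~ Species + scale(Sepal.Length) + scale(Sepal.Width),
          data = iris)

# Raw and standardized coefficients for Sepal.Length
b_raw <- coef(m2)["Sepal.Length"]
b_std <- coef(m2s)["scale(Sepal.Length)"]

# Rescale the raw coefficient by the ratio of standard deviations
b_rescaled <- b_raw * sd(iris$Sepal.Length) / sd(iris$Petal.Width)

all.equal(unname(b_std), unname(b_rescaled))
```

Note that the Species dummies are left unscaled here, so their coefficients are differences from setosa in standard deviations of Petal.Width.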

Jason.C · March 16, 2020, 8:11pm

Thank you, this is very helpful!

Cheers,
Jason

Jason.C · March 17, 2020, 12:51am

Ok this was helpful from a predictive perspective. Would I do a simple group comparison of several continuous variables the same way?

EDIT 1: As of March 17, 2020, I think the answer is that a separate ANOVA is needed to estimate group effects for each of the 4 outcome measures listed above.
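A rough sketch of what that could look like, using iris as a stand-in since it also has 3 groups and 4 continuous variables. The Holm adjustment at the end is my own assumption about how one might handle running several tests; it was not part of the suggestions above:

```r
outcomes <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Fit one one-way ANOVA per outcome, with Species as the grouping factor
fits <- lapply(outcomes, function(v) {
  aov(reformulate("Species", response = v), data = iris)
})
names(fits) <- outcomes

# Pull the p-value for the Species term out of each ANOVA table
p_values <- sapply(fits, function(f) summary(f)[[1]][["Pr(>F)"]][1])
p_values

# Optional: adjust for having run four tests (assumption, see above)
p.adjust(p_values, method = "holm")
```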

EDIT 2: FYI: this link has a pretty good walkthrough of ANOVA.

EDIT 3: This was an extremely timely and very useful post.

Cheers,
Jason
