The big difference, which may not be relevant in your particular case, is that calling `na.omit()` on the entire dataset removes every row that has a missing value anywhere, even in variables you are not using in the model. Using `na.action = na.omit` inside the model call only removes rows that have missing values in the variables that actually appear in the model.
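A minimal sketch of the mechanism in base R, reusing the toy data from the example that follows. The helpers `complete.cases()` and `all.vars()` are my choice for illustration, not part of the original answer. The first check looks at every column, which is what `na.omit(dat1)` does. The second looks only at the variables named in the formula. That is roughly what `lm()` does, because it applies `na.action` to a model frame built from those variables.

```r
# Toy data: x2 has a missing value in row 4
dat1 <- data.frame(x = 1:4,
                   y = c(3, 3, 4, 5),
                   x2 = c(1, 4, 5, NA))

# Rows kept by na.omit(dat1): complete across *all* columns
keep_all <- complete.cases(dat1)

# Rows kept when only the model's variables are checked
model_vars <- all.vars(y ~ x)                  # c("y", "x")
keep_model <- complete.cases(dat1[model_vars])

keep_all    # TRUE TRUE TRUE FALSE
keep_model  # TRUE TRUE TRUE TRUE
```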
Here's a toy example dataset.
```r
dat1 = data.frame(x = 1:4,
                  y = c(3, 3, 4, 5),
                  x2 = c(1, 4, 5, NA))
```
I want to regress `y` on `x`, but I have a second variable, `x2`, that has a missing value in it.
Using the `na.omit()` function, the analysis is done on only three rows, since the last row is removed because of the missing value in `x2`.
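If you want to see exactly which rows `na.omit()` dropped, it records their indices in an `na.action` attribute on the returned data frame. A small base-R sketch:

```r
dat1 <- data.frame(x = 1:4,
                   y = c(3, 3, 4, 5),
                   x2 = c(1, 4, 5, NA))

cleaned <- na.omit(dat1)
nrow(cleaned)                            # 3

# Indices of the removed rows are stored as an attribute
as.integer(attr(cleaned, "na.action"))   # 4
```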
```r
summary(lm(y ~ x, data = na.omit(dat1)))

Call:
lm(formula = y ~ x, data = na.omit(dat1))

Residuals:
      1       2       3
 0.1667 -0.3333  0.1667

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.3333     0.6236   3.742    0.166
x             0.5000     0.2887   1.732    0.333

Residual standard error: 0.4082 on 1 degrees of freedom
Multiple R-squared:  0.75,  Adjusted R-squared:  0.5
F-statistic: 3 on 1 and 1 DF,  p-value: 0.3333
```
Using `na.action = na.omit`, all four rows are used.
```r
summary(lm(y ~ x, data = dat1, na.action = na.omit))

Call:
lm(formula = y ~ x, data = dat1, na.action = na.omit)

Residuals:
   1    2    3    4
 0.3 -0.4 -0.1  0.2

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.0000     0.4743   4.216   0.0519 .
x             0.7000     0.1732   4.041   0.0561 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3873 on 2 degrees of freedom
Multiple R-squared:  0.8909,  Adjusted R-squared:  0.8364
F-statistic: 16.33 on 1 and 2 DF,  p-value: 0.05612
```
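A quick way to confirm how many rows each fit actually used is `nobs()`. The sketch below also checks two details that go slightly beyond the original answer. First, in a default R session `getOption("na.action")` is `"na.omit"`, so leaving `na.action` out gives the same four-row fit here (this assumes you haven't changed that option). Second, once `x2` enters the formula, `lm()` drops row 4 by itself.

```r
dat1 <- data.frame(x = 1:4,
                   y = c(3, 3, 4, 5),
                   x2 = c(1, 4, 5, NA))

fit_data    <- lm(y ~ x, data = na.omit(dat1))              # row 4 removed up front
fit_action  <- lm(y ~ x, data = dat1, na.action = na.omit)  # only y and x are checked
fit_default <- lm(y ~ x, data = dat1)                       # falls back to getOption("na.action")
fit_x2      <- lm(y ~ x + x2, data = dat1, na.action = na.omit)  # x2 now checked too

nobs(fit_data)    # 3
nobs(fit_action)  # 4
nobs(fit_x2)      # 3
all.equal(coef(fit_action), coef(fit_default))  # TRUE
```

In other words, `na.action` only ever looks at the variables in the formula, so the rows used can change as you add or drop predictors. That matters if you later compare models fit to different subsets of the data.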
For your last question, about how to calculate specific statistical results from mixed models fit with `lme()`, I'd recommend asking a new question. Make sure to include a reproducible example so folks can help you. See this FAQ on how to include a reproducible example.