Missing data with nlme

aosmith · June 20, 2019, 9:31pm

The big difference, which may not be relevant in your particular case, is that calling na.omit() on the entire dataset removes every row that has a missing value anywhere, even in a variable you are not using in the model. Setting na.action = na.omit in the model call removes only the rows that have missing values in the variables used in the model.

Here's a toy example dataset.

dat1 = data.frame(x = c(1:4),
                 y = c(3, 3, 4, 5),
                 x2 = c(1, 4, 5, NA))

I want to regress y on x, but I have a second variable, x2, that has a missing value in it.

Using the na.omit() function, the analysis is only done on three rows since the last row is removed due to the missing value in x2.

summary(lm(y ~ x, data = na.omit(dat1)))

Call:
lm(formula = y ~ x, data = na.omit(dat1))

Residuals:
      1       2       3 
 0.1667 -0.3333  0.1667 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.3333     0.6236   3.742    0.166
x             0.5000     0.2887   1.732    0.333

Residual standard error: 0.4082 on 1 degrees of freedom
Multiple R-squared:   0.75,	Adjusted R-squared:    0.5 
F-statistic:     3 on 1 and 1 DF,  p-value: 0.3333
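If you want to see exactly which rows na.omit() dropped before fitting, one option is to inspect the cleaned data frame directly. This is only a minimal sketch on the toy data; the name dat_complete is made up for illustration.

```r
# Toy data from above, repeated so the snippet runs on its own
dat1 = data.frame(x = c(1:4),
                  y = c(3, 3, 4, 5),
                  x2 = c(1, 4, 5, NA))

# na.omit() on the whole data frame drops any row with an NA in any column
dat_complete = na.omit(dat1)  # 'dat_complete' is an illustrative name

# Number of rows before and after
nrow(dat1)
nrow(dat_complete)

# The indices of the dropped rows are stored in the "na.action" attribute
dropped_rows = as.integer(attr(dat_complete, "na.action"))
dropped_rows
```

Here the fourth row is dropped purely because of the NA in x2, even though x2 never appears in the model formula.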

Using na.action = na.omit, all four rows are used, because neither x nor y has a missing value.

summary(lm(y ~ x, data = dat1, na.action = na.omit))

Call:
lm(formula = y ~ x, data = dat1, na.action = na.omit)

Residuals:
   1    2    3    4 
 0.3 -0.4 -0.1  0.2 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   2.0000     0.4743   4.216   0.0519 .
x             0.7000     0.1732   4.041   0.0561 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3873 on 2 degrees of freedom
Multiple R-squared:  0.8909,	Adjusted R-squared:  0.8364 
F-statistic: 16.33 on 1 and 2 DF,  p-value: 0.05612
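A quick way to confirm how many rows each approach actually used is nobs(). The sketch below also shows that restricting na.omit() to just the model variables gives the same fit as na.action = na.omit on this toy data. The object names (fit_all_cols, fit_action, fit_model_cols) are made up for illustration.

```r
# Toy data from above, repeated so the snippet runs on its own
dat1 = data.frame(x = c(1:4),
                  y = c(3, 3, 4, 5),
                  x2 = c(1, 4, 5, NA))

# na.omit() on the full dataset: the row with NA in x2 is lost
fit_all_cols = lm(y ~ x, data = na.omit(dat1))

# na.action = na.omit: only NAs in y and x matter
fit_action = lm(y ~ x, data = dat1, na.action = na.omit)

# na.omit() applied only to the columns that appear in the model
fit_model_cols = lm(y ~ x, data = na.omit(dat1[, c("x", "y")]))

# Number of observations used by each fit
nobs(fit_all_cols)
nobs(fit_action)
nobs(fit_model_cols)

# The last two fits use the same rows, so the coefficients should match
all.equal(coef(fit_action), coef(fit_model_cols))
```

Subsetting the columns by hand works, but it is easy to forget a variable when the formula changes, which is one reason to let na.action handle it inside the model call.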

For your last question about how to calculate specific statistical results from mixed models fit with lme(), I'd recommend asking a new question. Make sure to include a reproducible example so folks can help you. See this FAQ on how to include a reproducible example.