Missing data with nlme

Good afternoon,

I have been having a problem for quite sometime now with dealing with some missing data in my code. A lot of people I have talked to tell me to simply put "na.strings="."" in the beginging of the code and put "."s in the missing spots. I have done this and I still cannot seem to get it to run.

Data:

Year Plot DateTreatment Rep Treat 0WAT 1WAT 2WAT 3WAT
2019 51 Apr 1 Controlled 9 9 9 8
2019 52 Apr 1 Less Fraze 1 5 8 8
2019 53 Apr 1 Scalped 2 4 6 5
2019 54 Apr 1 Deep Fraze 1 1 1 2
2019 55 Apr 1 Gly 9 8 7 6
2019 56 Apr 1 Rim 8 8 8 7
2019 61 Apr 2 Less Fraze 1 6 8 8
2019 62 Apr 2 Deep Fraze 1 2 4 6
2019 63 Apr 2 Scalped 1 3 7 7
2019 64 Apr 2 Gly . . . .
2019 65 Apr 2 Controlled 9 9 9 9
2019 66 Apr 2 Rim 8 9 7 6
2019 71 Apr 3 Gly 9 7 7 5
2019 72 Apr 3 Deep Fraze 1 3 3 5
2019 73 Apr 3 Less Fraze 2 6 7 8
2019 74 Apr 3 Scalped 3 5 7 6
2019 75 Apr 3 Rim 9 9 7 7
2019 76 Apr 3 Controlled 9 8 8 8
2019 81 Apr 4 Gly 9 9 7 7
2019 82 Apr 4 Controlled 9 9 9 9
2019 83 Apr 4 Deep Fraze 1 2 3 4
2019 84 Apr 4 Rim 9 9 7 7
2019 85 Apr 4 Less Fraze 1 4 7 7
2019 86 Apr 4 Scalped 2 5 5 5
2019 91 May 1 Deep Fraze 1 1 2 3
2019 92 May 1 Scalped 5 4 8 6
2019 93 May 1 Gly . . . .
2019 94 May 1 Controlled 8 8 8 8
2019 95 May 1 Rim 8 8 7 6
2019 96 May 1 Less Fraze 1 1 4 4
2019 101 May 2 Controlled 8 9 8 7
2019 102 May 2 Less Fraze 2 1 4 6
2019 103 May 2 Rim 9 9 7 6
2019 104 May 2 Scalped 2 3 5 6
2019 105 May 2 Gly 8 6 3 4
2019 106 May 2 Deep Fraze 1 1 2 3
2019 111 May 3 Scalped 5 4 5 6
2019 112 May 3 Rim 8 8 7 6
2019 113 May 3 Gly 8 5 2 2
2019 114 May 3 Controlled 9 9 9 9
2019 115 May 3 Deep Fraze 1 1 2 2
2019 116 May 3 Less Fraze 1 1 6 4
2019 121 May 4 Less Fraze 1 2 5 6
2019 122 May 4 Gly 9 4 2 2
2019 123 May 4 Rim 9 8 7 5
2019 124 May 4 Controlled 9 9 8 8
2019 125 May 4 Deep Fraze 1 3 6 5
2019 126 May 4 Scalped 4 5 6 5

Code:
setwd("/Users/mc1499/Documents/Thesis Measurements/2018 & 2019 Spring")
dat<-read.csv("TQR pooled.csv")
head(dat)
na.strings="."
block<-as.factor(dat$Rep)
trt<-as.factor(dat$Treat)
trtdate<-as.factor(dat$DateTreatment)
yr<-as.factor(dat$Year)
str(dat)

RCI0WAT<-lme(X0WAT ~ Treat*DateTreatment,random = ~1|Year/Rep,data=dat)
anova(RCI0WAT)
predictmeans(RCI0WAT,"Treat:DateTreatment",pairwise=TRUE)

Output:

RCI0WAT<-lme(X0WAT ~ Treat*DateTreatment,random = ~1|Year/Rep,data=dat)
Error in rownames<-(*tmp*, value = rownames(Fitted) <- origOrder) : attempt to set 'rownames' on an object with no dimensions In addition: Warning message: In Ops.factor(y[revOrder], Fitted) : β€˜-’ not meaningful for factors

How do I make this run? When I take away the "."s and add any numbers it runs and gives me the results.

na.strings is an argument to the read.csv() function. I often find it useful to read about an argument I haven't used before, which you can do by going to the help page by running ?read.csv in your R Console.

If your NA values are represented by ., you can change your code to

dat <- read.csv("TQR pooled.csv", na.strings = ".")

Then when you look at summary(dat) or View(dat) you should see that you have NA values in some columns.

I have tried moving the na.strings to that location before this is what I get when I do that.

RCI0WAT<-lme(X0WAT ~ Treat*DateTreatment,random = ~1|Year/Rep,data=dat)
Error in na.fail.default(list(X0WAT = c(298L, 281L, 111L, 202L, 303L, : missing values in object

When I put in summary(dat) this is one of the weeks I get:
X3WAT Min. : 92.0 1st Qu.:230.0 Median :301.0 Mean :313.7 3rd Qu.:387.0 Max. :615.0 NA's :1

Good, having the NA present and the lme() error message means the first step is done!

Unlike other objects (like lm objects), lme objects are set to "fail" if NA values are present by default. This forces the user to decide what we want to do with them.

You can see in the documentation for lme():

na.action
a function that indicates what should happen when the data contain NAs. The default action (na.fail) causes lme to print an error message and terminate if there are any incomplete observations.

Other commonly used options (see ?na.action) are na.omit, which is what you could be used to lm models using and na.exclude. These two options do slightly different things, so which is most useful will depend on your goals but both will stop the current error message.

RCI0WAT <- lme(X0WAT ~ Treat*DateTreatment,
        random = ~1|Year/Rep,
        na.action = na.omit,
        data = dat )

Another option we always have is to manually remove the NA. For example, if we want to remove all rows that have an NA anywhere (which is not always desired), you could do something like

RCI0WAT <- lme(X0WAT ~ Treat*DateTreatment,
        random = ~1|Year/Rep,
        data = na.omit(dat) )

Awesome! I got the data = na.omit(dat) to run however the na.fail = na.omit, data = dat gives an error. It still proceeds to run the anova and predictmeans but I don't know what the problem is with the error.

Output:
RCI0WAT<-lme(X0WAT ~ Treat*DateTreatment,random = ~1|Year/Rep, na.fail=na.omit, data=dat)
Error in lme(X0WAT ~ Treat * DateTreatment, random = ~1 | Year/Rep, na.fail = na.omit, : unused argument (na.fail = na.omit)

Can you explain what is getting removed when I use the data=na.omit(data) vs. na.fail=na.omit, data=dat?

That's because it's supposed to be na.action = na.omit and not na.fail. :stuck_out_tongue: Fixed above.

Ah, I see that. That was a dumb question.

I have a couple more not as dumb questions I think :crossed_fingers:

  1. What is the difference between the data=na.omit and na.action=na.omit, data=dat?

  2. This will work when I have missing data points lined up like this correct?

Year Plot SeedingDate Rep Treat 0WAS 2WAS 3WAS
2017 11 Sep 1 Controlled . . 54.070625
2017 12 Sep 1 Deep Fraze . . 83.72182292
2017 13 Sep 1 Verticut . . 56.20604167
2017 14 Sep 1 Less Fraze . . 39.4509375
2017 15 Sep 1 Scalped . . 52.39859375
2017 21 Sep 2 Less Fraze . . 79.58875
2017 22 Sep 2 Deep Fraze . . 74.1265625
2017 23 Sep 2 Verticut . . 79.23822917
2017 24 Sep 2 Scalped . . 74.97447917
2017 25 Sep 2 Controlled . . 51.79739583
2017 31 Sep 3 Scalped . . 93.17848958
2017 32 Sep 3 Less Fraze . . 47.21385417
2017 33 Sep 3 Controlled . . 76.4871875
2017 34 Sep 3 Deep Fraze . . 66.2246875
2017 35 Sep 3 Verticut . . 48.22385417
2017 41 Sep 4 Scalped . . 85.97604167
2017 42 Sep 4 Controlled . . 80.46276042
2017 43 Sep 4 Less Fraze . . 67.17083333
2017 44 Sep 4 Deep Fraze . . 60.97229167
2017 45 Sep 4 Verticut . . 50.29697917
2017 51 Oct 1 Verticut 20.40994792 32.86390625 39.86911458
2017 52 Oct 1 Less Fraze 7.35125 32.61114583 37.4709375
2017 53 Oct 1 Controlled 22.96963542 36.93651042 28.13942708
  1. Lastly, how can I find the %CV, SD, or SEM? (I need these to make a graph)

The big difference, which may not be relevant in your particular case, is that using the na.omit() function on the entire dataset removes any row that has a missing value in it anywhere, even if you are not using that variable in themodel. Using the na.action = na.omit only removes rows for the variables you are using in your model.

Here's a toy example dataset.

dat1 = data.frame(x = c(1:4),
                 y = c(3, 3, 4, 5),
                 x2 = c(1, 4, 5, NA))

I want to regress y vs x but I have a second variable, x2, that has a missing value in it.

Using the na.omit() function, the analysis is only done on three rows since the last row is removed due to the missing value in x2.

summary(lm(y ~ x, data = na.omit(dat1)))
Call:
lm(formula = y ~ x, data = na.omit(dat1))

Residuals:
      1       2       3 
 0.1667 -0.3333  0.1667 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.3333     0.6236   3.742    0.166
x             0.5000     0.2887   1.732    0.333

Residual standard error: 0.4082 on 1 degrees of freedom
Multiple R-squared:   0.75,	Adjusted R-squared:    0.5 
F-statistic:     3 on 1 and 1 DF,  p-value: 0.3333

Using na.action = na.omit all four rows are used.

summary(lm(y ~ x, data = dat1, na.action = na.omit))

Call:
lm(formula = y ~ x, data = dat1, na.action = na.omit)

Residuals:
   1    2    3    4 
 0.3 -0.4 -0.1  0.2 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   2.0000     0.4743   4.216   0.0519 .
x             0.7000     0.1732   4.041   0.0561 .
---
Signif. codes:  0 β€˜***’ 0.001 β€˜**’ 0.01 β€˜*’ 0.05 β€˜.’ 0.1 β€˜ ’ 1

Residual standard error: 0.3873 on 2 degrees of freedom
Multiple R-squared:  0.8909,	Adjusted R-squared:  0.8364 
F-statistic: 16.33 on 1 and 2 DF,  p-value: 0.05612

For your last question about how to calculate specific statistical results from mixed models fit with lme(), I'd recommend asking a new question. Make sure to include a reproducible example so folks can help you. See this FAQ on how to include a reproducible example.

2 Likes

Got it!

These functions will still work even when there is a big area of missing data (like in the data attached to the last question), correct?

And again thank you!

Yes! Those both work no matter how many missing values you have (although if your data get sparse enough you could have trouble modeling it :slightly_smiling_face:).

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it:

Thank you so very much!!!!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.