Missing data with nlme

matccarp · June 20, 2019, 4:59pm

Good afternoon,

I have been having a problem for quite sometime now with dealing with some missing data in my code. A lot of people I have talked to tell me to simply put "na.strings="."" in the beginging of the code and put "."s in the missing spots. I have done this and I still cannot seem to get it to run.

Data:

Year	Plot	DateTreatment	Rep	Treat	0WAT	1WAT	2WAT	3WAT
2019	51	Apr	1	Controlled	9	9	9	8
2019	52	Apr	1	Less Fraze	1	5	8	8
2019	53	Apr	1	Scalped	2	4	6	5
2019	54	Apr	1	Deep Fraze	1	1	1	2
2019	55	Apr	1	Gly	9	8	7	6
2019	56	Apr	1	Rim	8	8	8	7
2019	61	Apr	2	Less Fraze	1	6	8	8
2019	62	Apr	2	Deep Fraze	1	2	4	6
2019	63	Apr	2	Scalped	1	3	7	7
2019	64	Apr	2	Gly	.	.	.	.
2019	65	Apr	2	Controlled	9	9	9	9
2019	66	Apr	2	Rim	8	9	7	6
2019	71	Apr	3	Gly	9	7	7	5
2019	72	Apr	3	Deep Fraze	1	3	3	5
2019	73	Apr	3	Less Fraze	2	6	7	8
2019	74	Apr	3	Scalped	3	5	7	6
2019	75	Apr	3	Rim	9	9	7	7
2019	76	Apr	3	Controlled	9	8	8	8
2019	81	Apr	4	Gly	9	9	7	7
2019	82	Apr	4	Controlled	9	9	9	9
2019	83	Apr	4	Deep Fraze	1	2	3	4
2019	84	Apr	4	Rim	9	9	7	7
2019	85	Apr	4	Less Fraze	1	4	7	7
2019	86	Apr	4	Scalped	2	5	5	5
2019	91	May	1	Deep Fraze	1	1	2	3
2019	92	May	1	Scalped	5	4	8	6
2019	93	May	1	Gly	.	.	.	.
2019	94	May	1	Controlled	8	8	8	8
2019	95	May	1	Rim	8	8	7	6
2019	96	May	1	Less Fraze	1	1	4	4
2019	101	May	2	Controlled	8	9	8	7
2019	102	May	2	Less Fraze	2	1	4	6
2019	103	May	2	Rim	9	9	7	6
2019	104	May	2	Scalped	2	3	5	6
2019	105	May	2	Gly	8	6	3	4
2019	106	May	2	Deep Fraze	1	1	2	3
2019	111	May	3	Scalped	5	4	5	6
2019	112	May	3	Rim	8	8	7	6
2019	113	May	3	Gly	8	5	2	2
2019	114	May	3	Controlled	9	9	9	9
2019	115	May	3	Deep Fraze	1	1	2	2
2019	116	May	3	Less Fraze	1	1	6	4
2019	121	May	4	Less Fraze	1	2	5	6
2019	122	May	4	Gly	9	4	2	2
2019	123	May	4	Rim	9	8	7	5
2019	124	May	4	Controlled	9	9	8	8
2019	125	May	4	Deep Fraze	1	3	6	5
2019	126	May	4	Scalped	4	5	6	5

Code:
setwd("/Users/mc1499/Documents/Thesis Measurements/2018 & 2019 Spring")
dat<-read.csv("TQR pooled.csv")
head(dat)
na.strings="."
block<-as.factor(dat$Rep)
trt<-as.factor(dat$Treat)
trtdate<-as.factor(dat$DateTreatment)
yr<-as.factor(dat$Year)
str(dat)

RCI0WAT<-lme(X0WAT ~ Treat*DateTreatment,random = ~1|Year/Rep,data=dat)
anova(RCI0WAT)
predictmeans(RCI0WAT,"Treat:DateTreatment",pairwise=TRUE)

Output:

RCI0WAT<-lme(X0WAT ~ Treat*DateTreatment,random = ~1|Year/Rep,data=dat)
Error in rownames<-(*tmp*, value = rownames(Fitted) <- origOrder) : attempt to set 'rownames' on an object with no dimensions In addition: Warning message: In Ops.factor(y[revOrder], Fitted) : ‘-’ not meaningful for factors

How do I make this run? When I take away the "."s and add any numbers it runs and gives me the results.

aosmith · June 20, 2019, 5:32pm

na.strings is an argument to the read.csv() function. I often find it useful to read about an argument I haven't used before, which you can do by going to the help page by running ?read.csv in your R Console.

If your NA values are represented by ., you can change your code to

dat <- read.csv("TQR pooled.csv", na.strings = ".")

Then when you look at summary(dat) or View(dat) you should see that you have NA values in some columns.

matccarp · June 20, 2019, 6:44pm

I have tried moving the na.strings to that location before this is what I get when I do that.

RCI0WAT<-lme(X0WAT ~ Treat*DateTreatment,random = ~1|Year/Rep,data=dat)
Error in na.fail.default(list(X0WAT = c(298L, 281L, 111L, 202L, 303L, : missing values in object

When I put in summary(dat) this is one of the weeks I get:
X3WAT Min. : 92.0 1st Qu.:230.0 Median :301.0 Mean :313.7 3rd Qu.:387.0 Max. :615.0 NA's :1

aosmith · June 20, 2019, 7:44pm

Good, having the NA present and the lme() error message means the first step is done!

Unlike other objects (like lm objects), lme objects are set to "fail" if NA values are present by default. This forces the user to decide what we want to do with them.

You can see in the documentation for lme():

na.action
a function that indicates what should happen when the data contain NAs. The default action (na.fail) causes lme to print an error message and terminate if there are any incomplete observations.

Other commonly used options (see ?na.action) are na.omit, which is what you could be used to lm models using and na.exclude. These two options do slightly different things, so which is most useful will depend on your goals but both will stop the current error message.

RCI0WAT <- lme(X0WAT ~ Treat*DateTreatment,
        random = ~1|Year/Rep,
        na.action = na.omit,
        data = dat )

Another option we always have is to manually remove the NA. For example, if we want to remove all rows that have an NA anywhere (which is not always desired), you could do something like

RCI0WAT <- lme(X0WAT ~ Treat*DateTreatment,
        random = ~1|Year/Rep,
        data = na.omit(dat) )

matccarp · June 20, 2019, 8:31pm

Awesome! I got the data = na.omit(dat) to run however the na.fail = na.omit, data = dat gives an error. It still proceeds to run the anova and predictmeans but I don't know what the problem is with the error.

Output:
RCI0WAT<-lme(X0WAT ~ Treat*DateTreatment,random = ~1|Year/Rep, na.fail=na.omit, data=dat)
Error in lme(X0WAT ~ Treat * DateTreatment, random = ~1 | Year/Rep, na.fail = na.omit, : unused argument (na.fail = na.omit)

Can you explain what is getting removed when I use the data=na.omit(data) vs. na.fail=na.omit, data=dat?

aosmith · June 20, 2019, 8:48pm

That's because it's supposed to be na.action = na.omit and not na.fail. Fixed above.

matccarp · June 20, 2019, 8:57pm

Ah, I see that. That was a dumb question.

I have a couple more not as dumb questions I think

What is the difference between the data=na.omit and na.action=na.omit, data=dat?
This will work when I have missing data points lined up like this correct?

Year	Plot	SeedingDate	Rep	Treat	0WAS	2WAS	3WAS
2017	11	Sep	1	Controlled	.	.	54.070625
2017	12	Sep	1	Deep Fraze	.	.	83.72182292
2017	13	Sep	1	Verticut	.	.	56.20604167
2017	14	Sep	1	Less Fraze	.	.	39.4509375
2017	15	Sep	1	Scalped	.	.	52.39859375
2017	21	Sep	2	Less Fraze	.	.	79.58875
2017	22	Sep	2	Deep Fraze	.	.	74.1265625
2017	23	Sep	2	Verticut	.	.	79.23822917
2017	24	Sep	2	Scalped	.	.	74.97447917
2017	25	Sep	2	Controlled	.	.	51.79739583
2017	31	Sep	3	Scalped	.	.	93.17848958
2017	32	Sep	3	Less Fraze	.	.	47.21385417
2017	33	Sep	3	Controlled	.	.	76.4871875
2017	34	Sep	3	Deep Fraze	.	.	66.2246875
2017	35	Sep	3	Verticut	.	.	48.22385417
2017	41	Sep	4	Scalped	.	.	85.97604167
2017	42	Sep	4	Controlled	.	.	80.46276042
2017	43	Sep	4	Less Fraze	.	.	67.17083333
2017	44	Sep	4	Deep Fraze	.	.	60.97229167
2017	45	Sep	4	Verticut	.	.	50.29697917
2017	51	Oct	1	Verticut	20.40994792	32.86390625	39.86911458
2017	52	Oct	1	Less Fraze	7.35125	32.61114583	37.4709375
2017	53	Oct	1	Controlled	22.96963542	36.93651042	28.13942708

Lastly, how can I find the %CV, SD, or SEM? (I need these to make a graph)

aosmith · June 20, 2019, 9:31pm

The big difference, which may not be relevant in your particular case, is that using the na.omit() function on the entire dataset removes any row that has a missing value in it anywhere, even if you are not using that variable in themodel. Using the na.action = na.omit only removes rows for the variables you are using in your model.

Here's a toy example dataset.

dat1 = data.frame(x = c(1:4),
                 y = c(3, 3, 4, 5),
                 x2 = c(1, 4, 5, NA))

I want to regress y vs x but I have a second variable, x2, that has a missing value in it.

Using the na.omit() function, the analysis is only done on three rows since the last row is removed due to the missing value in x2.

summary(lm(y ~ x, data = na.omit(dat1)))
Call:
lm(formula = y ~ x, data = na.omit(dat1))

Residuals:
      1       2       3 
 0.1667 -0.3333  0.1667 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.3333     0.6236   3.742    0.166
x             0.5000     0.2887   1.732    0.333

Residual standard error: 0.4082 on 1 degrees of freedom
Multiple R-squared:   0.75,	Adjusted R-squared:    0.5 
F-statistic:     3 on 1 and 1 DF,  p-value: 0.3333

Using na.action = na.omit all four rows are used.

summary(lm(y ~ x, data = dat1, na.action = na.omit))

Call:
lm(formula = y ~ x, data = dat1, na.action = na.omit)

Residuals:
   1    2    3    4 
 0.3 -0.4 -0.1  0.2 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   2.0000     0.4743   4.216   0.0519 .
x             0.7000     0.1732   4.041   0.0561 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3873 on 2 degrees of freedom
Multiple R-squared:  0.8909,	Adjusted R-squared:  0.8364 
F-statistic: 16.33 on 1 and 2 DF,  p-value: 0.05612

For your last question about how to calculate specific statistical results from mixed models fit with lme(), I'd recommend asking a new question. Make sure to include a reproducible example so folks can help you. See this FAQ on how to include a reproducible example.

matccarp · June 20, 2019, 9:50pm

Got it!

These functions will still work even when there is a big area of missing data (like in the data attached to the last question), correct?

And again thank you!

aosmith · June 20, 2019, 9:57pm

Yes! Those both work no matter how many missing values you have (although if your data get sparse enough you could have trouble modeling it ).

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it:

matccarp · June 20, 2019, 10:13pm

Thank you so very much!!!!

system · June 27, 2019, 10:13pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.