Error: Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : NA/NaN/Inf in 'x'

Neha19685 · September 4, 2020, 12:19am

I am working on this dataset-

head(pone.data, 10)
##     DATE TOTALDEATHS   CO  NO2   O3 SO2   PCNT PM25 RHUM TMP
## 1  01-Jan-11          52 0.22 32.0 23.2 1.4 12.004   17 86.0 4.6
## 2  02-Jan-11          41 0.24 39.8 25.4 1.5 13.662   10 80.5 1.7

And I tried to fit a linear model.

# Linear Model
X.temp = pone.data[, 'TMP']-mean(pone.data[, 'TMP'])
X2.temp = X.temp^2

X.hum = pone.data[, 'RHUM']-mean(pone.data[, 'RHUM'])
X2.hum = X.hum^2

pone.data$TOTALDEATHS <- as.numeric(gsub("\\.", "", pone.data$TOTALDEATHS)) 

pone.data$CO <- as.numeric(gsub("\\.", "", pone.data$CO))

trend = time(pone.data[, 'TOTALDEATHS'])

fit= lm(pone.data[, 'TOTALDEATHS'] ~ trend +
          X.temp + X2.temp + X.hum + X2.hum +
          pone.data[, 'CO'], na.action = NULL)

But I am getting the error-

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : NA/NaN/Inf in 'x'

I have NA values in my dataset.
Now, what should I do?

elmstedt · September 4, 2020, 3:33am

First, you did a nice job asking your question: you included the error, the command that threw it, and a snippet of your data.

Here is a reproducible example, which I think will help because you can run the code and experiment with it yourself:

n <- 10
p <- 3
set.seed(123)
df <- data.frame(matrix(sample(4, n * p, TRUE), nrow = n, dimnames = list(NULL, c("y", "x1", "x2"))))
df[sample(n, 5), "x2"] <- NA
df
#>    y x1 x2
#> 1  3  4 NA
#> 2  3  2  4
#> 3  3  2 NA
#> 4  2  1 NA
#> 5  3  2  1
#> 6  2  3 NA
#> 7  2  4  4
#> 8  2  1  2
#> 9  3  3 NA
#> 10 1  3  2

So, we've made some data with NA values. Let's see what happens when we try to produce models from the data using a variety of na.action choices.

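It's worth noting that the default is na.omit (strictly, lm() falls back on getOption("na.action"), which is "na.omit" in a factory-fresh R session), so you should have a defensible reason for choosing something else before you do.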
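If you want to confirm what your own session will use, a minimal check is below. It assumes nothing has changed the option (for example in an .Rprofile); the variable name default_action is just for illustration.

```r
# Look up the session-wide setting that lm() falls back on
# when no na.action argument is supplied.
default_action <- getOption("na.action")
print(default_action)
```

In an unmodified session this should report "na.omit"; anything else means the option was changed somewhere.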

(m0 <- lm(y ~ x1 + x2, df))
#> 
#> Call:
#> lm(formula = y ~ x1 + x2, data = df)
#> 
#> Coefficients:
#> (Intercept)           x1           x2  
#>      2.5811      -0.3784       0.2027

(m1 <- lm(y ~ x1 + x2, df, na.action = "na.omit"))
#> 
#> Call:
#> lm(formula = y ~ x1 + x2, data = df, na.action = "na.omit")
#> 
#> Coefficients:
#> (Intercept)           x1           x2  
#>      2.5811      -0.3784       0.2027

(m2 <- lm(y ~ x1 + x2, df, na.action = "na.exclude"))
#> 
#> Call:
#> lm(formula = y ~ x1 + x2, data = df, na.action = "na.exclude")
#> 
#> Coefficients:
#> (Intercept)           x1           x2  
#>      2.5811      -0.3784       0.2027

You'll notice the three results are the same because, again, "na.omit" is the default, and "na.exclude" drops the same incomplete rows before fitting. The difference is that "na.exclude" does a better job of keeping track of what happened, as we'll see next.

fitted(m1)
#>        2        5        7        8       10 
#> 2.635135 2.027027 1.878378 2.608108 1.851351
fitted(m2)
#>        1        2        3        4        5        6        7        8 
#>       NA 2.635135       NA       NA 2.027027       NA 1.878378 2.608108 
#>        9       10 
#>       NA 1.851351

You can see here that "na.exclude" kept track of which observations were problematic and returns NA for their fitted values (and for their residuals as well).
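One practical consequence, sketched here with the same data typed in by hand (so it does not depend on the random seed): because "na.exclude" pads the results back to the original length, you can attach residuals and fitted values straight onto the data frame. The names m_omit, m_excl, resid and fitted are only illustrative.

```r
# Same values as the example data above, entered explicitly.
df <- data.frame(
  y  = c(3, 3, 3, 2, 3, 2, 2, 2, 3, 1),
  x1 = c(4, 2, 2, 1, 2, 3, 4, 1, 3, 3),
  x2 = c(NA, 4, NA, NA, 1, NA, 4, 2, NA, 2)
)

m_omit <- lm(y ~ x1 + x2, data = df, na.action = na.omit)
m_excl <- lm(y ~ x1 + x2, data = df, na.action = na.exclude)

# na.omit returns one residual per complete row only.
length(residuals(m_omit))
# na.exclude returns one residual per original row, NA where a row was dropped.
length(residuals(m_excl))

# Because the lengths match nrow(df), these assignments line up row by row.
df$resid  <- residuals(m_excl)
df$fitted <- fitted(m_excl)
df
```

With "na.omit" the same assignment would fail, since 5 residuals cannot be matched to 10 rows.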

The rest produce errors, for different reasons. The first, "na.fail", is hopefully easy to understand: it tells R to throw an error if the data contains any NA values.

m3 <- lm(y ~ x1 + x2, df, na.action = "na.fail")
#> Error in na.fail.default(structure(list(y = c(3L, 3L, 3L, 2L, 3L, 2L, : missing values in object

Finally, choosing NULL has, I believe, the same effect as choosing "na.pass": lm() simply does what you asked, no more, no less. It does no pre-processing on the data, so the NA values reach the fitting code, which then throws an error when the computations fail.

m4 <- lm(y ~ x1 + x2, df, na.action = NULL)
#> Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...): NA/NaN/Inf in 'x'
m5 <- lm(y ~ x1 + x2, df, na.action = "na.pass")
#> Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...): NA/NaN/Inf in 'x'

Created on 2020-09-03 by the reprex package (v0.3.0)
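Applied to a model like the one in the question, a hedged sketch might look like the following. The data here are made up (only the column names TOTALDEATHS, TMP, RHUM and CO come from the question), and the names pone, temp_c and hum_c are hypothetical. Two things differ from the original call: na.action is left at a value that drops incomplete rows, and the centering uses na.rm = TRUE, because mean() returns NA when its input contains any NA, which would turn the whole centered column into NA.

```r
# Made-up stand-in for the real dataset; values are illustrative only.
n <- 40
pone <- data.frame(
  TOTALDEATHS = round(50 + 5 * cos(seq_len(n) / 4)),
  TMP         = 10 + 8 * sin(seq_len(n) / 5),
  RHUM        = 70 + 10 * cos(seq_len(n) / 3),
  CO          = 0.3 + 0.1 * sin(seq_len(n) / 2)
)
pone$TMP[3] <- NA   # simulate missing measurements
pone$CO[10] <- NA

# Build the predictors as columns so lm() can handle the NAs row by row.
pone$trend  <- seq_len(nrow(pone))
pone$temp_c <- pone$TMP  - mean(pone$TMP,  na.rm = TRUE)
pone$hum_c  <- pone$RHUM - mean(pone$RHUM, na.rm = TRUE)

fit <- lm(
  TOTALDEATHS ~ trend + temp_c + I(temp_c^2) + hum_c + I(hum_c^2) + CO,
  data = pone,
  na.action = na.exclude  # drop incomplete rows, keep NA placeholders in output
)

nobs(fit)               # number of rows actually used in the fit
length(residuals(fit))  # padded back to nrow(pone)
```

Passing data = pone and writing the squared terms with I() keeps everything in one data frame, so the rows that get dropped are the same for every variable.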

Neha19685 · September 4, 2020, 2:42pm

Thanks, @elmstedt for the detailed solution, it resolved my problem.

elmstedt · September 4, 2020, 2:56pm

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems.
