How to ignore failed linear regression in a big dataset?

I'm running an "lm" regression on a large dataset, by group. The problem is that when the fit fails for some group (for reasons such as no variation in the data, a factor with only one level, or missing values), the whole run stops and my output data frame is never created. Can anyone suggest a solution to this problem?

My dataset consists of 4 columns:
Y: response variable
X1: 1st explanatory variable
X2: 2nd explanatory variable
location: the column used to group my data, so "lm" runs once per location.

Here is the code I'm using to get the predicted values and studentized residuals for each observation:

library(tidyverse)
library(broom)

output <- as.data.frame(dataset)
output$Y <- as.numeric(output$Y)
output$X1 <- as.factor(output$X1)
output$X2 <- as.factor(output$X2)

df <- output %>%
  group_by(location) %>%
  do(cbind(location = .$location, lm(Y ~ X1 + X2, data = .) %>% augment))

When I run this code on a few locations that I already know are fine, everything works, but once I run it on all locations, I get an error message. I hope this clarifies my issue. Thanks again!

You can use tryCatch(). Here is a simple example:

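y <- runif(1)
x <- 2i  # a complex predictor, which lm() rejects with an error
tryCatch(
  lm(y ~ x),
  # Each handler receives the condition object that was signalled
  warning = function(w) print("Got a warning"),
  error = function(e) print("Got an Error")
)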
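
To apply the same idea once per group, one option is to wrap the per-group fit in a function that returns NULL when lm() fails, then bind the surviving pieces together. This is only a minimal base-R sketch: the toy data and the helper name `fit_one` are made up for illustration, and location "B" is deliberately given a one-level factor so that its fit fails.

```r
set.seed(1)

# Hypothetical toy data with the same column names as in the question
dat <- data.frame(
  location = rep(c("A", "B", "C"), each = 6),
  Y  = rnorm(18),
  X1 = factor(rep(c("a", "b"), times = 9)),
  X2 = factor(rep(c("u", "v", "w"), times = 6))
)
# Force a failure: X1 has only one level within location "B"
dat$X1[dat$location == "B"] <- "a"

# Fit one group; on error, report the group and return NULL instead of stopping
fit_one <- function(g) {
  tryCatch({
    fit <- lm(Y ~ X1 + X2, data = g)
    cbind(g, .fitted = fitted(fit), .stud.resid = rstudent(fit))
  }, error = function(e) {
    message("Skipping location ", g$location[1], ": ", conditionMessage(e))
    NULL
  })
}

pieces  <- lapply(split(dat, dat$location), fit_one)
# Drop the failed groups (NULL entries) and stack the rest
results <- do.call(rbind, Filter(Negate(is.null), pieces))
```

The key point is that tryCatch() returns a value (the fit's output or whatever the handler returns), so that value has to be assigned or returned from the function. The sketch assumes no missing values within a group, so fitted values line up with the group's rows.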

Hi @startz,
Thanks a lot for your reply!
Using your recommendation, I got my original data back, without any of the output I expected from "lm".
Any idea what I did wrong?
Thanks again!

Try explicitly printing lm results.

Add @startz's suggestion for exception handling.

d <- mtcars[, c(1:2, 5:6)]  # keep mpg, cyl, drat, wt

# Row indices for each cylinder group
fours <- which(d$cyl == 4)
sixes <- which(d$cyl == 6)
eights <- which(d$cyl == 8)

groups <- list(fours, sixes, eights)

# Fit one model per group
get_fits <- function(x) lm(mpg ~ drat + wt, data = d[groups[[x]], ])

l <- list()
for (i in seq_along(groups)) l[[i]] <- get_fits(i)

# Stack fitted values and residuals; the result must be assigned back to r,
# otherwise r is never updated
r <- NULL
report <- function(x) cbind(fitted = fitted(l[[x]]), residual = residuals(l[[x]]))
for (i in seq_along(l)) r <- rbind(r, report(i))
head(r)

Created on 2023-10-26 with reprex v2.0.2
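
If you would rather stay close to the tidyverse pipeline from the question, one possible approach is purrr::possibly(), which wraps a function so that it returns a default value instead of raising an error. The following is a sketch under assumptions, not a tested drop-in for the original data: the toy data and the helper names `fit_group` and `safe_fit` are invented for illustration, and location "B" is built to fail because X1 has a single level there.

```r
library(dplyr)
library(purrr)
library(broom)

# Hypothetical toy data using the question's column names
toy <- data.frame(
  location = rep(c("A", "B", "C"), each = 6),
  Y  = c(1.2, 2.3, 0.7, 3.1, 1.9, 2.8, 0.5, 1.1, 2.2,
         1.7, 0.9, 2.6, 3.3, 1.4, 2.1, 0.8, 2.9, 1.6),
  X1 = factor(rep(c("a", "b"), times = 9)),
  X2 = factor(rep(c("u", "v", "w"), times = 6))
)
toy$X1[toy$location == "B"] <- "a"  # one-level factor, so lm() errors for "B"

# Fit one group and attach fitted values and residuals to its rows
fit_group <- function(g) {
  fit <- lm(Y ~ X1 + X2, data = g)
  out <- augment(fit, data = g)
  out$.stud.resid <- rstudent(fit)  # studentized residuals from base R
  out
}

# possibly() returns `otherwise` (NULL here) when fit_group() throws an error
safe_fit <- possibly(fit_group, otherwise = NULL)

df <- toy %>%
  group_split(location) %>%
  map(safe_fit) %>%
  bind_rows()  # NULL results from failed groups are dropped
```

Note that augment() reports standardized residuals in `.std.resid`, so the sketch adds rstudent() separately for the studentized ones. It also assumes there are no missing values within a group.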
