I'm running an "lm" regression on a large dataset by group. The problem is that when the analysis fails for some groups (for example because there is no variation in the data, one of the factors has only one level, or there are missing values), the creation of my output data frame stops entirely. Can anyone suggest a solution to this problem?
My dataset consists of 4 columns:
Y: response variable
X1: 1st explanatory variable
X2: 2nd explanatory variable
Location: the column used to group my data, so "lm" runs by Location.
Here is the code I'm using to get predicted values and studentized residuals by observation:
When I run this code on locations that I already know are fine, everything works, but once I run it on all locations, I get an error message. I hope this clarifies my issue. Thanks again!
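One common way to keep a single failing group from stopping the whole run is to wrap each fit in `tryCatch()` and return `NA` for groups where "lm" throws an error. The following is only a minimal sketch under stated assumptions: the toy data frame `dat`, the helper `fit_one`, and the output column names `Predicted` and `Studentized` are hypothetical, and only the column names Y, X1, X2 and Location come from the post.

```r
# Hypothetical toy data; only the column names follow the post.
set.seed(1)
dat <- data.frame(
  Location = rep(c("A", "B", "C"), each = 9),
  X1 = factor(rep(c("a", "b", "c"), times = 9)),
  X2 = rnorm(27)
)
dat$Y <- 2 + 0.5 * dat$X2 + rnorm(27)
# Location "C" uses a single level of X1, which makes lm() stop with an error.
dat$X1[dat$Location == "C"] <- "a"

# Hypothetical helper: fit one group and return it with two extra columns.
fit_one <- function(g) {
  g$Predicted <- NA_real_
  g$Studentized <- NA_real_
  # Turn an error from lm() into NULL instead of aborting the whole run.
  # na.exclude pads fitted values with NA for rows that have missing data.
  fit <- tryCatch(
    lm(Y ~ X1 + X2, data = g, na.action = na.exclude),
    error = function(e) NULL
  )
  if (!is.null(fit)) {
    g$Predicted <- fitted(fit)
    g$Studentized <- rstudent(fit)
  }
  g
}

# One data frame with every original row; failed groups keep NA.
out <- do.call(rbind, lapply(split(dat, dat$Location), fit_one))
```

Note that `tryCatch()` only intercepts errors. A group with no variation in Y may still fit without an error and simply produce residuals that are not meaningful, so those cases may need a separate check.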
Hi @startz,
Thanks a lot for your reply!
Using your recommendation, I got my original file back, without any of the output expected from "lm".
Any idea what I did wrong?
Thanks again!
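# Keep mpg, cyl, drat and wt from mtcars
d <- mtcars[,c(1:2,5:6)]
# Row indices for each cylinder group
fours <- which(d$cyl == 4)
sixes <- which(d$cyl == 6)
eights <- which(d$cyl == 8)
groups <- list(fours,sixes,eights)
# Fit one model per group; the fit is wrapped in a list so it can be stored in l
get_fits <- function(x) list(lm(mpg ~ drat + wt,d[unlist(groups[x]),]))
l <- list()
for(i in seq_along(groups)) l[i] = get_fits(i)
# 32 NAs in 2 columns, i.e. a 16 x 2 placeholder matrix
r <- matrix(rep(NA,dim(d)[1]), ncol = 2)
# report() returns a new matrix but never assigns it back to r
report <- function(x) rbind(r,cbind(l[x][[1]]$fitted.values,l[x][[1]]$residuals))
# The return values are discarded here, so r is unchanged
for(i in 1:3) report(i)
r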
#> [,1] [,2]
#> [1,] NA NA
#> [2,] NA NA
#> [3,] NA NA
#> [4,] NA NA
#> [5,] NA NA
#> [6,] NA NA
#> [7,] NA NA
#> [8,] NA NA
#> [9,] NA NA
#> [10,] NA NA
#> [11,] NA NA
#> [12,] NA NA
#> [13,] NA NA
#> [14,] NA NA
#> [15,] NA NA
#> [16,] NA NA
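The matrix above is all NA because `report()` builds a new matrix and returns it, but the loop never assigns that result back to `r`. A possible rewrite that collects the fitted values and residuals is sketched below. It assumes the same mtcars columns and the same model formula as the code above; the names `fits` and the column labels `fitted` and `residual` are illustrative choices, not part of the original.

```r
# Same columns as above, selected by name: mpg, cyl, drat, wt
d <- mtcars[, c("mpg", "cyl", "drat", "wt")]

# One model per cylinder group
fits <- lapply(split(d, d$cyl), function(g) lm(mpg ~ drat + wt, data = g))

# Bind the per-group results and assign them, so r is actually filled
r <- do.call(
  rbind,
  lapply(fits, function(f) cbind(fitted = fitted(f), residual = resid(f)))
)

dim(r)
head(r)
```

Here `r` has one row per car, grouped by cylinder count rather than in the original row order, so it would need to be matched back by row name if the original order matters.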