Non-normality-robust test of coefficients


I need advice with the following problem:

I ran a linear regression (an lm model). Several residual-normality tests (Anderson-Darling, Shapiro-Francia, K-S, Jarque-Bera, ...) give a fairly strong, clear-cut result: the residuals are non-normal. But I need to check the significance of the parameters in my model (as with coeftest) in a way that is robust to non-normality.

What kind of non-parametric test should I use as an alternative to the t-tests in the regression output?

Kind Regards
Jan Žemlička


First off: kudos for checking your modelling assumptions.

Inference in linear regression (the t-tests and confidence intervals for the coefficients) tends to be fairly robust to non-normal residuals, particularly in larger samples. As long as your residuals are fairly symmetric, I wouldn't worry too much. If they are clearly skewed, however, you should be more concerned.

For linear regression, the normality assumption is less critical than the assumptions about homoscedasticity and independence of the errors.

Some potential next steps:

  • If you're worried about non-normal errors, look into robust regression (maybe the robust package or MASS::rlm()). It might be worthwhile to briefly check whether using robust regression changes the results in a meaningful way.
  • If you're worried about heteroscedastic errors, consider using HC1/HC2 standard errors. The estimatr package makes this easy. You could also look into variance-stabilizing transformations, such as log transforms, Box-Cox transformations, etc.
  • If your observations are not independent, you'll need to consider incorporating appropriate covariance structures. This depends a lot on your specific problem, but mixed models may be appropriate.
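To make the HC1 suggestion concrete, here is a minimal sketch of what an HC1 ("sandwich") standard error computes. It is written in Python/NumPy purely for illustration; the function name `ols_hc1` and the simulated data are hypothetical. In R, the same kind of estimate comes from `estimatr::lm_robust()` with `se_type = "HC1"`, or from passing a `sandwich::vcovHC(type = "HC1")` covariance matrix to `lmtest::coeftest()`.

```python
import numpy as np


def ols_hc1(X, y):
    """OLS coefficients with classical and HC1 standard errors.

    X is an (n, k) design matrix that already includes an intercept column.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Classical SEs assume one constant error variance sigma^2 for all rows.
    sigma2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

    # Sandwich "meat": sum over rows of e_i^2 * x_i x_i^T.
    meat = (X * resid[:, None] ** 2).T @ X
    # HC1 = HC0 scaled by n / (n - k) as a small-sample correction.
    cov_hc1 = n / (n - k) * XtX_inv @ meat @ XtX_inv
    se_hc1 = np.sqrt(np.diag(cov_hc1))
    return beta, se_classical, se_hc1


# Hypothetical data: the error spread grows with x (heteroscedastic).
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, 0.2 + 0.3 * x, n)
X = np.column_stack([np.ones(n), x])

beta, se_cl, se_hc1 = ols_hc1(X, y)
print("coefficients:   ", beta)
print("classical SEs:  ", se_cl)
print("HC1 SEs:        ", se_hc1)
```

Note that HC standard errors address heteroscedasticity rather than non-normality as such; t-tests built on them still rely on a large-sample approximation.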

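Aside: the distribution of the residuals is different from the sampling distribution of the regression coefficients. The coefficient estimates can have a normal (or approximately normal) sampling distribution even if the residuals are not normal, because each OLS estimate is a weighted sum of the errors, and the central limit theorem applies as the sample size grows.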
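A small Monte Carlo sketch can illustrate this aside. It is again in Python/NumPy only for illustration (the same experiment is easy to write in R with lm()). The data-generating process (intercept 2, slope 3, n = 100, centred exponential errors) and the helper names are arbitrary assumptions chosen so that the errors are clearly skewed.

```python
import numpy as np


def skewness(v):
    """Sample skewness (third standardised moment)."""
    v = np.asarray(v, dtype=float)
    d = v - v.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5


def slope_sampling_distribution(n=100, reps=5000, seed=1):
    """OLS slope estimates across many samples with strongly skewed errors.

    The design x is drawn once and held fixed; each replication draws new
    centred exponential errors (mean 0, skewness 2) and refits the line.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, n)
    X = np.column_stack([np.ones(n), x])
    # One column of errors per replication: shape (n, reps).
    errors = rng.exponential(1.0, size=(n, reps)) - 1.0
    # Hypothetical true model: intercept 2, slope 3.
    Y = 2.0 + 3.0 * x[:, None] + errors
    # lstsq fits all replications at once; row 1 holds the slopes.
    B = np.linalg.lstsq(X, Y, rcond=None)[0]
    return B[1]


slopes = slope_sampling_distribution()
errs = np.random.default_rng(2).exponential(1.0, 100_000) - 1.0
print("skewness of the errors:         ", round(skewness(errs), 2))
print("skewness of the slope estimates:", round(skewness(slopes), 2))
print("mean of the slope estimates:    ", round(slopes.mean(), 3))
```

With this setup the errors have skewness near 2 while the slope estimates should come out close to symmetric. With very small samples or extreme leverage points, the normal approximation can be noticeably worse.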


Thank you very much!

...and also, I recall a conversation with a professor of statistics who said: "The easiest way to check whether people have fabricated their data is to check whether the residuals are perfectly normal." In other words, don't expect your residuals to be "completely normal" when you are working with real-life data 🙂