I am currently building a predictive model and would like to test its calibration (i.e. model fit). I came across two approaches: the Pearson residual test and the Hosmer-Lemeshow (HL) test. Given that my sample size is 50 (~40 diseased vs 20 healthy), which approach should I use? I tried both, but they gave completely different results (the Pearson test indicates a good fit, but the HL test suggests otherwise). I have two predictors, both continuous variables (Age and BMI).

Assuming that the response variable is binary, and the model is therefore logistic, your sample size and group sizes are at the very lower limit at which the HL test should be considered reliable. See Harrell, F. E., Jr. (2016). Regression modeling strategies. Springer International Publishing, p. 247.
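To make the group-size concern concrete, here is a minimal sketch of how the HL statistic is usually computed. It is written in plain Python rather than R, the function name `hosmer_lemeshow` is my own, and the data are simulated, so treat it as an illustration and not as what any particular package does internally. With 50 subjects and the conventional 10 groups, each group holds only 5 subjects, so the statistic rests on very small observed and expected counts.

```python
import math
import random


def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees_of_freedom) for the Hosmer-Lemeshow test.

    y: observed 0/1 outcomes; p: fitted probabilities from the model.
    """
    pairs = sorted(zip(p, y))  # order subjects by fitted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        # Near-equal-sized groups ("deciles of risk" when groups=10).
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        if n_g == 0:
            continue
        observed = sum(yi for _, yi in chunk)  # events observed in the group
        expected = sum(pi for pi, _ in chunk)  # events the model expects
        mean_p = expected / n_g
        # Group contribution: (O - E)^2 / (n_g * pbar * (1 - pbar))
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat, groups - 2


# Hypothetical data: 50 subjects with made-up fitted probabilities.
rng = random.Random(0)
p = [1 / (1 + math.exp(-rng.gauss(0.5, 1.0))) for _ in range(50)]
y = [1 if rng.random() < pi else 0 for pi in p]

stat, df = hosmer_lemeshow(y, p)
print(f"HL statistic = {stat:.2f} on {df} df (5 subjects per group)")
```

The statistic is referred to a chi-square distribution with `groups - 2` degrees of freedom. Changing `groups` regroups the same subjects and can shift the result noticeably when the sample is this small.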

`dx` is a function in the {LogisticDx} package that computes diagnostic measures for a logistic model. That package also has a separate `gof` function for goodness-of-fit tests. I'd expect the results of applying the two functions to a model to differ in much the same way as in your case.
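For comparison, here is a minimal sketch of the Pearson residuals and the chi-square statistic built from them, again in plain Python with hypothetical names and made-up numbers, not the code of any R package. It sums one term per subject, whereas the HL test pools subjects into groups first, which is one reason the two can disagree. Note also that with continuous predictors such as Age and BMI nearly every subject has a unique covariate pattern, and in that situation the chi-square approximation for the Pearson statistic is generally considered unreliable.

```python
import math


def pearson_residuals(y, p):
    """Per-subject Pearson residuals: (y - p) / sqrt(p * (1 - p))."""
    return [(yi - pi) / math.sqrt(pi * (1 - pi)) for yi, pi in zip(y, p)]


def pearson_chi_square(y, p, n_params):
    """Return (statistic, degrees_of_freedom).

    n_params counts every estimated coefficient, including the intercept
    (3 for a model with an intercept, Age and BMI).
    """
    resid = pearson_residuals(y, p)
    return sum(r * r for r in resid), len(y) - n_params


# Hypothetical outcomes and fitted probabilities for six subjects.
y = [1, 0, 1, 1, 0, 1]
p = [0.8, 0.3, 0.6, 0.9, 0.4, 0.7]

stat, df = pearson_chi_square(y, p, n_params=3)
print(f"Pearson chi-square = {stat:.2f} on {df} df")
```

The individual residuals are still useful for spotting poorly fitted subjects, even when the summed statistic is not a trustworthy test.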

Hello, thank you for the great reference. So, to recap, the minimum should be N=20 in each group (i.e. N=40 for the total sample size?) in order for the HL test to be valid for a logistic regression model?

Also, do you have any advice regarding the Pearson Residual test (whether I can use it for my logistic regression model)?

Thank you!! How about the Pearson Residual test? Do you know any rules regarding the Pearson Residual test (whether I can use it for my logistic regression model)?