I am trying to build a model that will predict a specific diagnosis is present. E.g CT chest showing calcification in coronary arteries or MRI showing typical amyloid pattern. I am trying to build the best possible model that would predict the positive scan based on the variables I have like age/gender/co morbidities etc. Do I need ti use a specific package/test or I can start with just logistic regression? And what are things I need to consider before starting a model? Do I need to include all the variables in the start or the one I think will be most relevant?
Thank you.
Zafar
Here is a very short introduction to logistic regression using a coronary heart disease dataset with age, age cohort and CDH as variables illustrating the use of glm, which is a standard methodology against the results of which any other techniques should be compared.
There is more to evaluating a model than simply discarding those parameters with a p-value below the selected &\alpha$ in a forward or backwards stepwise selection because there can be no guarantee that a parameter that is scored insignificant in the presence of one set of parameters will also be insignificant with respect to a different set of parameters.
In addition to the Hosmer-Lemeshow-Sturdivant text cited in the link, Frank Harrell’s Regression Modeling Strategies and the associated {rms} package should be reviewed.