I have a couple of questions about best practices when working with lrm models using the rms package, and I would greatly appreciate your insights:
- Using as.factor vs. strata for Categorical Variables:
When handling categorical variables in logistic regression models, is the use of as.factor sufficient, or are there specific scenarios where strata would be more appropriate? Could you clarify the differences in their applications, particularly in the context of lrm models?
- Choosing the Optimal Transformation for Continuous Variables:
When incorporating continuous variables into lrm models, what is the best way to determine whether to use splines (rcs), polynomials (poly), or logarithmic transformations (log)?
For example, consider the following formula, where I currently use rcs for one of the continuous variables:
model <- lrm(binary_outcome ~ cont1 * rcs(cont2, 3) + categ1 + categ2 + cont3, data = data)
Abbreviations: cont = continuous variable, categ = categorical variable.
How might the choice between these approaches affect the model's performance and interpretability? Are there specific diagnostics or criteria you recommend to guide this decision?
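To make the comparison concrete, here is a minimal sketch of the three alternatives I have in mind, fitted on simulated data. The data frame d and the way the outcome is generated are made up purely for illustration, the interaction term from my formula is left out for brevity, and I use pol() (the polynomial function from rms) in place of poly(). Comparing the fits by AIC is just one criterion I could think of, not necessarily the recommended one.

```r
library(rms)

# Simulated data, purely illustrative (not my real data set)
set.seed(1)
n <- 500
d <- data.frame(
  cont1  = rnorm(n),
  cont2  = rexp(n) + 0.1,   # kept strictly positive so log() is defined
  cont3  = rnorm(n),
  categ1 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  categ2 = factor(sample(c("x", "y"), n, replace = TRUE))
)
# Assumed data-generating model: outcome depends on log(cont2) and cont1
d$binary_outcome <- rbinom(n, 1, plogis(-0.5 + 0.8 * log(d$cont2) + 0.5 * d$cont1))

# Same model three times, differing only in how cont2 enters
fit_rcs <- lrm(binary_outcome ~ cont1 + rcs(cont2, 3) + categ1 + categ2 + cont3, data = d)
fit_pol <- lrm(binary_outcome ~ cont1 + pol(cont2, 2) + categ1 + categ2 + cont3, data = d)
fit_log <- lrm(binary_outcome ~ cont1 + log(cont2)    + categ1 + categ2 + cont3, data = d)

# One possible comparison: AIC of each fit (lower is better)
aics <- sapply(list(rcs = fit_rcs, pol = fit_pol, log = fit_log), AIC)
print(aics)
```

Is a comparison like this a reasonable way to choose, or is it better to fix the functional form in advance?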
I apologize if these are beginner-level questions, but I am eager to learn and improve. Thank you for your time and patience in addressing my queries. Your guidance is greatly appreciated!