Including categorical and continuous variables in lrm, package rms

kgkirgkiris · January 19, 2025, 4:40pm

I have a couple of questions about best practices when working with lrm models using the rms package, and I would greatly appreciate your insights:

1. Using as.factor vs. strata for categorical variables
When handling categorical variables in logistic regression models, is as.factor sufficient, or are there specific scenarios where strata would be more appropriate? Could you clarify how their uses differ, particularly in lrm models?

2. Choosing the optimal transformation for continuous variables
When incorporating continuous variables into lrm models, what is the best way to decide whether to use restricted cubic splines (rcs), polynomials (poly), or a logarithmic transformation (log)?
For example, consider the following formula, where I currently use rcs for one of the continuous variables:
model <- lrm(binary_outcome ~ cont1 * rcs(cont2, 3) + categ1 + categ2 + cont3, data = data)
Abbreviations: cont = continuous variable, categ = categorical variable.

How might the choice between these approaches affect the model's performance and interpretability? Are there specific diagnostics or criteria you recommend to guide this decision?
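To make the formula above reproducible, here is a minimal sketch. It assumes the rms package is installed. All data are simulated, and the sample size, factor levels, and coefficients are arbitrary and hypothetical. The anova() call prints the Wald tests rms reports for each predictor, including pooled tests of the nonlinear and interaction terms. It is shown only as one commonly used check, not as an answer to which transformation is best.

```r
# Minimal reproducible sketch; ALL data below are simulated/hypothetical
library(rms)

set.seed(2025)
n <- 500
data <- data.frame(
  cont1  = rnorm(n),
  cont2  = rnorm(n),
  cont3  = rnorm(n),
  categ1 = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  categ2 = factor(sample(c("X", "Y"), n, replace = TRUE))
)

# Hypothetical linear predictor with a curved (quadratic) effect of cont2
lp <- with(data, -0.5 + 0.8 * cont1 + 0.5 * cont2^2 +
                 0.4 * (categ1 == "B") + 0.3 * cont3)
data$binary_outcome <- rbinom(n, 1, plogis(lp))

# datadist lets summary() and Predict() choose reference values later
dd <- datadist(data)
options(datadist = "dd")

# Same formula as in the question: cont2 enters as a 3-knot restricted
# cubic spline and interacts with the linear term cont1
model <- lrm(binary_outcome ~ cont1 * rcs(cont2, 3) + categ1 + categ2 + cont3,
             data = data)
print(model)

# Wald tests per predictor, including pooled nonlinear and interaction rows
print(anova(model))
```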

I apologize if these are beginner-level questions, but I am eager to learn and improve. Thank you for your time and patience in addressing my queries. Your guidance is greatly appreciated!

system · April 19, 2025, 4:40pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
