Predicting Clusters in R

AC3112 · March 7, 2022, 5:14pm

Hi All,

Conceptual question. Imagine I wished to conduct a three-stage classification and prediction procedure:

Stage (1): Use an unsupervised method (whether k-modes, PAM, latent class, whatever) on a subset of Likert-scale categorical variables to classify/cluster these.
Stage (2): Store the class/cluster output.
Stage (3): Use the unsupervised output as a dependent (ordinal or otherwise) variable in a supervised routine. That is, evaluate whether baseline characteristic variables (age, sex, etc.) can adequately predict the outcome obtained in Stages (1)-(2).
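
For concreteness, a minimal sketch of the three stages in R. Everything here is an assumption made for illustration: the data are simulated, the item names (q1 to q5), k = 3, and the choice of PAM on a Gower dissimilarity followed by multinomial logistic regression are placeholders for whichever methods are actually used. It relies on the cluster and nnet packages, which ship with standard R installations.

```r
library(cluster)  # daisy(), pam()
library(nnet)     # multinom()

set.seed(42)
n <- 200

# Hypothetical data: five 5-point Likert items (ordered factors).
likert <- as.data.frame(setNames(
  lapply(1:5, function(i) {
    factor(sample(1:5, n, replace = TRUE), levels = 1:5, ordered = TRUE)
  }),
  paste0("q", 1:5)
))

# Hypothetical baseline characteristics.
baseline <- data.frame(
  age = round(runif(n, 18, 80)),
  sex = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Stage 1: cluster respondents with PAM on a Gower dissimilarity,
# which accepts ordered factors.
d <- daisy(likert, metric = "gower")
fit_pam <- pam(d, k = 3, diss = TRUE)

# Stage 2: store the cluster labels as a nominal factor.
baseline$cluster <- factor(fit_pam$clustering)

# Stage 3: multinomial logistic regression of cluster membership
# on the baseline variables only.
fit_mn <- multinom(cluster ~ age + sex, data = baseline, trace = FALSE)

# In-sample confusion table and accuracy.
pred <- predict(fit_mn, newdata = baseline, type = "class")
conf <- table(observed = baseline$cluster, predicted = pred)
accuracy <- mean(as.character(pred) == as.character(baseline$cluster))
```

Because the simulated items are independent of age and sex, the baseline variables should predict the clusters poorly here; with real data, accuracy would be better judged on held-out rows. Note also that Stage 3 treats the cluster labels as known, so any uncertainty in the Stage 1 assignment is ignored.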

This is what I endeavour to do. However, I am not sure whether anyone has seen this type of process before, whether it has a formal name, or whether there are any useful links to papers or R code.

I would appreciate any feedback.

arthur.t · March 7, 2022, 6:12pm

Hm. I'm not sure the new cluster feature will serve much purpose. The information that generated the cluster ID already exists in the original variables, so I don't think it will improve the quality of fit. It also has the downside of compromising the model's descriptive information, because the cluster ID is correlated with the original variables.

AC3112 · March 8, 2022, 2:16pm

Thanks @arthur.t, really appreciate your feedback.
