Predicting Clusters in R

Hi All,

Conceptual question. Imagine I wished to conduct a three-stage classification and prediction procedure:

  • Stage (1): Use an unsupervised method (whether k-modes, PAM, latent class, whatever) on a subset of Likert-scale categorical variables to classify/cluster these.

  • Stage (2): Store the class/cluster output.

  • Stage (3): Use the unsupervised output as a dependent (ordinal or otherwise) variable in a supervised routine. That is, evaluate whether baseline characteristic variables (age, sex, etc.) can adequately predict the class/cluster membership obtained in Stages (1)-(2).
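For concreteness, the three stages might be sketched in R as follows. This is only a minimal illustration under stated assumptions: the data are simulated, the names (`age`, `sex`, `q1`..`q5`) and the choice of `k = 3` are hypothetical, and PAM on a Gower dissimilarity with a multinomial logit in Stage (3) is just one of many possible combinations. It uses the `cluster` and `nnet` packages, which ship with R.

```r
# Sketch only: simulated data; all variable names are hypothetical.
library(cluster)  # daisy(), pam()
library(nnet)     # multinom()

set.seed(42)
n <- 300

# Baseline characteristics (hypothetical).
dat <- data.frame(
  age = round(runif(n, 18, 80)),
  sex = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Five simulated 5-point Likert items, stored as ordered factors.
likert <- as.data.frame(setNames(
  lapply(1:5, function(j) {
    factor(sample(1:5, n, replace = TRUE), levels = 1:5, ordered = TRUE)
  }),
  paste0("q", 1:5)
))

# Stage 1: Gower dissimilarity on the ordinal items, then PAM with k = 3.
d   <- daisy(likert, metric = "gower")
fit <- pam(d, k = 3, diss = TRUE)

# Stage 2: store the cluster labels alongside the baseline variables.
dat$cluster <- factor(fit$clustering)

# Stage 3: multinomial logistic regression of cluster on baseline variables.
mod  <- multinom(cluster ~ age + sex, data = dat, trace = FALSE)
pred <- predict(mod, newdata = dat, type = "class")

# In-sample confusion table and accuracy.
conf_tab <- table(observed = dat$cluster, predicted = pred)
accuracy <- sum(diag(conf_tab)) / n
```

Because the simulated items are generated independently of `age` and `sex`, the accuracy here should sit near chance level; with real data, a held-out or cross-validated estimate would be a fairer check than this in-sample figure.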

This is what I am trying to do. Has anyone seen this type of process before? Does it have a formal name? And are there any useful links to papers or code in R?

Would appreciate the feedback :)

Hm. I'm not sure the new cluster feature will serve much purpose. The information that generated the cluster ID already exists in the original variables, so I don't think adding it will improve the quality of fit. It also has the downside of compromising the descriptive information from the model, because the cluster ID is correlated with the original variables.
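The point about correlation can be checked directly. The sketch below is an illustration on simulated data, not anything from the thread: the item names, `k = 3`, and the helper `cramers_v` are all hypothetical. It computes Cramér's V between the cluster ID and each item used to build it, and, for comparison, between the cluster ID and an unrelated variable.

```r
# Sketch only: simulated data; names and the helper below are hypothetical.
library(cluster)  # daisy(), pam()

set.seed(1)
n <- 300

# Five simulated 5-point Likert items as ordered factors.
likert <- as.data.frame(setNames(
  lapply(1:5, function(j) {
    factor(sample(1:5, n, replace = TRUE), levels = 1:5, ordered = TRUE)
  }),
  paste0("q", 1:5)
))

# Cluster IDs derived from those same items (PAM on Gower dissimilarity).
d  <- daisy(likert, metric = "gower")
cl <- factor(pam(d, k = 3, diss = TRUE)$clustering)

# Cramér's V: strength of association between two categorical variables (0 to 1).
cramers_v <- function(x, y) {
  tab  <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

# Association of the cluster ID with each item that generated it.
v_items <- sapply(likert, cramers_v, y = cl)

# Association with a variable that played no part in the clustering.
noise   <- factor(sample(1:5, n, replace = TRUE))
v_noise <- cramers_v(noise, cl)
```

One would expect the values in `v_items` to be noticeably larger than `v_noise`, since the cluster ID is a function of the items; that dependence is what makes the cluster ID largely redundant when the original variables are already in the model.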

Thanks @arthur.t, really appreciate your feedback.
