Tidy Clustering (Unsupervised)

I'm wondering if there are any plans in the works - or even just general brainstorming that has happened - to add unsupervised methods (i.e. clustering and community detection) to the tidymodels framework.

I ask because I'm working on a package implementing one specific community detection method, to go with a forthcoming paper. I'm building it to be as tidymodels-friendly as possible. (Thanks for {hardhat}, by the way!)

But this has also led me down a rabbit hole of thinking about unified frameworks for clustering. The key decisions for the user to make differ slightly for unsupervised methods as compared to prediction methods. The post-facto validation and "model selection" differ enormously.

The more I plan out my personal package, the more I dream of a meta-package exactly like {tidymodels} but tailored for clustering.

Before I hack my way through this specific method I'm working on, or do something silly like launch myself into writing a {tidyclust} implementation, I thought I'd check in with you all and see where things currently stand. :slight_smile:

Thank you!

I'm open to pretty much anything on this. Others have asked.

The easiest thing to do right now would be to have recipe steps that perform the clustering and output cluster membership columns. We could also have APIs to extract the underlying R object that defines the clustering so that it can be used in plots etc. That feels fine but still leans more towards supervised usage.
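To make the recipe-step idea a bit more concrete, here is a rough base R sketch of the two halves such a step would need: learn the clustering once on the training data, then append a membership column to any data set. The function names `fit_cluster_step()` and `apply_cluster_step()` and the `.cluster` column name are made up for illustration and are not part of {recipes}; k-means is used only because it ships with base R.

```r
# Hypothetical helpers (not {recipes} functions) that mimic a prep/bake split.

# "Prep": learn cluster centers from the training data.
fit_cluster_step <- function(data, columns, k) {
  fit <- stats::kmeans(data[, columns, drop = FALSE], centers = k, nstart = 10)
  list(columns = columns, centers = fit$centers, fit = fit)
}

# "Bake": assign each row of new data to its nearest learned center.
apply_cluster_step <- function(step, new_data) {
  x <- as.matrix(new_data[, step$columns, drop = FALSE])
  k <- nrow(step$centers)
  # Squared Euclidean distance from every row to every center (n x k matrix).
  d <- vapply(
    seq_len(k),
    function(j) rowSums(sweep(x, 2, step$centers[j, ], "-")^2),
    numeric(nrow(x))
  )
  d <- matrix(d, nrow = nrow(x))
  new_data$.cluster <- factor(max.col(-d, ties.method = "first"), levels = seq_len(k))
  new_data
}

set.seed(1)
step <- fit_cluster_step(iris, c("Petal.Length", "Petal.Width"), k = 3)
clustered <- apply_cluster_step(step, iris)

table(clustered$.cluster)  # cluster sizes
step$fit                   # the underlying kmeans object, e.g. for plotting
```

Keeping the fitted object inside the step is what would let an extractor hand it back for plots and diagnostics.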

Otherwise, I'd be open to a stand-alone (tidy) clustering API that unites all of the disparate methods and has simple methods for plotting and analysis. In other words, a separate unsupervised API.

If you are interested, let us know. We could do a quick brainstorming session to figure out scope and a prototype API (i.e., mock up what we'd like the syntax to look like, possible S3 methods, etc.).
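As a starting point for that kind of mock-up, something like the following base R sketch could be a strawman: a specification object per method, one S3 generic to fit it, and a shared accessor that returns assignments in the same shape regardless of engine. Every name here (`k_means_spec()`, `hier_spec()`, `fit_clusters()`, `cluster_assignments()`) is hypothetical and only meant to give the discussion something to push against.

```r
# Strawman only: none of these names exist in tidymodels.

# Specification objects describe *what* to fit, not the data.
k_means_spec <- function(k) {
  structure(list(k = k), class = c("k_means_spec", "cluster_spec"))
}
hier_spec <- function(k, linkage = "complete") {
  structure(list(k = k, linkage = linkage), class = c("hier_spec", "cluster_spec"))
}

# One generic, one method per underlying engine.
fit_clusters <- function(spec, data, ...) UseMethod("fit_clusters")

fit_clusters.k_means_spec <- function(spec, data, ...) {
  fit <- stats::kmeans(data, centers = spec$k, ...)
  structure(list(fit = fit, assignments = fit$cluster), class = "cluster_fit")
}

fit_clusters.hier_spec <- function(spec, data, ...) {
  fit <- stats::hclust(stats::dist(data), method = spec$linkage)
  structure(
    list(fit = fit, assignments = stats::cutree(fit, k = spec$k)),
    class = "cluster_fit"
  )
}

# Shared accessor: same output shape whatever the engine was.
cluster_assignments <- function(x) {
  data.frame(
    .row = seq_along(x$assignments),
    .cluster = factor(unname(x$assignments))
  )
}

x <- iris[, 1:4]
set.seed(1)
km <- fit_clusters(k_means_spec(k = 3), x)
hc <- fit_clusters(hier_spec(k = 3), x)

head(cluster_assignments(km))
table(cluster_assignments(hc)$.cluster)
```

The point of the sketch is the split between spec, fit, and accessor; which generics we would actually want (plotting, validation metrics, prediction on new data) is exactly what a brainstorming session would need to settle.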

Awesome, thanks for the quick reply!

The unsupervised method -> supervised usage did occur to me as a common path, which is part of why it'd be cool to have these methods and validations all play nice with each other.

I would definitely love to brainstorm with any of the team who is up for it. This is something I might be able to devote my summer to this year. I'll private message you.
