Feature selection for unsupervised learning


Hi all,

Processes and techniques for feature selection in supervised learning problems are widely known; however, I haven't been able to find many resources on feature selection for unsupervised learning. I recently studied this example: Hierarchical / K means clustering, and applied the same approach to our internal data. I was very surprised to see how vastly the results differed depending on the features used for clustering.

What is the recommended feature selection approach for unsupervised problems where the number of features is large? How do I know that the features I selected are the "right ones"? Applying expert knowledge is one thing, but unsupervised learning is also largely about discovering the unknown through the clustering itself.

Any help is much appreciated!


Clustering methods have never been known for their consistency.

I'm not aware of any feature selection methods here. You might try filtering out highly correlated variables (keeping one from each correlated group) or transforming variables with pathological distributions, such as heavily skewed ones.
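
To make those two suggestions concrete, here is a rough, dependency-free sketch in Python. It is only one way to do it, and several things in it are my own assumptions rather than anything standard: I read "pathological" as "strongly right-skewed and non-negative", the cutoffs (`threshold=0.9`, `max_skew=1.0`) are arbitrary, and the helper names (`drop_correlated`, `log_transform_skewed`) and the toy data are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated (assumption)
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Greedy filter: keep a column only if its |r| with every
    already-kept column is at or below the threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def skewness(x):
    """Population skewness (third standardized moment)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((a - m) ** 2 for a in x) / n
    m3 = sum((a - m) ** 3 for a in x) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def log_transform_skewed(columns, max_skew=1.0):
    """Apply log(1 + x) to non-negative columns whose skewness exceeds max_skew;
    other columns are copied unchanged."""
    out = {}
    for name, values in columns.items():
        if min(values) >= 0 and skewness(values) > max_skew:
            out[name] = [math.log1p(v) for v in values]
        else:
            out[name] = list(values)
    return out


# Toy data (hypothetical): two near-duplicate features and one skewed feature.
data = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "income": [1.0, 2.0, 3.0, 4.0, 1000.0],
}

kept = drop_correlated(data)  # height_in is dropped as redundant with height_cm
cleaned = log_transform_skewed({name: data[name] for name in kept})
```

Note that the greedy filter depends on column order (the first column of a correlated pair wins), so it is worth ordering the columns so that the ones you would rather keep come first. Neither step tells you the remaining features are the "right ones"; they only remove redundancy and tame outliers before the distances are computed.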