FreeClust: easy hierarchical clustering - 2020 Shiny Contest Submission


Authors: Maciej Dobrzynski, Marc-Antoine Jacques
Working with Shiny for more than 1 year

Abstract: An app for interactive hierarchical clustering and cluster validation.

Full Description: Clustering is a commonly used unsupervised machine learning approach that partitions a dataset into a set of groups called clusters. FreeClust is an open-source web app for easy interactive clustering and cluster validation. Choose from several algorithms, play with parameters, plot results in a fully interactive fashion, and download publication-ready plots.

The web-app integrates several clustering algorithms:

  • widely used hierarchical clustering (based on R's hclust), with a choice of common linkage methods for constructing the tree diagram (dendrogram),

  • sparse hierarchical clustering (using the R package sparcl), tailored to high-dimensional data with many more variables than samples. Sparse hierarchical clustering also reports the importance of individual features/measurements across the samples.
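To make the role of the linkage method concrete, here is a minimal sketch of agglomerative clustering in plain Python. It is not the app's code (FreeClust relies on R's hclust); the function names and the three linkage rules shown are chosen for illustration only, and the naive search over all cluster pairs is far slower than a real implementation.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# A linkage rule turns the point-to-point distances between two clusters
# into a single cluster-to-cluster distance.
LINKAGES = {
    "single": min,                            # closest pair of points
    "complete": max,                          # farthest pair of points
    "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs
}


def agglomerative(points, n_clusters, linkage="average"):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a sorted list of point indices.
    Illustrative helper only; not part of FreeClust.
    """
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            ds = [euclidean(points[a], points[b])
                  for a in clusters[i] for b in clusters[j]]
            d = combine(ds)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        # j is always greater than i, so deleting j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(agglomerative(points, 2, linkage="single"))
```

Recording the order and height of each merge, rather than stopping at a fixed number of clusters, is what yields the dendrogram.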

Clustering will partition data even if it does not contain any clusters! Therefore, it is important to assess clustering tendency before the analysis and to validate the quality of the result afterwards. A separate module addresses this issue. It contains cluster validation methods that help estimate the optimal number of clusters or assess the quality of an existing clustering by inspecting the principal component analysis (PCA) plot, the silhouette plot, and the dendrogram.
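As an illustration of what a silhouette plot displays, the following sketch computes silhouette widths in plain Python using the standard definition: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the width is (b - a) / max(a, b). The function name and the use of Euclidean distance are assumptions made for this example; it does not reproduce the app's R code.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_widths(points, labels):
    """Return one silhouette width per point, each in the range [-1, 1].

    Values near 1 mean the point sits well inside its cluster; values
    below 0 suggest it may belong to a neighbouring cluster.
    Illustrative helper only; not part of FreeClust.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette widths need at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # By convention, a point alone in its cluster gets width 0.
            widths.append(0.0)
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_widths(points, [0, 0, 1, 1])
bad = silhouette_widths(points, [0, 1, 0, 1])
print(sum(good) / len(good), sum(bad) / len(bad))
```

Comparing the average width across different numbers of clusters is one common way to pick a reasonable cluster count: a higher average indicates tighter, better separated groups.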

FreeClust accepts CSV files in a wide format and allows for rudimentary data manipulation, such as rescaling, removal of missing data, and trimming or clipping of outliers. Several datasets are available within the app for testing.
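The kind of preprocessing described above can be sketched in plain Python as follows. The file layout is an assumption made for this example (a header row, sample names in the first column, one measurement per remaining column, and empty or "NA" cells marking missing values), as are the function names and the choice of z-scores for rescaling; the app's own conventions may differ.

```python
import csv
import io
import statistics


def read_wide_csv(text):
    """Parse wide-format CSV text into (feature names, sample names, rows).

    Assumed layout: first column holds sample names, every other column
    holds one measurement. Empty or "NA" cells become None.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    names, rows = [], []
    for record in reader:
        names.append(record[0])
        rows.append([
            float(v) if v.strip() not in ("", "NA") else None
            for v in record[1:]
        ])
    return header[1:], names, rows


def drop_missing(names, rows):
    """Remove samples that have at least one missing measurement."""
    kept = [(n, r) for n, r in zip(names, rows) if None not in r]
    return [n for n, _ in kept], [r for _, r in kept]


def clip(rows, lower, upper):
    """Clip every value to the interval [lower, upper]."""
    return [[min(max(v, lower), upper) for v in r] for r in rows]


def zscore_columns(rows):
    """Rescale each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # needs at least two rows
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(r, means, sds)]
        for r in rows
    ]


text = "id,x,y\na,1,10\nb,NA,20\nc,3,1000\nd,5,30\n"
features, names, rows = read_wide_csv(text)
names, rows = drop_missing(names, rows)
rows = clip(rows, 0, 100)
print(features, names, zscore_columns(rows))
```

Clipping replaces extreme values with the nearest bound, whereas trimming would drop them; which is appropriate depends on whether the outliers are measurement errors or genuine observations.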

Category: Research
Keywords: data analysis, statistical analysis, clustering, data exploration
Shiny app:
RStudio Cloud:

