# hierarchical clustering

Hi. I followed Statology's code for an example of hierarchical clustering.

They conclude with a Ward's method model and an optimal number of 4 clusters.

How does one plot the resulting dendrogram for this final model?

``````
library(factoextra)
library(cluster)

df <- USArrests

#remove rows with missing values
df <- na.omit(df)

#scale each variable to have a mean of 0 and sd of 1
df <- scale(df)

m <- c("average", "single", "complete", "ward")
names(m) <- c("average", "single", "complete", "ward")

#function to compute agglomerative coefficient
ac <- function(x) {
  agnes(df, method = x)$ac
}

#calculate agglomerative coefficient for each clustering linkage method
sapply(m, ac)

#perform hierarchical clustering using Ward's minimum variance
clust <- agnes(df, method = "ward")

#produce dendrogram
pltree(clust, cex = 0.6, hang = -1, main = "Dendrogram")

#calculate gap statistic for each number of clusters (up to 10 clusters)
gap_stat <- clusGap(df, FUN = hcut, nstart = 25, K.max = 10, B = 50)

#produce plot of clusters vs. gap statistic
fviz_gap_stat(gap_stat)

#compute distance matrix
d <- dist(df, method = "euclidean")

#perform hierarchical clustering using Ward's method
final_clust <- hclust(d, method = "ward.D2")

#cut the dendrogram into 4 clusters
groups <- cutree(final_clust, k = 4)

#number of members in each cluster
table(groups)

#append cluster labels to original data
final_data <- cbind(USArrests, cluster = groups)

#display first six rows of final data
head(final_data)

#find mean values for each cluster
aggregate(final_data, by = list(cluster = final_data$cluster), mean)
``````

This is probably not the ideal method, but a manual approach works: run `table(cutree(final_clust, h = xxx))`, changing `xxx` until you get the number of clusters you decided on. Here, `h = 5` gives 4 clusters. You can then plot the dendrogram directly with:

``````
plot(final_clust, cex = 0.6, hang = -1, main = "Dendrogram")
abline(h = 5, lty = "dashed", col = "grey")
``````
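The trial-and-error search for `h` can probably be skipped. The sketch below is not part of the original answer: it assumes the `final_clust` model from the question (rebuilt here so the block runs on its own) and derives a cut height from the merge heights stored in `final_clust$height`. The names `k`, `top_heights`, and `cut_height` are illustrative. It also uses base R's `rect.hclust()` to box the clusters, which takes `k` directly and needs no height at all.

```r
# Rebuild the final model from the question so the sketch is self-contained
df <- scale(na.omit(USArrests))
final_clust <- hclust(dist(df, method = "euclidean"), method = "ward.D2")

k <- 4  # number of clusters chosen from the gap statistic

# Merge heights, sorted from the top of the tree downwards
top_heights <- sort(final_clust$height, decreasing = TRUE)

# A cut between the (k-1)-th and k-th highest merges leaves k clusters
# (assumes those two heights are not tied)
cut_height <- mean(top_heights[(k - 1):k])

# Sanity check: cutting by height should agree with cutting by k
groups_k <- cutree(final_clust, k = k)
groups_h <- cutree(final_clust, h = cut_height)

# Dendrogram with the cut line and one rectangle per cluster
plot(final_clust, cex = 0.6, hang = -1, main = "Dendrogram")
abline(h = cut_height, lty = "dashed", col = "grey")
rect.hclust(final_clust, k = k, border = 2:5)
```

Taking the midpoint between two merge heights is just one convenient choice; any height strictly between them gives the same clusters.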

Or the nicer-looking version with `ggdendro`:

``````
library(ggplot2)   # provides geom_hline() and aes()
library(ggdendro)

ggdendrogram(final_clust) +
  geom_hline(aes(yintercept = 5),
             linetype = "dashed", color = "grey")
``````

Or, next level, rebuilding everything (but with lots of manual adjustments needed):

``````
library(ggplot2)
library(ggdendro)
library(dplyr)   # for mutate()

ddata <- dendro_data(final_clust)

# `angle` was not defined in the original snippet; 90 is assumed here
angle <- 90

ggplot() +
  geom_segment(data = segment(ddata),
               aes(x = x, y = y, xend = xend, yend = yend)) +
  # color each label by its cluster from cutree()
  geom_text(data = ddata$labels |>
              mutate(group = as.factor(groups[as.character(label)])),
            aes(x = x, y = y, label = label, color = group),
            angle = 90, hjust = 1, vjust = 0.5, size = 2.5) +
  # extend the y axis below 0 to leave room for the labels
  scale_y_continuous(limits = c(-10, 20)) +
  theme_dendro() +
  theme(axis.text.x = element_text(angle = angle,
                                   hjust = 1, vjust = 0.5)) +
  theme(axis.text.y = element_text(angle = angle,
                                   hjust = 1)) +
  geom_hline(aes(yintercept = 5),
             linetype = "dashed", color = "grey")
``````

Thanks to both of you.
