The cluster library contains the ruspini data - a standard set of data for illustrating cluster analysis.
library(cluster) ## to get the ruspini data
plot(ruspini, asp=1, pch=20) ## take a look at the data
hclust expects a distance matrix, not the original data. We compute the tree using the default parameters and display it. The hang parameter lines up all of the leaves of the tree along the baseline.
ruspini_hc_defaults <- hclust(dist(ruspini))
dend <- as.dendrogram(ruspini_hc_defaults)
if(!require(dendextend)) install.packages("dendextend"); library(dendextend)
dend <- color_branches(dend, k = 4)
plot(dend)
Cut the tree to give four clusters and replot the data coloring the points by cluster. k is the desired number of clusters.
rhc_def_4 = cutree(ruspini_hc_defaults,k=4)
plot(ruspini, pch=20, asp=1, col=rhc_def_4)
This clustering is a little odd. We can get a better clustering by scaling the data first.
scaled_ruspini_hc_defaults = hclust(dist(scale(ruspini)))
srhc_def_4 = cutree(scaled_ruspini_hc_defaults,4)
plot(ruspini, pch=20, asp=1, col=srhc_def_4)
The default dissimilarity measure for comparing clusters is "complete". You can specify a different measure with the method parameter.
ruspini_hc_single = hclust(dist(ruspini), method="single")