Dear All,
I've recently been attempting to evaluate output from k-modes (a cluster label), relative to a so-called True cluster label (labelled 'class' below).
In other words: I've been attempting to external validate the clustering output. However, when I tried external validation measures from the 'fpc' package, I was unsuccessful (error term posted below script).
I've attached my code for the mushroom dataset. I would appreciate if anyone could show me how to successful execute these external validation measures in the context of categorical data.
Any help appreciated.
# LIBRARIES
install.packages('klaR')
install.packages('fpc')
library(klaR)
library(fpc)
#MUSHROOM DATA
mushrooms <- read.csv(file = "https://raw.githubusercontent.com/miachen410/Mushrooms/master/mushrooms.csv", header = FALSE)
names(mushrooms) <- c("edibility", "cap-shape", "cap-surface", "cap-color",
"bruises", "odor", "gill-attachment", "gill-spacing",
"gill-size", "gill-color", "stalk-shape", "stalk-root",
"stalk-surface-above-ring", "stalk-surface-below-ring",
"stalk-color-above-ring", "stalk-color-below-ring", "veil-type",
"veil-color", "ring-number", "ring-type", "spore-print-color",
"population", "habitat")
names(mushrooms)[names(mushrooms)=="edibility"] <- "class"
indexes <- apply(mushrooms, 2, function(x) any(is.na(x) | is.infinite(x)))
colnames(mushrooms)[indexes]
table(mushrooms$class)
str(mushrooms)
#REMOVING CLASS VARIABLE
mushroom.df <- subset(mushrooms, select = -c(class))
#KMODES ANALYSIS
result.kmode <- kmodes(mushroom.df, 2, iter.max = 50, weighted = FALSE)
#EXTERNAL VALIDATION ATTEMPT
class <- as.numeric(mushroom$class)
clust_stats <- cluster.stats(d = dist(mushroom.df),
class, result.kmode$cluster)
#ERROR TERM
Error in silhouette.default(clustering, dmatrix = dmat) :
NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In dist(mushroom.df) : NAs introduced by coercion