How can I use 'silhouette_avg' with 'tune_cluster'?

Sorry for my bad English.

I'm trying to choose 'k' for k-means clustering.
I found a way to choose 'k' with the elbow method using 'sse_ratio', as in the code below.

#########################
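library(tidymodels)  # recipes, workflows, tune, rsample, dplyr, ggplot2, tibble
library(tidyclust)   # k_means(), tune_cluster(), cluster_metric_set()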

rec_spec <- recipe(~., data = mtcars) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors())

kmeans_spec <- k_means(num_clusters = tune())

wflow <- workflow() %>%
  add_recipe(rec_spec) %>%
  add_model(kmeans_spec)

grid <- tibble(num_clusters = 1:10)

set.seed(4400)
folds <- vfold_cv(mtcars, v = 5)

res <- tune_cluster(
  wflow,
  resamples = folds,
  grid = grid,
  metrics = cluster_metric_set(sse_ratio) # I want 'silhouette_avg' here. How can I do that?
)

res_metrics <- collect_metrics(res)

res_metrics %>%
  filter(.metric == "sse_ratio") %>%
  ggplot(aes(x = num_clusters, y = mean)) +
  geom_point() +
  geom_line() +
  theme_minimal() +
  ylab("mean WSS/TSS ratio, over 5 folds") +
  xlab("Number of clusters") +
  scale_x_continuous(breaks = 1:10)
#####################

I've tried to choose 'k' with silhouette_avg, but it failed.

I tried putting "silhouette_avg", "silhouette_avg(dists = dists)", "silhouette_ave, dists=dists"... in cluster_metric_set(), but none of them worked.

How can I get a table of silhouette values instead of the SSE ratio?

Thanks for your help.

You can get the average silhouette values by passing silhouette_avg to cluster_metric_set(). You might need to update tidyclust to the current CRAN version.

library(tidymodels)
library(tidyclust)

rec_spec <- recipe(~., data = mtcars) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors())

kmeans_spec <- k_means(num_clusters = tune())

wflow <- workflow() %>%
  add_recipe(rec_spec) %>%
  add_model(kmeans_spec)

grid <- tibble(num_clusters = 1:10)

set.seed(4400)
folds <- vfold_cv(mtcars, v = 5)

res <- tune_cluster(
  wflow,
  resamples = folds,
  grid = grid,
  metrics = cluster_metric_set(silhouette_avg)
)

collect_metrics(res)
#> # A tibble: 10 × 7
#>    num_clusters .metric        .estimator    mean     n std_err .config         
#>           <int> <chr>          <chr>        <dbl> <int>   <dbl> <chr>           
#>  1            1 silhouette_avg standard   NaN         0 NA      Preprocessor1_M…
#>  2            2 silhouette_avg standard     0.427     5  0.0123 Preprocessor1_M…
#>  3            3 silhouette_avg standard     0.408     5  0.0181 Preprocessor1_M…
#>  4            4 silhouette_avg standard     0.400     5  0.0338 Preprocessor1_M…
#>  5            5 silhouette_avg standard     0.447     5  0.0156 Preprocessor1_M…
#>  6            6 silhouette_avg standard     0.425     5  0.0350 Preprocessor1_M…
#>  7            7 silhouette_avg standard     0.458     5  0.0207 Preprocessor1_M…
#>  8            8 silhouette_avg standard     0.465     5  0.0219 Preprocessor1_M…
#>  9            9 silhouette_avg standard     0.396     5  0.0240 Preprocessor1_M…
#> 10           10 silhouette_avg standard     0.400     5  0.0228 Preprocessor1_M…

Created on 2023-03-20 with reprex v2.0.2
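To see what the metric measures and how you would pick 'k' from it, here is a minimal sketch outside tidymodels. It assumes only base R's kmeans() and silhouette() from the cluster package (a recommended package that ships with R). It standardizes mtcars, fits k-means on the full data for k = 2 to 10 (no resampling and no PCA step, so the numbers will not match the table above), averages the silhouette widths, and picks the k with the largest average. k = 1 is skipped because the silhouette width is undefined for a single cluster, which is presumably also why the first row above is NaN with n = 0.

```r
library(cluster)  # silhouette(); recommended package bundled with R

x <- scale(mtcars)  # standardize columns, similar in spirit to step_normalize()
d <- dist(x)        # Euclidean distances between cars

set.seed(4400)
ks <- 2:10          # k = 1 skipped: silhouette width needs at least 2 clusters
avg_sil <- sapply(ks, function(k) {
  km <- kmeans(x, centers = k, nstart = 10)
  sil <- silhouette(km$cluster, d)   # one row per observation
  mean(sil[, "sil_width"])           # average silhouette width for this k
})

sil_table <- data.frame(num_clusters = ks, silhouette_avg = avg_sil)
print(sil_table)

# Unlike sse_ratio (lower is better, look for an elbow), a higher average
# silhouette means better-separated clusters, so take the maximum.
best_k <- ks[which.max(avg_sil)]
cat("k with the highest average silhouette:", best_k, "\n")
```

With the tune_cluster() results above, the analogous step is to filter collect_metrics(res) to .metric == "silhouette_avg" and take the row with the largest mean; in that table it is num_clusters = 8 (0.465), although several values are within about one standard error of each other.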


Thanks! You were right!

The problem was my version of tidyclust.

After updating tidyclust, it worked!
Sorry for the silly mistake.

Thanks for your reply and great work!
