How can I define the optimal value of k in the KNN model?

This is my script:

library(class)
library(ggplot2)
library(gmodels)
library(scales)
library(caret)
library(tidyverse)

db_data <- iris
row_train <- sample(nrow(iris), nrow(iris)*0.8)
db_train <- iris[row_train,]
db_test <- iris[-row_train,]

unique(db_train$Species)
table(db_train$Species)
#--------

#KNN
#-------
model_knn<-train(Species ~ ., data = db_train, method = "knn",tuneGrid = data.frame(k = 12))
summary(model_knn)
#-------

#PREDICTION NEW RECORD
#-------
test_data <- db_test
# predict() on a caret classification model returns the predicted class for each row
db_test$predict <- predict(model_knn, newdata = test_data)
confusionMatrix(data = db_test$predict, reference = db_test$Species)
#-------

How can I define the optimal value of k in the KNN model?

Don't fix k at 12. If you leave out tuneGrid, train() will test several values of k and choose the one with the highest accuracy. Alternatively, give k a reasonable range to search, e.g. k = 2:20.

Ok, but is there a function that can automatically calculate the best value? Or do I have to test each value myself?

The train() function does that for you, but you asked it to consider only k = 12, so that is the only value it evaluated.
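As a rough sketch of what that search could look like: the seed and the 10-fold cross-validation below are assumptions for illustration (nothing in this thread specifies them; without trControl, train() falls back to its default resampling). The fitted object's bestTune and results elements report the selected k and the accuracy for each candidate.

```r
library(caret)

set.seed(42)  # arbitrary seed (an assumption) so the split and folds are repeatable

# 80/20 split of iris, as in the original script
row_train <- sample(nrow(iris), nrow(iris) * 0.8)
db_train <- iris[row_train, ]
db_test <- iris[-row_train, ]

# Evaluate every k from 2 to 20; 10-fold CV is an illustrative choice
model_knn <- train(
  Species ~ .,
  data = db_train,
  method = "knn",
  trControl = trainControl(method = "cv", number = 10),
  tuneGrid = data.frame(k = 2:20)
)

model_knn$bestTune  # the k with the highest resampled accuracy
model_knn$results   # accuracy and kappa for every k that was tried
```

Calling predict(model_knn, newdata = db_test) afterwards uses the final model refit with the selected k, so no manual loop over k is needed.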

I tried this:
model_knn<-train(Species ~ ., data = db_train, method = "knn",tuneGrid = data.frame(k = c(2:20)))
but I get this error:

Error in train(Species ~ ., data = db_train, method = "knn", tuneGrid = data.frame(k = c(2:20))) :
unused arguments (data = db_train, method = "knn", tuneGrid = data.frame(k = c(2:20)))

Unfortunately, I can't reproduce your error; that syntax works for me without issue.
Side note: the c() wrapper around 2:20 is harmless but not required, because 2:20 is already a well-formed vector.
Maybe restart your session and try again?
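If a restart doesn't help, one possible cause (a guess, not confirmed anywhere in this thread) is that some other object named train is masking caret's version, since an "unused arguments" error means the function that was actually called does not accept data, method or tuneGrid. A minimal sketch for checking that, using iris directly in place of db_train:

```r
library(caret)

# The c() wrapper makes no difference: both expressions give the same vector
identical(c(2:20), 2:20)

# Which environment does the name `train` resolve to right now?
# For caret's function this should be "caret"
environmentName(environment(train))

# Every attached package (or the workspace) that defines `train`
find("train")

# Calling through the namespace sidesteps any masking
model_knn <- caret::train(
  Species ~ .,
  data = iris,
  method = "knn",
  tuneGrid = data.frame(k = 2:20)
)
```

If find("train") lists ".GlobalEnv" or another package ahead of "package:caret", that would explain the error, and writing caret::train() explicitly avoids it.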
