Understanding window sizes when using kNN




I am not particularly proficient with R, but I am doing some basic machine learning with kNN in RStudio for my thesis. I am confused about a section of the code and am hoping someone can clarify.

When establishing the window size, the code looks like this:

win <- rep(1:736, each = 10)


My lecturer explained that 'rep()' establishes the size of the window and 'each' establishes the number of repetitions. However, when I run the code I get increased accuracy when reducing 'each' and increasing 'rep()', which makes me think 'each' might be the window size and 'rep()' the number of repetitions. Who's correct here?

Thanks for your help.


Could you please turn this into a self-contained reprex (short for minimal reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.

Right now the best way to install reprex is:

# install.packages("devtools")

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

For pointers specific to the community site, check out the reprex FAQ.


Without a reprex, it's not immediately obvious to me what the connection is between rep and the window of a kNN model. However, purely in terms of what the function rep does, your lecturer is correct that "each" sets the number of repetitions.

rep(1:5, each = 4)
#output: 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5

The first argument to rep is the vector to be repeated, and "each" determines how many times each element is repeated in place. Alternatively, "times" repeats the whole vector end to end:

rep(1:5, times = 2)
#output: 1 2 3 4 5 1 2 3 4 5
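Applied to the line from the question, the same rules can be checked directly. This is only a sketch of what the vector itself contains. How `win` is used afterwards is not shown in the original post, so reading it as a grouping label (one ID per window) is an assumption.

```r
# The vector from the question: each of the IDs 1..736 repeated 10 times in place
win <- rep(1:736, each = 10)

length(win)            # 736 * 10 = 7360 elements in total
head(win, 12)          # 1 1 1 1 1 1 1 1 1 1 2 2
length(unique(win))    # 736 distinct IDs
all(table(win) == 10)  # TRUE: every ID occurs exactly 10 times

# For contrast, "times" repeats the whole sequence end to end
win_times <- rep(1:736, times = 10)
head(win_times, 6)     # 1 2 3 4 5 6
```

If `win` does end up as a grouping label, then the first argument controls how many distinct groups there are and "each" controls how many consecutive rows share a label. Seeing the downstream code would settle which of those your lecturer means by "window size".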

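As for why you have increased accuracy with a smaller window: are you measuring training accuracy or test/validation accuracy? On the training set, the accuracy of kNN generally rises as you decrease k and reaches 100% at k = 1 (barring identical points with different labels), because each training point is its own nearest neighbour. That number says little about how well the model generalises.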
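A minimal sketch of the training versus test distinction, using `knn()` from the `class` package (a recommended package that ships with R). The data here are made up for illustration (two overlapping Gaussian clusters), and the names `train_x`, `test_x`, and `ks` are hypothetical. None of this comes from the original code.

```r
library(class)  # provides knn()

set.seed(1)
n <- 200
# Hypothetical two-class data: two overlapping 2-D Gaussian clusters
x <- rbind(matrix(rnorm(n, mean = 0), ncol = 2),
           matrix(rnorm(n, mean = 1), ncol = 2))
y <- factor(rep(c("a", "b"), each = n / 2))

# Hold out 50 of the 200 rows as a test set
train_idx <- sample(nrow(x), 150)
train_x <- x[train_idx, ]
test_x  <- x[-train_idx, ]
train_y <- y[train_idx]
test_y  <- y[-train_idx]

# Accuracy on the training set and on the held-out set for several k
ks <- c(1, 5, 15)
acc <- sapply(ks, function(k) {
  c(train = mean(knn(train_x, train_x, train_y, k = k) == train_y),
    test  = mean(knn(train_x, test_x,  train_y, k = k) == test_y))
})
colnames(acc) <- paste0("k=", ks)
acc
```

The `train` row is 1 at k = 1 because every training point finds itself at distance zero. The `test` row is the number to compare when you change the window settings.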