K-means cluster analysis in R - not sure which variables it chooses

Hi. I'm very new to RStudio, so apologies if my questions are not very clear. I'm trying to run a k-means cluster analysis in RStudio on GPS coordinates, so that locations close to each other end up in the same cluster. First, I'm trying to use the elbow method to determine the number of clusters, using the code below.

GPS <- read.delim(file.choose())
GPS <- na.omit(GPS)
scaled_data = as.matrix(scale(GPSClean))
set.seed(123)
k.max <- 15
data <- scaled_data
wss <- sapply(1:k.max,
              function(k){kmeans(data, k, nstart=50, iter.max = 15)$tot.withinss})
wss
plot(1:k.max, wss,
     type="b", pch = 19, frame = FALSE,
     xlab="Number of clusters K",
     ylab="Total within-clusters sum of squares")

I managed to get output from this, but I'm not sure it shows the right thing. Is there a way to see how R interpreted the data, and whether the result is actually based on the X and Y coordinates? If it is, the plot suggests there should be four clusters, but I'm afraid of drawing the wrong conclusion if R didn't interpret the data correctly.

We are limited in what support we can give you, as we don't have your data (scaled_data).

Side note: scaled_data comes from GPSClean in your shared script, but there's no code that creates GPSClean or reads it in. GPSClean is not the same object as GPS... is this an error in your script?
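For illustration, here is a minimal sketch of how the script might look with consistent object names and the coordinate columns picked out explicitly. The column names X, Y and site_id are assumptions (I can't see your file), and the data frame is made up so the example runs on its own; in your case GPS would come from read.delim(file.choose()).

```r
# Made-up stand-in for the imported file. The column names (site_id, X, Y)
# are assumptions, not taken from the real data.
set.seed(123)
GPS <- data.frame(
  site_id = 1:60,
  X = c(rnorm(30, mean = 4.9, sd = 0.05), rnorm(30, mean = 5.4, sd = 0.05)),
  Y = c(rnorm(30, mean = 52.3, sd = 0.05), rnorm(30, mean = 51.9, sd = 0.05))
)
GPS[5, "X"] <- NA  # one missing value, so na.omit() has something to drop

GPSClean <- na.omit(GPS)                  # the cleaned object now actually exists
coords <- GPSClean[, c("X", "Y")]         # keep only the coordinate columns
scaled_data <- as.matrix(scale(coords))   # same name as in the original script

# Elbow method: total within-cluster sum of squares for k = 1..k.max
k.max <- 10
wss <- sapply(1:k.max, function(k) {
  kmeans(scaled_data, centers = k, nstart = 50, iter.max = 15)$tot.withinss
})

plot(1:k.max, wss, type = "b", pch = 19, frame = FALSE,
     xlab = "Number of clusters K",
     ylab = "Total within-clusters sum of squares")
```

Selecting the columns by name means the clustering can only ever see the coordinates, whatever else is in the file.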

What else, aside from the x and y coordinates, is present in scaled_data, if anything? If the coordinates are the only information in scaled_data, there wouldn't seem to be any room for confusion.
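kmeans() uses every column of the matrix it is given, so checking what is in scaled_data answers the "which variables" question directly. A small sketch of how you might inspect it, using a made-up data frame with a hypothetical extra site_id column to show what an unwanted variable looks like:

```r
# Made-up example data; site_id, X and Y are hypothetical column names.
set.seed(1)
GPS <- data.frame(site_id = 1:20,
                  X = runif(20, min = 4, max = 6),
                  Y = runif(20, min = 51, max = 53))
scaled_data <- as.matrix(scale(GPS))

str(GPS)               # column names and types as R read them
dim(scaled_data)       # rows = locations, columns = variables used
head(scaled_data)      # first few scaled rows

# Every column listed here takes part in the clustering
used_columns <- colnames(scaled_data)
print(used_columns)

# The fitted centers have one column per variable that was used
fit <- kmeans(scaled_data, centers = 2, nstart = 10)
print(colnames(fit$centers))
```

In this made-up case site_id shows up next to X and Y, which would mean the clusters are not based on the coordinates alone.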

Perhaps you could plot your GPS data and see whether it appears, to the human eye, to form roughly four clusters (or not).
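Something along these lines could do it. This is only a sketch on made-up points (four artificial groups, hypothetical column names X and Y); with your own data you would plot the real coordinate columns instead.

```r
# Made-up coordinates: four well-separated groups of 15 points each.
set.seed(42)
GPS <- data.frame(
  X = rnorm(60, mean = rep(c(4.9, 5.4, 4.9, 5.4), each = 15), sd = 0.03),
  Y = rnorm(60, mean = rep(c(52.3, 52.3, 51.9, 51.9), each = 15), sd = 0.03)
)

# Raw scatter plot: do the points visibly fall into groups?
plot(GPS$X, GPS$Y, pch = 19, xlab = "X", ylab = "Y",
     main = "Locations")

# Same plot, coloured by the cluster k-means assigns to each point
fit <- kmeans(scale(GPS), centers = 4, nstart = 50)
plot(GPS$X, GPS$Y, col = fit$cluster, pch = 19, xlab = "X", ylab = "Y",
     main = "Locations coloured by k-means cluster")
```

If the colours line up with the groups you see by eye, the clustering is at least doing something sensible with the coordinates.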
