how to make cluster on map of nearest 1 or 2 locations depending upon radius/distance

how i can make cluster of one instorecustomer with 1 or 2 onlinecluster location depending upon distance i have plot both on map have to make cluster please help

library(sf)
#set.seed(100)
store=data.frame(lat=runif(1,33.45,33.75),long=runif(1,72.83,73.17))
onlineloc=data.frame(lat=runif(10,33.45,33.75),long=runif(10,72.83,73.17))
instore=data.frame(lat=runif(5,33.45,33.75),long=runif(5,72.83,73.17))
instorecustomer=round(instore,digits = 2)
plot(onlineloc,main="Online and instore customer loc",col=" red")
points(instore,pch=2)
points(store,pch=15)
legend(x=30,y=68,legend=c("Instore Customer","Online Customer","Store"),pch=c(2,1,15),col=c("black","red","black"))
#plot(onlineloc,main="Online customer loc",col=" red")
results=kmeans(onlineloc,5)
results
onlinecluster=results$centers[,]
onlinecluster=as.data.frame(onlinecluster)
onlinecluster=round(onlinecluster,digits = 2)

distance2=dist(onlinecluster)
round(distance2,digits = 2)
plot(hclust(distance2))

library(leaflet)

leaflet() %>% # for both locs on same map
addTiles() %>%
addPolylines(lng=onlinecluster$long,lat= onlinecluster$lat,color = "red",group = "1") %>%
#addMarkers(lat=onlineloc$lat,lng=onlineloc$long,popup = "Online Loc",group = "1") %>%
addCircleMarkers(lat=onlinecluster$lat,lng=onlinecluster$long,popup = "Online Loc",group = "1",radius=10,color = "green",clusterOptions = markerClusterOptions(color="red"))%>%
addPolylines(lng=instorecustomer$long,lat= instorecustomer$lat,color = "blue",group = "2") %>%
addMarkers(lng=instorecustomer$long,lat= instorecustomer$lat,popup = "instore customer",group = "2")%>%
addMarkers(lng=store$long,lat= store$lat,popup = "store ",group = "2")

its not creating reprex code when i copy complete code but create reprex of leaflet sepreate and above code seprately

in the context of location pair coordinations, clusters are groups that locations can be members of.
Can you try and give a more conversational description of what you are trying to achieve.
Please discuss what your inputs are, and what you want to calculate, and what you want to show ?
It might be easier for you to describe it from the point of view of a user of your application.

i want to show on map two randomly generated set of locations one is onlinecluster and second instorecustomer both with 5 lat and longs which i did and secondly i want that all locations of onlineclusters (all 5 locations) become cluster with nearest instorecustomer location at maximum with one location or more depend upon distance in this way all onlinecluster locations will find its cluster with nearest instorecustomer locations

It's sounds like you you have two sets of 5 locations each.and you want to find the 5 pairs formed by taking each location from the first set and pairing with the location of least distance from the other set.

But this would not require the use of a clustering algorithm...

yes how i can do it? using code i mention above

library(tidyverse)
library(ggrepel) # to ggplot2 with labels that don't overlap (they repel)
set.seed(100)

#to make random data for the in set and the oc set
# starting with 100 random values between 33.45 and 33.75 for lat
# and           100 random values between 72,83 and 73.17 for long
lat<-runif(100,33.45,33.75) %>% round(digits = 2)
long<-runif(100,72.83,73.17) %>% round(digits = 2)
#pair them up
rand_locations_100 <- tibble(lat=lat,long=long) %>% distinct
# sample 10 of these (without replacement, 5 for in set and 5 for oc)
rand_location_10 <-sample_n(rand_locations_100,size=10,replace=FALSE)

ins_5=slice(rand_location_10,1:5) %>% mutate(group="ins_5")
oc_5=slice(rand_location_10,6:10) %>% mutate(group="oc_5")

# starting to work

full_sets <- union_all(ins_5,oc_5)
full_sets

# view these two sets
ggplot(full_sets, aes(x = long, y = lat)) +
  geom_point(aes(color = factor(group)))

# for each ins_5 find the nearest oc_5 using euclidian distance which
# is probably good enough for short distances as it ignores earth curvature.
eucliddist <- function(x1,y1,x2,y2){
  sqrt((x2-x1)**2+(y2-y1)**2)
}

combinations_of_oc5_against_ins_5 <- left_join(
  mutate(ins_5,dummy=TRUE) %>% rename(lat1=lat,long1=long),
  mutate(oc_5,dummy=TRUE) %>% rename(lat2=lat,long2=long),
  by="dummy")

with_distances <- mutate(combinations_of_oc5_against_ins_5,
                         dist=eucliddist(lat1,long1,lat2,long2))

ins_5_with_nearest_oc_5 <- group_by(with_distances,
                                    lat1,long1) %>% 
  top_n(1,wt=1/dist) %>%# smaller the better
  ungroup %>% mutate(paired_no=factor(row_number()))


to_display <- union_all(
  select(ins_5_with_nearest_oc_5,
         group.x,
         lat1,
         long1,
         paired_no) %>% rename(group=group.x,
                               lat = lat1,
                               long=long1),
  select(ins_5_with_nearest_oc_5,
         group.y,
         lat2,
         long2,
         paired_no) %>% rename(group=group.y,
                               lat = lat2,
                               long=long2)
)


# myjitter <- position_jitter(width = 0.005, height = 0.005,seed=100)

ggplot(to_display) +
  geom_point(aes(
    x = long, y = lat,
    colour = factor(group), shape = paired_no
  ) #, position = myjitter
  ) +
  geom_text_repel(aes(x = long, y = lat, label = paste0(paired_no, ":", group))
  )

1 Like

As you seem to be using {sf} package based workflow I suggest that you use sf::st_nearest_feature() for calculating the online / instore pairs. It works rather well for points data and it is computationally more efficient than calculating the carthesian product of all possible distances.

For an example of the function in practice consider this example of 5 random "ingroup" & 5 "outgroup" persons in San Francisco (with a nod to the recent rstudio::conference(2020) there:)

library(sf)
library(dplyr)
library(leaflet)
library(charlatan)

# somewhere in San Francisco...
ingroup <- data.frame(matrix(unlist(ch_position(n = 5, bbox = c(-122.5, 37.7, -122.4, 37.8))), ncol = 2, byrow = TRUE)) %>% 
  mutate(name = ch_name(5)) %>% 
  st_as_sf(coords = c("X1","X2"), crs=4326) 

# somewhere else in San Francisco...
outgroup <- data.frame(matrix(unlist(ch_position(n = 5, bbox = c(-122.5, 37.7, -122.4, 37.8))), ncol = 2, byrow = TRUE)) %>% 
  mutate(name = ch_name(5)) %>% 
  st_as_sf(coords = c("X1","X2"), crs=4326) 


# find the nearest neighbour: sf::st_nearest_feature (computationally very efficient)
ingroup <- ingroup %>% 
  mutate(nearest_out = outgroup$name[st_nearest_feature(., outgroup)]) %>% 
  mutate(label = paste0("ingroup: <b>", name, "</b><br>nearest outgroup is: ", nearest_out))
  
# And now visualize the stuff!
leaflet() %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addCircleMarkers(data = ingroup, 
                   fillColor = "red", 
                   stroke = F, 
                   fillOpacity = .8,
                   popup = ~label) %>% 
  addCircleMarkers(data = outgroup, 
                   fillColor = "blue", 
                   stroke = F, 
                   fillOpacity = .8,
                   popup = ~name)

2 Likes

its giving error while running your code

Error in ch_position(n = 5, bbox = c(-122.5, 37.7, -122.4, 37.8)) :
could not find function "ch_position"

You likely need to install the package {charlatan}; my code uses it to generate fake data

1 Like

After changing lat and long of my location it stills shows your locations on map please rectify my fault
ingroup <- data.frame(lat=df1$i.lat, lon=df1$i.lon) %>%
mutate(name = ch_name(5)) %>%
st_as_sf(coords = c("lat","lon"), crs=4326)
ingroup

pakistan...

outgroup <- data.frame(lat=df1$lat,lon=df1$lon) %>%
mutate(name = ch_name(5)) %>%
st_as_sf(coords = c("lat","lon"), crs=4326)
outgroup

find the nearest neighbour: sf::st_nearest_feature (computationally very efficient)

ingroup <- ingroup %>%
mutate(nearest_out = outgroup$name[st_nearest_feature(., outgroup)]) %>%
mutate(label = paste0("ingroup: ", name, "
nearest outgroup is: ", nearest_out))

And now visualize the stuff!

leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
addCircleMarkers(data = ingroup,
fillColor = "red",
stroke = F,
fillOpacity = .8,
popup = ~label) %>%
addCircleMarkers(data = outgroup,
fillColor = "blue",
stroke = F,
fillOpacity = .8,
popup = ~name)

I am afraid I can not help much with analyzing data I don't have access to.

You should be able to get rid of any old data by restarting R session / pressing Ctrl + Shift + F10.

You seem to be using the same data (called df1) for both your ingroup and outgroup; in such case the expected result is that it will return the same point (as it has zero distance from itself).

no i am using df1$lat , df1$long for outgroup and df1$i.lat , df1$i.long for ingroup lat and longs which are different but those are same too it should be on located on those lat long on map but still it shows san francisco

can you guide me the purpose of ch_name(5) in both ingroup and outgroup in under mention code?

ingroup <- data.frame(lat=df1$i.lat, lon=df1$i.lon) %>%
mutate(name = ch_name(5)) %>%
st_as_sf(coords = c("lat","lon"), crs=4326)

outgroup <- data.frame(lat=df1$lat,lon=df1$lon) %>%
mutate(name = ch_name(5)) %>%
st_as_sf(coords = c("lat","lon"), crs=4326)

if you've succesfully run the script, then you'd have had to install the charlatan package of which ch_name is a function.
in r if you type in the console ?ch_name you will get the help file on the function.


if you run ch_name(5) in your console, you will see the result is 5 random people names

1 Like

yes you are right but location of these people should be plotted on map as per given lat and long but its plotting in USA, i am giving lat and long of pakistan

Please provide a reprex for further help

2 posts were split to a new topic: Help Debugging a Series of Conditions Dealing with DateTime

how i can limit the matching to 50 kms radius?

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.