# faster way to find shortest distance with distHaversine

Dear R experts,

I have one data set (data1) in which every subject has its own location (latitude, longitude), and another data set (data2) in which every store has its location and area. I want to find the store closest to each subject. My script is as below:

library(geosphere)

data1 <- data.frame(id = c(123, 456, 789),
                    latitude = c(23.4567, 24.4567, 25.4567),
                    longitude = c(120.4567, 120.3567, 120.1567))

data2 <- data.frame(name = c(123, 456, 789),
                    area = c('a', 'b', 'c'),
                    latitude = c(23.123, 24.456, 26.789),
                    longitude = c(120.3367, 120.4567, 120.2567))

for (i in 1:nrow(data1)) {
  if (!is.na(data1$latitude[i])) {
    data2$d <- distm(data2[, c('longitude', 'latitude')],
                     data1[i, c('longitude', 'latitude')],
                     fun = distHaversine) / 1000
  }
  j <- which.min(data2$d)
  data1$d[i] <- data2$d[j]
  data1$store[i] <- data2$name[j]  # data2 has no 'store' column; 'name' holds the store id
  data1$area[i] <- data2$area[j]
}

In the end, the nearest store, its area, and the distance are attached to data1.

The problem is that data1 actually has 600,000 rows and data2 has 180 rows, so the loop runs practically forever.

Is there any faster way to achieve this?
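One option that avoids the row-by-row loop entirely: `distm` accepts two coordinate matrices, so a single call can build the full 600,000 × 180 distance matrix (roughly 0.9 GB of doubles; chunk data1 if memory is tight). A sketch under that assumption, reusing the column names from the question:

```r
library(geosphere)

p1 <- as.matrix(data1[, c("longitude", "latitude")])
p2 <- as.matrix(data2[, c("longitude", "latitude")])

# One vectorized call instead of 600,000 loop iterations
dmat <- distm(p1, p2, fun = distHaversine) / 1000  # km

nearest <- max.col(-dmat)  # row-wise index of the minimum distance

data1$d     <- dmat[cbind(seq_len(nrow(data1)), nearest)]
data1$store <- data2$name[nearest]
data1$area  <- data2$area[nearest]
```

`max.col(-dmat)` is a fast way to take a row-wise `which.min`; indexing `dmat` with the two-column matrix `cbind(row, nearest)` pulls out each row's minimum in one step.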
Have a look at `st_distance` in the `sf` package. No need to run it in a loop. You'll probably want to reproject to a projected coordinate system for speed, but you can transform back to geographic coordinates if you need the lat/long.
Perhaps `st_nearest_points` too.