Dear R experts,
I have a data set (data1) in which every subject has its own location (latitude, longitude), and another data set (data2) in which every store has its location and area. I want to find the store that is closest to each subject. My script is below:
library(geosphere)
data1=data.frame(id=c(123,456,789),
latitude=c(23.4567,24.4567,25.4567),
longitude=c(120.4567,120.3567,120.1567))
data2=data.frame(name=c(123,456,789),
area=c('a','b','c'),
latitude=c(23.123,24.456,26.789),
longitude=c(120.3367,120.4567,120.2567))
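# pre-allocate the result columns so subjects without coordinates stay NA
data1$d=NA_real_
data1$store=NA_real_
data1$area=NA_character_
for (i in 1:nrow(data1)) {
  if (!is.na(data1$latitude[i])) {
    # distance in km from subject i to every store
    d=distm(data2[,c('longitude','latitude')],
            data1[i,c('longitude','latitude')],fun=distHaversine)/1000
    j=which.min(d)
    data1$d[i]=d[j]
    # the store identifier is the column called 'name' in data2
    data1$store[i]=data2$name[j]
    data1$area[i]=as.character(data2$area[j])
  }
}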
In the end, the nearest store, its area, and the distance (in km) are attached to data1.
The problem is that data1 actually has 600,000 rows and data2 has 180 rows, and the loop takes extremely long to finish.
Is there any faster way to achieve this?
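One direction I have been considering, only as a rough sketch and not something I have timed on the full data: compute the haversine distances in base R for a whole block of subjects against all stores at once, then take the row-wise minimum. The helper names haversine_km and nearest_store and the chunk_size argument are my own inventions for illustration, and the Earth radius of 6378137 m is what I understand distHaversine uses by default.

```r
# Haversine distance in km between every (lat1, lon1) and every (lat2, lon2).
# Inputs are in degrees; the result is a length(lat1) x length(lat2) matrix.
# r is the Earth radius in metres (assumed to match geosphere's default).
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6378137) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dphi <- outer(phi1, phi2, "-")
  dlam <- outer(lon1 * to_rad, lon2 * to_rad, "-")
  h <- sin(dphi / 2)^2 + outer(cos(phi1), cos(phi2)) * sin(dlam / 2)^2
  h[which(h > 1)] <- 1  # guard against rounding slightly above 1
  2 * r * asin(sqrt(h)) / 1000
}

# Hypothetical helper: attach nearest store, area and distance to 'subjects'.
# Subjects are processed in blocks of 'chunk_size' rows to bound memory use.
nearest_store <- function(subjects, stores, chunk_size = 50000) {
  n <- nrow(subjects)
  idx <- rep(NA_integer_, n)  # index of the nearest store per subject
  d <- rep(NA_real_, n)       # distance to that store in km
  ok <- which(!is.na(subjects$latitude) & !is.na(subjects$longitude))
  for (rows in split(ok, ceiling(seq_along(ok) / chunk_size))) {
    m <- haversine_km(subjects$latitude[rows], subjects$longitude[rows],
                      stores$latitude, stores$longitude)
    j <- max.col(-m, ties.method = "first")  # column of each row minimum
    idx[rows] <- j
    d[rows] <- m[cbind(seq_along(rows), j)]
  }
  subjects$d <- d
  subjects$store <- stores$name[idx]
  subjects$area <- as.character(stores$area)[idx]
  subjects
}

data1 <- data.frame(id = c(123, 456, 789),
                    latitude = c(23.4567, 24.4567, 25.4567),
                    longitude = c(120.4567, 120.3567, 120.1567))
data2 <- data.frame(name = c(123, 456, 789),
                    area = c("a", "b", "c"),
                    latitude = c(23.123, 24.456, 26.789),
                    longitude = c(120.3367, 120.4567, 120.2567))
res <- nearest_store(data1, data2)
```

With 180 stores, a block of 50,000 subjects needs a distance matrix of 9 million numbers, so the full 600,000 rows would take 12 passes of the loop instead of 600,000. Subjects with missing coordinates are left as NA.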
Any advice will be appreciated.
Best,
Veda