matching two locations with lat and lon

here is the reprex code, what it is doing is matching two data frames based on the nearest distances now what requires is it should match only nearest locations within distance of 5000m/ 5kms only if not found it should run again and again (data frame df is fixed once lat and long generated in first run it should match with other data frame ref , data frame ref lat and long may be regenerated unless found within distance of 5 kms of data frame df) till match found within 5 kms.

library(data.table)
df<-structure(list(ID=c(1:10), lat = c(runif(5,33.75,34.15))
                   , lon = c(runif(5,72.50,73.13)))
              , col.Names = c("ID","lat", "lon"), row.names = c(NA, -10L), class = c("data.table","data.frame"))
df
#>     ID      lat      lon
#>  1:  1 33.98898 73.01502
#>  2:  2 33.87585 72.60870
#>  3:  3 34.14120 73.04509
#>  4:  4 34.03172 73.10712
#>  5:  5 34.00234 72.70707
#>  6:  6 33.98898 73.01502
#>  7:  7 33.87585 72.60870
#>  8:  8 34.14120 73.04509
#>  9:  9 34.03172 73.10712
#> 10: 10 34.00234 72.70707
ref<-structure(list(ID=letters[1:10], lat = c(runif(5,33.75,34.15)), lon = c(runif(5,72.50,73.13)))
               , col.Names = c("ID","lat", "lon"),row.names = c(NA, -10L), class = c("data.table","data.frame"))
ref
#>     ID      lat      lon
#>  1:  a 33.86513 72.72187
#>  2:  b 33.80253 72.75765
#>  3:  c 33.83221 72.53763
#>  4:  d 34.09195 72.69337
#>  5:  e 34.00116 72.52747
#>  6:  f 33.86513 72.72187
#>  7:  g 33.80253 72.75765
#>  8:  h 33.83221 72.53763
#>  9:  i 34.09195 72.69337
#> 10:  j 34.00116 72.52747
#Setting to data.table format
setDT(df)
setDT(ref)
#creating a table with cartesian join 
df1<-setkey(df[,c(k=1,.SD)],k)[ref[,c(k=1,.SD)],allow.cartesian=TRUE][,k:=NULL]
df1
#calculating the Euclidean distance and giving a rank in ascending order of distance
df1[,EuDist:=sqrt((lat-i.lat)^2+(lon-i.lon)^2)][,distRank:=rank(EuDist,ties="random"),by=.(ID)]
df1
#>      ID      lat      lon i.ID    i.lat    i.lon     EuDist distRank
#>   1:  1 33.98898 73.01502    a 33.86513 72.72187 0.31823963        3
#>   2:  2 33.87585 72.60870    a 33.86513 72.72187 0.11367573        4
#>   3:  3 34.14120 73.04509    a 33.86513 72.72187 0.42507274        4
#>   4:  4 34.03172 73.10712    a 33.86513 72.72187 0.41972712        6
#>   5:  5 34.00234 72.70707    a 33.86513 72.72187 0.13800647        3
#>   6:  6 33.98898 73.01502    a 33.86513 72.72187 0.31823963        4
#>   7:  7 33.87585 72.60870    a 33.86513 72.72187 0.11367573        3
#>   8:  8 34.14120 73.04509    a 33.86513 72.72187 0.42507274        3
#>   9:  9 34.03172 73.10712    a 33.86513 72.72187 0.41972712        5
#>  10: 10 34.00234 72.70707    a 33.86513 72.72187 0.13800647        4
#>  11:  1 33.98898 73.01502    b 33.80253 72.75765 0.31780457        1
#>  12:  2 33.87585 72.60870    b 33.80253 72.75765 0.16602320        8
#>  13:  3 34.14120 73.04509    b 33.80253 72.75765 0.44420201        5
#>  14:  4 34.03172 73.10712    b 33.80253 72.75765 0.41791544        2
#>  15:  5 34.00234 72.70707    b 33.80253 72.75765 0.20611273        8
#>  16:  6 33.98898 73.01502    b 33.80253 72.75765 0.31780457        2
#>  17:  7 33.87585 72.60870    b 33.80253 72.75765 0.16602320        7
#>  18:  8 34.14120 73.04509    b 33.80253 72.75765 0.44420201        6
#>  19:  9 34.03172 73.10712    b 33.80253 72.75765 0.41791544        1
#>  20: 10 34.00234 72.70707    b 33.80253 72.75765 0.20611273        8
#>  21:  1 33.98898 73.01502    c 33.83221 72.53763 0.50246887        9
#>  22:  2 33.87585 72.60870    c 33.83221 72.53763 0.08339838        1
#>  23:  3 34.14120 73.04509    c 33.83221 72.53763 0.59412726       10
#>  24:  4 34.03172 73.10712    c 33.83221 72.53763 0.60342318        9
#>  25:  5 34.00234 72.70707    c 33.83221 72.53763 0.24010996       10
#>  26:  6 33.98898 73.01502    c 33.83221 72.53763 0.50246887        9
#>  27:  7 33.87585 72.60870    c 33.83221 72.53763 0.08339838        1
#>  28:  8 34.14120 73.04509    c 33.83221 72.53763 0.59412726        9
#>  29:  9 34.03172 73.10712    c 33.83221 72.53763 0.60342318       10
#>  30: 10 34.00234 72.70707    c 33.83221 72.53763 0.24010996        9
#>  31:  1 33.98898 73.01502    d 34.09195 72.69337 0.33772458        6
#>  32:  2 33.87585 72.60870    d 34.09195 72.69337 0.23209699        9
#>  33:  3 34.14120 73.04509    d 34.09195 72.69337 0.35514407        1
#>  34:  4 34.03172 73.10712    d 34.09195 72.69337 0.41810448        3
#>  35:  5 34.00234 72.70707    d 34.09195 72.69337 0.09065286        1
#>  36:  6 33.98898 73.01502    d 34.09195 72.69337 0.33772458        6
#>  37:  7 33.87585 72.60870    d 34.09195 72.69337 0.23209699        9
#>  38:  8 34.14120 73.04509    d 34.09195 72.69337 0.35514407        1
#>  39:  9 34.03172 73.10712    d 34.09195 72.69337 0.41810448        3
#>  40: 10 34.00234 72.70707    d 34.09195 72.69337 0.09065286        2
#>  41:  1 33.98898 73.01502    e 34.00116 72.52747 0.48770267        7
#>  42:  2 33.87585 72.60870    e 34.00116 72.52747 0.14933288        6
#>  43:  3 34.14120 73.04509    e 34.00116 72.52747 0.53622917        7
#>  44:  4 34.03172 73.10712    e 34.00116 72.52747 0.58045538        8
#>  45:  5 34.00234 72.70707    e 34.00116 72.52747 0.17960346        5
#>  46:  6 33.98898 73.01502    e 34.00116 72.52747 0.48770267        7
#>  47:  7 33.87585 72.60870    e 34.00116 72.52747 0.14933288        5
#>  48:  8 34.14120 73.04509    e 34.00116 72.52747 0.53622917        7
#>  49:  9 34.03172 73.10712    e 34.00116 72.52747 0.58045538        8
#>  50: 10 34.00234 72.70707    e 34.00116 72.52747 0.17960346        6
#>  51:  1 33.98898 73.01502    f 33.86513 72.72187 0.31823963        4
#>  52:  2 33.87585 72.60870    f 33.86513 72.72187 0.11367573        3
#>  53:  3 34.14120 73.04509    f 33.86513 72.72187 0.42507274        3
#>  54:  4 34.03172 73.10712    f 33.86513 72.72187 0.41972712        5
#>  55:  5 34.00234 72.70707    f 33.86513 72.72187 0.13800647        4
#>  56:  6 33.98898 73.01502    f 33.86513 72.72187 0.31823963        3
#>  57:  7 33.87585 72.60870    f 33.86513 72.72187 0.11367573        4
#>  58:  8 34.14120 73.04509    f 33.86513 72.72187 0.42507274        4
#>  59:  9 34.03172 73.10712    f 33.86513 72.72187 0.41972712        6
#>  60: 10 34.00234 72.70707    f 33.86513 72.72187 0.13800647        3
#>  61:  1 33.98898 73.01502    g 33.80253 72.75765 0.31780457        2
#>  62:  2 33.87585 72.60870    g 33.80253 72.75765 0.16602320        7
#>  63:  3 34.14120 73.04509    g 33.80253 72.75765 0.44420201        6
#>  64:  4 34.03172 73.10712    g 33.80253 72.75765 0.41791544        1
#>  65:  5 34.00234 72.70707    g 33.80253 72.75765 0.20611273        7
#>  66:  6 33.98898 73.01502    g 33.80253 72.75765 0.31780457        1
#>  67:  7 33.87585 72.60870    g 33.80253 72.75765 0.16602320        8
#>  68:  8 34.14120 73.04509    g 33.80253 72.75765 0.44420201        5
#>  69:  9 34.03172 73.10712    g 33.80253 72.75765 0.41791544        2
#>  70: 10 34.00234 72.70707    g 33.80253 72.75765 0.20611273        7
#>  71:  1 33.98898 73.01502    h 33.83221 72.53763 0.50246887       10
#>  72:  2 33.87585 72.60870    h 33.83221 72.53763 0.08339838        2
#>  73:  3 34.14120 73.04509    h 33.83221 72.53763 0.59412726        9
#>  74:  4 34.03172 73.10712    h 33.83221 72.53763 0.60342318       10
#>  75:  5 34.00234 72.70707    h 33.83221 72.53763 0.24010996        9
#>  76:  6 33.98898 73.01502    h 33.83221 72.53763 0.50246887       10
#>  77:  7 33.87585 72.60870    h 33.83221 72.53763 0.08339838        2
#>  78:  8 34.14120 73.04509    h 33.83221 72.53763 0.59412726       10
#>  79:  9 34.03172 73.10712    h 33.83221 72.53763 0.60342318        9
#>  80: 10 34.00234 72.70707    h 33.83221 72.53763 0.24010996       10
#>  81:  1 33.98898 73.01502    i 34.09195 72.69337 0.33772458        5
#>  82:  2 33.87585 72.60870    i 34.09195 72.69337 0.23209699       10
#>  83:  3 34.14120 73.04509    i 34.09195 72.69337 0.35514407        2
#>  84:  4 34.03172 73.10712    i 34.09195 72.69337 0.41810448        4
#>  85:  5 34.00234 72.70707    i 34.09195 72.69337 0.09065286        2
#>  86:  6 33.98898 73.01502    i 34.09195 72.69337 0.33772458        5
#>  87:  7 33.87585 72.60870    i 34.09195 72.69337 0.23209699       10
#>  88:  8 34.14120 73.04509    i 34.09195 72.69337 0.35514407        2
#>  89:  9 34.03172 73.10712    i 34.09195 72.69337 0.41810448        4
#>  90: 10 34.00234 72.70707    i 34.09195 72.69337 0.09065286        1
#>  91:  1 33.98898 73.01502    j 34.00116 72.52747 0.48770267        8
#>  92:  2 33.87585 72.60870    j 34.00116 72.52747 0.14933288        5
#>  93:  3 34.14120 73.04509    j 34.00116 72.52747 0.53622917        8
#>  94:  4 34.03172 73.10712    j 34.00116 72.52747 0.58045538        7
#>  95:  5 34.00234 72.70707    j 34.00116 72.52747 0.17960346        6
#>  96:  6 33.98898 73.01502    j 34.00116 72.52747 0.48770267        8
#>  97:  7 33.87585 72.60870    j 34.00116 72.52747 0.14933288        6
#>  98:  8 34.14120 73.04509    j 34.00116 72.52747 0.53622917        8
#>  99:  9 34.03172 73.10712    j 34.00116 72.52747 0.58045538        7
#> 100: 10 34.00234 72.70707    j 34.00116 72.52747 0.17960346        5
#>      ID      lat      lon i.ID    i.lat    i.lon     EuDist distRank
#selecting the shortest distance
df1<-df1[distRank==1]
df1
#>     ID      lat      lon i.ID    i.lat    i.lon     EuDist distRank
#>  1:  1 33.98898 73.01502    b 33.80253 72.75765 0.31780457        1
#>  2:  9 34.03172 73.10712    b 33.80253 72.75765 0.41791544        1
#>  3:  2 33.87585 72.60870    c 33.83221 72.53763 0.08339838        1
#>  4:  7 33.87585 72.60870    c 33.83221 72.53763 0.08339838        1
#>  5:  3 34.14120 73.04509    d 34.09195 72.69337 0.35514407        1
#>  6:  5 34.00234 72.70707    d 34.09195 72.69337 0.09065286        1
#>  7:  8 34.14120 73.04509    d 34.09195 72.69337 0.35514407        1
#>  8:  4 34.03172 73.10712    g 33.80253 72.75765 0.41791544        1
#>  9:  6 33.98898 73.01502    g 33.80253 72.75765 0.31780457        1
#> 10: 10 34.00234 72.70707    i 34.09195 72.69337 0.09065286        1

Created on 2020-11-23 by the reprex package (v0.3.0)

I am sorry but what seems to be the issue here? I don't see a question to be answered in your post.

Also, on a more fundamental level, I suggest to be careful with calculating Euclidean distance using angular coordinates (latitude & longitude). The Earth is not flat, you need to project your coordinates to a planar reference system for euclidean geometry to work properly.

there is no limit on distance while matching , i want to put limitation of distance ie upto 5kms only.

you could add additional conditions ...

df1<-df1[distRank==1 & EuDist<0.1]

This will require some rework on your data processing. You are using angular difference, which is a flawed measure of distance - difference of, say, two degrees of latitude on equator will be much bigger distance in meters than a difference of two degrees of latitude somewhere close to a pole.

I suggest you approach the problem using toolset of the {sf} package. Specifically sf::st_distance() should be your friend here. It is commonly used to create a distance matrix object, which can be then filtered / subsetted as required.

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.