need help with Haversine function

so ineed to use Haversine function from packge "pracma" , the problome is that i dont know how to use the function on big data , lets say i have 1000 values of latitude and longatuides how do i make the function calculatin the distance between two points when every points?

For every combination of a bunch of values, you might take a look at expand.grid().

From the haversine() docs, it looks like:

The location can be input in two different formats, as latitude and longitude in a character string, e.g. for Frankfurt airport as '50 02 00N, 08 34 14E', or as a numerical two-vector in degrees (not radians).

You could then calculate haversine() for all combinations kept in a data frame, for example:

df <- tibble::tibble(x = '41 58 43N, 87 54 17W', y = '50 02 00N, 08 34 14E')

pracma::haversine(df$x, df$y)
#> [1] 6971.059

Created on 2019-03-18 by the reprex package (v0.2.1)

ok ty , but im not sure im quit understand how to calculate for 1000 datas sets?

for exmple if ihave the data "quaks" each earthquaks have its own lattitude and longatuide how do i use this function ? im suppost to creat anew data frame or something ?

No, probably a new variable. Check out dplyr::mutate()

idid try to add with mutate new colunm that will calculate but the HARVESTIN said i put worng cordinat format , can you show me example ? i mean how do i put it in mutate

Using a small subset of your data, could you please turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.

install.packages("reprex")

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

There's also a nice FAQ on how to do a minimal reprex for beginners, below:

What to do if you run into clipboard problems

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

For pointers specific to the community site, check out the reprex FAQ.

here is my reprex :

attach(quakes) 
library(pracma) 
#> Warning: package 'pracma' was built under R version 3.5.3
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
x=mutate(quakes,dist=haversine(quakes$lat,quakes$long))
#> Error in mutate_impl(.data, dots): Evaluation error: Coordinate input not in correct format..

Created on 2019-03-19 by the reprex package (v0.2.1)

1 Like

If you look at the documentation for the haversine() function (linked above, and can also be seen by running ?haversine when pracma is attached, you'll see that the first two arguments, loc1 and loc2 each take a specific format.

Haversine Formula

Haversine formula to calculate the arc distance between two points on earth (i.e., along a great circle).

Usage
haversine(loc1, loc2, R = 6371.0)
Arguments

loc1, loc2

Locations on earth; for format see Details.
R

Average earth radius R = 6371 km, can be changed on input.

Details

The Haversine formula is more robust for the calculating the distance as with the spherical cosine formula. The user may want to assume a slightly different earth radius, so this can be provided as input.

The location can be input in two different formats, as latitude and longitude in a character string, e.g. for Frankfurt airport as '50 02 00N, 08 34 14E', or as a numerical two-vector in degrees (not radians).

Here for latitude 'N' and 'S' stand for North and South, and for longitude 'E' or 'W' stand for East and West. For the degrees format, South and West must be negative.

These two formats can be mixed.

The data you have given it is not in that format. You're giving it latitude for loc1 and longitude for loc2.

The two possible formats are shown in the example for haversine():

library(pracma)
FRA = '50 02 00N, 08 34 14E'  # Frankfurt Airport
ORD = '41 58 43N, 87 54 17W'  # Chicago O'Hare Interntl. Airport
fra <- c(50+2/60, 8+34/60+14/3600)
ord <- c(41+58/60+43/3600, -(87+54/60+17/3600))

dis <- haversine(FRA, ORD)    # 6971.059 km
fprintf('Flight distance Frankfurt-Chicago is %8.3f km.\n', dis)
#> Flight distance Frankfurt-Chicago is 6971.059 km.

dis <- haversine(fra, ord)
fprintf('Flight distance Frankfurt-Chicago is %8.3f km.\n', dis)
#> Flight distance Frankfurt-Chicago is 6971.059 km.

Created on 2019-03-19 by the reprex package (v0.2.1)

I think haversine() is not vectorized, (I hadn't used the pracma package until responding to you here), so you might need to use purrr, rather than just mutate(). The Stack Overflow thread below should help, they're using a different package, but the same approach should work:

2 Likes

yeha the data is not at the format of loc rather each long and lat of there own uniqe colum , i tried to unit them in to one colum with no success

Do you have to use pracma implementation to get the distance matrix? Otherwise, the following seems OK to me. Actually, purrr is not required.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(geosphere) # contains the distm function

locations_tidyverse <-
  quakes %>%
  head() %>%
  select(long, lat) %>% # distm requires longitude first, before latitude
  mutate(long = if_else(condition = (long > 180),
                        true = (long - 360),
                        false = long)) # distm use -180:180 longitude format

locations_baseR <- within(data = head(quakes)[2:1],
                          expr = {
                            long <- ifelse(test = (long > 180),
                                           yes = (long - 360),
                                           no = long)
                          })

distm(x = locations_tidyverse,
      fun = distHaversine)
#>           [,1]      [,2]     [,3]     [,4]      [,5]     [,6]
#> [1,]      0.00  65416.35 670925.4 272765.2  35470.22 293108.9
#> [2,]  65416.35      0.00 676068.9 302329.8  99481.44 358385.5
#> [3,] 670925.45 676068.89      0.0 928617.5 658574.01 703868.5
#> [4,] 272765.16 302329.84 928617.5      0.0 274549.94 337917.6
#> [5,]  35470.22  99481.44 658574.0 274549.9      0.00 259181.2
#> [6,] 293108.93 358385.48 703868.5 337917.6 259181.18      0.0

distm(x = locations_baseR,
      fun = distHaversine)
#>           [,1]      [,2]     [,3]     [,4]      [,5]     [,6]
#> [1,]      0.00  65416.35 670925.4 272765.2  35470.22 293108.9
#> [2,]  65416.35      0.00 676068.9 302329.8  99481.44 358385.5
#> [3,] 670925.45 676068.89      0.0 928617.5 658574.01 703868.5
#> [4,] 272765.16 302329.84 928617.5      0.0 274549.94 337917.6
#> [5,]  35470.22  99481.44 658574.0 274549.9      0.00 259181.2
#> [6,] 293108.93 358385.48 703868.5 337917.6 259181.18      0.0
1 Like

i dont have to use the pcarma it was just a reccomendation that iget from the asseigment , i can use other methods if possible , i want to try to do what have you done but what does %>% mean? sorry im noob

This is called pipe, implemented in the magrittr package.

If you want to do f(g(x,y),z), then in this notation, it becomes x %>% g(y) %>% f(z).

You can find more about this here:

https://r4ds.had.co.nz/

I edited my previous post to add an equivalent approach, with which I expect you to be familiar.

ty i get it now , but i think i need to use pracma becuse ineed to add the calculation as new colum in the quakes data and i cant do it with the method use used

locations <-
  quakes %>%
  head() %>%
  select(long, lat) %>% 
  mutate(long = if_else(condition = (long > 180),
                        true = (long - 360),
                        false = long))
#> Error in quakes %>% head() %>% select(long, lat) %>% mutate(long = if_else(condition = (long > : could not find function "%>%"

distm(x = locations,
      fun = distHaversine)
#> Error in distm(x = locations, fun = distHaversine): could not find function "distm"
quakes_distance=mutate(quakes,distance=distm(x = locations,
      fun = distHaversine))
#> Error in mutate(quakes, distance = distm(x = locations, fun = distHaversine)): could not find function "mutate"

Created on 2019-03-19 by the reprex package (v0.2.1)

1 Like

You need to include the libraries you're using inside your reprex.

Along with what Mara said, you can actually add a new column to the existing dataset even with this approach:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(geosphere)
library(tidyr)

locations <-
  quakes %>%
  head() %>%
  select(long, lat) %>%
  mutate(long = if_else(condition = (long > 180),
                        true = (long - 360),
                        false = long)) %>%
  crossing(., .) %>%
  rowwise() %>%
  mutate(distances = distm(x = c(long, lat),
                           y = c(long1, lat1),
                           fun = distHaversine))

as.data.frame(x = locations)
#>       long    lat   long1   lat1 distances
#> 1  -178.38 -20.42 -178.38 -20.42      0.00
#> 2  -178.38 -20.42 -178.97 -20.62  65416.35
#> 3  -178.38 -20.42 -175.90 -26.00 670925.45
#> 4  -178.38 -20.42 -178.34 -17.97 272765.16
#> 5  -178.38 -20.42 -178.04 -20.42  35470.22
#> 6  -178.38 -20.42 -175.69 -19.68 293108.93
#> 7  -178.97 -20.62 -178.38 -20.42  65416.35
#> 8  -178.97 -20.62 -178.97 -20.62      0.00
#> 9  -178.97 -20.62 -175.90 -26.00 676068.89
#> 10 -178.97 -20.62 -178.34 -17.97 302329.84
#> 11 -178.97 -20.62 -178.04 -20.42  99481.44
#> 12 -178.97 -20.62 -175.69 -19.68 358385.48
#> 13 -175.90 -26.00 -178.38 -20.42 670925.45
#> 14 -175.90 -26.00 -178.97 -20.62 676068.89
#> 15 -175.90 -26.00 -175.90 -26.00      0.00
#> 16 -175.90 -26.00 -178.34 -17.97 928617.55
#> 17 -175.90 -26.00 -178.04 -20.42 658574.01
#> 18 -175.90 -26.00 -175.69 -19.68 703868.46
#> 19 -178.34 -17.97 -178.38 -20.42 272765.16
#> 20 -178.34 -17.97 -178.97 -20.62 302329.84
#> 21 -178.34 -17.97 -175.90 -26.00 928617.55
#> 22 -178.34 -17.97 -178.34 -17.97      0.00
#> 23 -178.34 -17.97 -178.04 -20.42 274549.94
#> 24 -178.34 -17.97 -175.69 -19.68 337917.64
#> 25 -178.04 -20.42 -178.38 -20.42  35470.22
#> 26 -178.04 -20.42 -178.97 -20.62  99481.44
#> 27 -178.04 -20.42 -175.90 -26.00 658574.01
#> 28 -178.04 -20.42 -178.34 -17.97 274549.94
#> 29 -178.04 -20.42 -178.04 -20.42      0.00
#> 30 -178.04 -20.42 -175.69 -19.68 259181.18
#> 31 -175.69 -19.68 -178.38 -20.42 293108.93
#> 32 -175.69 -19.68 -178.97 -20.62 358385.48
#> 33 -175.69 -19.68 -175.90 -26.00 703868.46
#> 34 -175.69 -19.68 -178.34 -17.97 337917.64
#> 35 -175.69 -19.68 -178.04 -20.42 259181.18
#> 36 -175.69 -19.68 -175.69 -19.68      0.00

The only problem with the above (as far as I can see yet), it contains some useless rows. Since distance is symmetric, \binom{n}{2} rows are expected, but here we have n^2 rows. Certainly, calculating distances between same locations is not required, and neither are the rows calculating distances between B and A, if we already calculated that between A and B.

But I can't provide a solution to that. I'm sure it's something very trivial, but I fail to spot it. I'm sure someone else will provide a better solution taking care of this.

1 Like

Most probably, you've already figured out how to avoid the extra rows by now. But to make this thread complete (and for my own satisfaction), I'm posting a solution based on the following post:

Here's the solution:

# loading libraries
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(geosphere) # contains the distHavershine function

all_locations <-
  quakes %>% # original dataset
  head() %>% # taking a subset for illustration (not needed, obviously!!)
  select(long, lat) %>% # taking only the coordinates
  mutate(long = if_else(condition = (long > 180),
                        true = (long - 360),
                        false = long)) # converting 0:360 system to -180:180 system
as.data.frame(x = all_locations)
#>      long    lat
#> 1 -178.38 -20.42
#> 2 -178.97 -20.62
#> 3 -175.90 -26.00
#> 4 -178.34 -17.97
#> 5 -178.04 -20.42
#> 6 -175.69 -19.68

all_location_combinations <-
  as_tibble(x = t(x = apply(X = combn(x = (1:nrow(all_locations)),
                                      m = 2), # creating row combinations
                            MARGIN = 2, # performing for each combination
                            FUN = function(u) {
                              unlist(x = all_locations[u, ]) # de-stacking rows
                            }))) %>%
  arrange(long1, lat1, long2, lat2) # changing order (not needed)
as.data.frame(x = all_location_combinations)
#>      long1   long2   lat1   lat2
#> 1  -178.97 -178.34 -20.62 -17.97
#> 2  -178.97 -178.04 -20.62 -20.42
#> 3  -178.97 -175.90 -20.62 -26.00
#> 4  -178.97 -175.69 -20.62 -19.68
#> 5  -178.38 -178.97 -20.42 -20.62
#> 6  -178.38 -178.34 -20.42 -17.97
#> 7  -178.38 -178.04 -20.42 -20.42
#> 8  -178.38 -175.90 -20.42 -26.00
#> 9  -178.38 -175.69 -20.42 -19.68
#> 10 -178.34 -178.04 -17.97 -20.42
#> 11 -178.34 -175.69 -17.97 -19.68
#> 12 -178.04 -175.69 -20.42 -19.68
#> 13 -175.90 -178.34 -26.00 -17.97
#> 14 -175.90 -178.04 -26.00 -20.42
#> 15 -175.90 -175.69 -26.00 -19.68

distances_between_combinations <-
  all_location_combinations %>%
  rowwise() %>% # performing for each row
  mutate(distances = distm(x = c(long1, lat1),
                           y = c(long2, lat2),
                           fun = distHaversine)) # finding distances
as.data.frame(x = distances_between_combinations)
#>      long1   long2   lat1   lat2 distances
#> 1  -178.97 -178.34 -20.62 -17.97 302329.84
#> 2  -178.97 -178.04 -20.62 -20.42  99481.44
#> 3  -178.97 -175.90 -20.62 -26.00 676068.89
#> 4  -178.97 -175.69 -20.62 -19.68 358385.48
#> 5  -178.38 -178.97 -20.42 -20.62  65416.35
#> 6  -178.38 -178.34 -20.42 -17.97 272765.16
#> 7  -178.38 -178.04 -20.42 -20.42  35470.22
#> 8  -178.38 -175.90 -20.42 -26.00 670925.45
#> 9  -178.38 -175.69 -20.42 -19.68 293108.93
#> 10 -178.34 -178.04 -17.97 -20.42 274549.94
#> 11 -178.34 -175.69 -17.97 -19.68 337917.64
#> 12 -178.04 -175.69 -20.42 -19.68 259181.18
#> 13 -175.90 -178.34 -26.00 -17.97 928617.55
#> 14 -175.90 -178.04 -26.00 -20.42 658574.01
#> 15 -175.90 -175.69 -26.00 -19.68 703868.46

Created on 2019-03-22 by the reprex package (v0.2.1)

1 Like

wow ty , until now icoudnt get it right and you solve it
although idont know how to add the new colun distance to orignal quakes data i dont think that possiable

Obviously, it's possible. You can just cbind the rest of the columns.

I misunderstood. This can be done if you make 10 + 1 columns in total. But that's unnecessary in my opinion.

i did try c bind but the prbolome is that my orignal data quaks contain 1000 object and ofcoruse the distance contain higer objerts