need help with Haversine function

nirmo · March 18, 2019, 6:52pm

so ineed to use Haversine function from packge "pracma" , the problome is that i dont know how to use the function on big data , lets say i have 1000 values of latitude and longatuides how do i make the function calculatin the distance between two points when every points?

mara · March 18, 2019, 8:51pm

For every combination of a bunch of values, you might take a look at expand.grid().

From the haversine() docs, it looks like:

The location can be input in two different formats, as latitude and longitude in a character string, e.g. for Frankfurt airport as '50 02 00N, 08 34 14E', or as a numerical two-vector in degrees (not radians).

You could then calculate haversine() for all combinations kept in a data frame, for example:

df <- tibble::tibble(x = '41 58 43N, 87 54 17W', y = '50 02 00N, 08 34 14E')

pracma::haversine(df$x, df$y)
#> [1] 6971.059

^{Created on 2019-03-18 by the reprex package (v0.2.1)}

nirmo · March 19, 2019, 7:02am

ok ty , but im not sure im quit understand how to calculate for 1000 datas sets?

nirmo · March 19, 2019, 7:14am

for exmple if ihave the data "quaks" each earthquaks have its own lattitude and longatuide how do i use this function ? im suppost to creat anew data frame or something ?

mara · March 19, 2019, 10:21am

No, probably a new variable. Check out dplyr::mutate()

nirmo · March 19, 2019, 10:51am

idid try to add with mutate new colunm that will calculate but the HARVESTIN said i put worng cordinat format , can you show me example ? i mean how do i put it in mutate

mara · March 19, 2019, 11:16am

Using a small subset of your data, could you please turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.

install.packages("reprex")

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

There's also a nice FAQ on how to do a minimal reprex for beginners, below:

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

What to do if you run into clipboard problems

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

For pointers specific to the community site, check out the reprex FAQ.

nirmo · March 19, 2019, 12:24pm

here is my reprex :

attach(quakes) 
library(pracma) 
#> Warning: package 'pracma' was built under R version 3.5.3
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
x=mutate(quakes,dist=haversine(quakes$lat,quakes$long))
#> Error in mutate_impl(.data, dots): Evaluation error: Coordinate input not in correct format..

^{Created on 2019-03-19 by the reprex package (v0.2.1)}

mara · March 19, 2019, 1:17pm

If you look at the documentation for the haversine() function (linked above, and can also be seen by running ?haversine when pracma is attached, you'll see that the first two arguments, loc1 and loc2 each take a specific format.

Haversine Formula

Haversine formula to calculate the arc distance between two points on earth (i.e., along a great circle).

Usage
haversine(loc1, loc2, R = 6371.0)
Arguments

loc1, loc2

Locations on earth; for format see Details.
R

Average earth radius R = 6371 km, can be changed on input.

Details

The Haversine formula is more robust for the calculating the distance as with the spherical cosine formula. The user may want to assume a slightly different earth radius, so this can be provided as input.

The location can be input in two different formats, as latitude and longitude in a character string, e.g. for Frankfurt airport as '50 02 00N, 08 34 14E', or as a numerical two-vector in degrees (not radians).

Here for latitude 'N' and 'S' stand for North and South, and for longitude 'E' or 'W' stand for East and West. For the degrees format, South and West must be negative.

These two formats can be mixed.

The data you have given it is not in that format. You're giving it latitude for loc1 and longitude for loc2.

The two possible formats are shown in the example for haversine():

library(pracma)
FRA = '50 02 00N, 08 34 14E'  # Frankfurt Airport
ORD = '41 58 43N, 87 54 17W'  # Chicago O'Hare Interntl. Airport
fra <- c(50+2/60, 8+34/60+14/3600)
ord <- c(41+58/60+43/3600, -(87+54/60+17/3600))

dis <- haversine(FRA, ORD)    # 6971.059 km
fprintf('Flight distance Frankfurt-Chicago is %8.3f km.\n', dis)
#> Flight distance Frankfurt-Chicago is 6971.059 km.

dis <- haversine(fra, ord)
fprintf('Flight distance Frankfurt-Chicago is %8.3f km.\n', dis)
#> Flight distance Frankfurt-Chicago is 6971.059 km.

^{Created on 2019-03-19 by the reprex package (v0.2.1)}

I think haversine() is not vectorized, (I hadn't used the pracma package until responding to you here), so you might need to use purrr, rather than just mutate(). The Stack Overflow thread below should help, they're using a different package, but the same approach should work:

nirmo · March 19, 2019, 2:41pm

yeha the data is not at the format of loc rather each long and lat of there own uniqe colum , i tried to unit them in to one colum with no success

Yarnabrina · March 19, 2019, 2:48pm

Do you have to use pracma implementation to get the distance matrix? Otherwise, the following seems OK to me. Actually, purrr is not required.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(geosphere) # contains the distm function

locations_tidyverse <-
  quakes %>%
  head() %>%
  select(long, lat) %>% # distm requires longitude first, before latitude
  mutate(long = if_else(condition = (long > 180),
                        true = (long - 360),
                        false = long)) # distm use -180:180 longitude format

locations_baseR <- within(data = head(quakes)[2:1],
                          expr = {
                            long <- ifelse(test = (long > 180),
                                           yes = (long - 360),
                                           no = long)
                          })

distm(x = locations_tidyverse,
      fun = distHaversine)
#>           [,1]      [,2]     [,3]     [,4]      [,5]     [,6]
#> [1,]      0.00  65416.35 670925.4 272765.2  35470.22 293108.9
#> [2,]  65416.35      0.00 676068.9 302329.8  99481.44 358385.5
#> [3,] 670925.45 676068.89      0.0 928617.5 658574.01 703868.5
#> [4,] 272765.16 302329.84 928617.5      0.0 274549.94 337917.6
#> [5,]  35470.22  99481.44 658574.0 274549.9      0.00 259181.2
#> [6,] 293108.93 358385.48 703868.5 337917.6 259181.18      0.0

distm(x = locations_baseR,
      fun = distHaversine)
#>           [,1]      [,2]     [,3]     [,4]      [,5]     [,6]
#> [1,]      0.00  65416.35 670925.4 272765.2  35470.22 293108.9
#> [2,]  65416.35      0.00 676068.9 302329.8  99481.44 358385.5
#> [3,] 670925.45 676068.89      0.0 928617.5 658574.01 703868.5
#> [4,] 272765.16 302329.84 928617.5      0.0 274549.94 337917.6
#> [5,]  35470.22  99481.44 658574.0 274549.9      0.00 259181.2
#> [6,] 293108.93 358385.48 703868.5 337917.6 259181.18      0.0

nirmo · March 19, 2019, 2:53pm

i dont have to use the pcarma it was just a reccomendation that iget from the asseigment , i can use other methods if possible , i want to try to do what have you done but what does %>% mean? sorry im noob

Yarnabrina · March 19, 2019, 3:01pm

This is called pipe, implemented in the magrittr package.

If you want to do f(g(x,y),z), then in this notation, it becomes x %>% g(y) %>% f(z).

You can find more about this here:

https://r4ds.had.co.nz/

I edited my previous post to add an equivalent approach, with which I expect you to be familiar.

nirmo · March 19, 2019, 3:18pm

ty i get it now , but i think i need to use pracma becuse ineed to add the calculation as new colum in the quakes data and i cant do it with the method use used

locations <-
  quakes %>%
  head() %>%
  select(long, lat) %>% 
  mutate(long = if_else(condition = (long > 180),
                        true = (long - 360),
                        false = long))
#> Error in quakes %>% head() %>% select(long, lat) %>% mutate(long = if_else(condition = (long > : could not find function "%>%"

distm(x = locations,
      fun = distHaversine)
#> Error in distm(x = locations, fun = distHaversine): could not find function "distm"
quakes_distance=mutate(quakes,distance=distm(x = locations,
      fun = distHaversine))
#> Error in mutate(quakes, distance = distm(x = locations, fun = distHaversine)): could not find function "mutate"

^{Created on 2019-03-19 by the reprex package (v0.2.1)}

mara · March 19, 2019, 3:23pm

You need to include the libraries you're using inside your reprex.

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

Yarnabrina · March 19, 2019, 4:15pm

Along with what Mara said, you can actually add a new column to the existing dataset even with this approach:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(geosphere)
library(tidyr)

locations <-
  quakes %>%
  head() %>%
  select(long, lat) %>%
  mutate(long = if_else(condition = (long > 180),
                        true = (long - 360),
                        false = long)) %>%
  crossing(., .) %>%
  rowwise() %>%
  mutate(distances = distm(x = c(long, lat),
                           y = c(long1, lat1),
                           fun = distHaversine))

as.data.frame(x = locations)
#>       long    lat   long1   lat1 distances
#> 1  -178.38 -20.42 -178.38 -20.42      0.00
#> 2  -178.38 -20.42 -178.97 -20.62  65416.35
#> 3  -178.38 -20.42 -175.90 -26.00 670925.45
#> 4  -178.38 -20.42 -178.34 -17.97 272765.16
#> 5  -178.38 -20.42 -178.04 -20.42  35470.22
#> 6  -178.38 -20.42 -175.69 -19.68 293108.93
#> 7  -178.97 -20.62 -178.38 -20.42  65416.35
#> 8  -178.97 -20.62 -178.97 -20.62      0.00
#> 9  -178.97 -20.62 -175.90 -26.00 676068.89
#> 10 -178.97 -20.62 -178.34 -17.97 302329.84
#> 11 -178.97 -20.62 -178.04 -20.42  99481.44
#> 12 -178.97 -20.62 -175.69 -19.68 358385.48
#> 13 -175.90 -26.00 -178.38 -20.42 670925.45
#> 14 -175.90 -26.00 -178.97 -20.62 676068.89
#> 15 -175.90 -26.00 -175.90 -26.00      0.00
#> 16 -175.90 -26.00 -178.34 -17.97 928617.55
#> 17 -175.90 -26.00 -178.04 -20.42 658574.01
#> 18 -175.90 -26.00 -175.69 -19.68 703868.46
#> 19 -178.34 -17.97 -178.38 -20.42 272765.16
#> 20 -178.34 -17.97 -178.97 -20.62 302329.84
#> 21 -178.34 -17.97 -175.90 -26.00 928617.55
#> 22 -178.34 -17.97 -178.34 -17.97      0.00
#> 23 -178.34 -17.97 -178.04 -20.42 274549.94
#> 24 -178.34 -17.97 -175.69 -19.68 337917.64
#> 25 -178.04 -20.42 -178.38 -20.42  35470.22
#> 26 -178.04 -20.42 -178.97 -20.62  99481.44
#> 27 -178.04 -20.42 -175.90 -26.00 658574.01
#> 28 -178.04 -20.42 -178.34 -17.97 274549.94
#> 29 -178.04 -20.42 -178.04 -20.42      0.00
#> 30 -178.04 -20.42 -175.69 -19.68 259181.18
#> 31 -175.69 -19.68 -178.38 -20.42 293108.93
#> 32 -175.69 -19.68 -178.97 -20.62 358385.48
#> 33 -175.69 -19.68 -175.90 -26.00 703868.46
#> 34 -175.69 -19.68 -178.34 -17.97 337917.64
#> 35 -175.69 -19.68 -178.04 -20.42 259181.18
#> 36 -175.69 -19.68 -175.69 -19.68      0.00

The only problem with the above (as far as I can see yet), it contains some useless rows. Since distance is symmetric, \binom{n}{2} rows are expected, but here we have n^2 rows. Certainly, calculating distances between same locations is not required, and neither are the rows calculating distances between B and A, if we already calculated that between A and B.

But I can't provide a solution to that. I'm sure it's something very trivial, but I fail to spot it. I'm sure someone else will provide a better solution taking care of this.

Yarnabrina · March 22, 2019, 5:39pm

Most probably, you've already figured out how to avoid the extra rows by now. But to make this thread complete (and for my own satisfaction), I'm posting a solution based on the following post:

Here's the solution:

# loading libraries
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(geosphere) # contains the distHavershine function

all_locations <-
  quakes %>% # original dataset
  head() %>% # taking a subset for illustration (not needed, obviously!!)
  select(long, lat) %>% # taking only the coordinates
  mutate(long = if_else(condition = (long > 180),
                        true = (long - 360),
                        false = long)) # converting 0:360 system to -180:180 system
as.data.frame(x = all_locations)
#>      long    lat
#> 1 -178.38 -20.42
#> 2 -178.97 -20.62
#> 3 -175.90 -26.00
#> 4 -178.34 -17.97
#> 5 -178.04 -20.42
#> 6 -175.69 -19.68

all_location_combinations <-
  as_tibble(x = t(x = apply(X = combn(x = (1:nrow(all_locations)),
                                      m = 2), # creating row combinations
                            MARGIN = 2, # performing for each combination
                            FUN = function(u) {
                              unlist(x = all_locations[u, ]) # de-stacking rows
                            }))) %>%
  arrange(long1, lat1, long2, lat2) # changing order (not needed)
as.data.frame(x = all_location_combinations)
#>      long1   long2   lat1   lat2
#> 1  -178.97 -178.34 -20.62 -17.97
#> 2  -178.97 -178.04 -20.62 -20.42
#> 3  -178.97 -175.90 -20.62 -26.00
#> 4  -178.97 -175.69 -20.62 -19.68
#> 5  -178.38 -178.97 -20.42 -20.62
#> 6  -178.38 -178.34 -20.42 -17.97
#> 7  -178.38 -178.04 -20.42 -20.42
#> 8  -178.38 -175.90 -20.42 -26.00
#> 9  -178.38 -175.69 -20.42 -19.68
#> 10 -178.34 -178.04 -17.97 -20.42
#> 11 -178.34 -175.69 -17.97 -19.68
#> 12 -178.04 -175.69 -20.42 -19.68
#> 13 -175.90 -178.34 -26.00 -17.97
#> 14 -175.90 -178.04 -26.00 -20.42
#> 15 -175.90 -175.69 -26.00 -19.68

distances_between_combinations <-
  all_location_combinations %>%
  rowwise() %>% # performing for each row
  mutate(distances = distm(x = c(long1, lat1),
                           y = c(long2, lat2),
                           fun = distHaversine)) # finding distances
as.data.frame(x = distances_between_combinations)
#>      long1   long2   lat1   lat2 distances
#> 1  -178.97 -178.34 -20.62 -17.97 302329.84
#> 2  -178.97 -178.04 -20.62 -20.42  99481.44
#> 3  -178.97 -175.90 -20.62 -26.00 676068.89
#> 4  -178.97 -175.69 -20.62 -19.68 358385.48
#> 5  -178.38 -178.97 -20.42 -20.62  65416.35
#> 6  -178.38 -178.34 -20.42 -17.97 272765.16
#> 7  -178.38 -178.04 -20.42 -20.42  35470.22
#> 8  -178.38 -175.90 -20.42 -26.00 670925.45
#> 9  -178.38 -175.69 -20.42 -19.68 293108.93
#> 10 -178.34 -178.04 -17.97 -20.42 274549.94
#> 11 -178.34 -175.69 -17.97 -19.68 337917.64
#> 12 -178.04 -175.69 -20.42 -19.68 259181.18
#> 13 -175.90 -178.34 -26.00 -17.97 928617.55
#> 14 -175.90 -178.04 -26.00 -20.42 658574.01
#> 15 -175.90 -175.69 -26.00 -19.68 703868.46

^{Created on 2019-03-22 by the reprex package (v0.2.1)}

nirmo · March 23, 2019, 8:11am

wow ty , until now icoudnt get it right and you solve it
although idont know how to add the new colun distance to orignal quakes data i dont think that possiable

Yarnabrina · March 23, 2019, 8:16am

~~Obviously, it's possible. You can just cbind the rest of the columns.~~

I misunderstood. This can be done if you make 10 + 1 columns in total. But that's unnecessary in my opinion.

nirmo · March 23, 2019, 8:23am

i did try c bind but the prbolome is that my orignal data quaks contain 1000 object and ofcoruse the distance contain higer objerts