Extracting Long/Lat from a shapefile and calculating nearest distance between two sets of coordinates

I have two sets of data that I'm working with. The first is a list of addresses and their respective Longitude and Latitude coordinates:

First is the list of addresses and their long/lat coorindates:

library(dplyr)
library(tidygeocoder)
library(tigris)
library(rgeos)
library(sf)

    addresses <-
      structure(
        list(
          id = c(
            107234063L,
            106950145L,
            107256562L,
            107277550L,
            106952865L,
            106858955L,
            104019143L,
            102264960L,
            101690658L,
            107259458L
          ),
          streetno = c(12700L, 2016L, 311L, 3405L,
                       2400L, 711L, 2400L, 406L, 14002L, 1502L),
          streetname = c(
            "Stafford",
            "Dunlavy",
            "Branard",
            "Shepherd",
            "Fountain View",
            "William",
            "Braeswood",
            "Hawthorne",
            "Hempstead",
            "Quitman"
          ),
          city = c(
            "Stafford",
            "Houston",
            "Houston",
            "Houston",
            "Houston",
            "Houston",
            "Houston",
            "Houston",
            "Houston",
            "Houston"
          ),
          state = c("TX", "TX", "TX",
                    "TX", "TX", "TX", "TX", "TX", "TX", "TX"),
          zip5 = c(
            77477L,
            77006L,
            77006L,
            77018L,
            77057L,
            77002L,
            77030L,
            77006L,
            77040L,
            77009L
          ),
          vaddress = c(
            "12700 Stafford Rd",
            "2016 Dunlavy St",
            "311 Branard St",
            "3405 N Shepherd Dr",
            "2400 Fountain View Dr",
            "711 William St",
            "2400 N Braeswood Blvd",
            "406 Hawthorne St",
            "14002 Hempstead Rd",
            "1502 Quitman St"
          ),
          countyname = c(
            "Harris",
            "Harris",
            "Harris",
            "Harris",
            "Harris",
            "Harris",
            "Harris",
            "Harris",
            "Harris",
            "Harris"
          ),
          hdname = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
          complete_address = c(
            "12700 Stafford Rd, Stafford, TX 77477",
            "2016 Dunlavy St, Houston, TX 77006",
            "311 Branard St, Houston, TX 77006",
            "3405 N Shepherd Dr, Houston, TX 77018",
            "2400 Fountain View Dr, Houston, TX 77057",
            "711 William St, Houston, TX 77002",
            "2400 N Braeswood Blvd, Houston, TX 77030",
            "406 Hawthorne St, Houston, TX 77006",
            "14002 Hempstead Rd, Houston, TX 77040",
            "1502 Quitman St, Houston, TX 77009"
          ),
          longlat = structure(
            list(
              singlelineaddress = c(
                "12700 Stafford Rd, Stafford, TX 77477",
                "2016 Dunlavy St, Houston, TX 77006",
                "311 Branard St, Houston, TX 77006",
                "3405 N Shepherd Dr, Houston, TX 77018",
                "2400 Fountain View Dr, Houston, TX 77057",
                "711 William St, Houston, TX 77002",
                "2400 N Braeswood Blvd, Houston, TX 77030",
                "406 Hawthorne St, Houston, TX 77006",
                "14002 Hempstead Rd, Houston, TX 77040",
                "1502 Quitman St, Houston, TX 77009"
              ),
              lat = c(
                29.634542,
                29.74779,
                29.737116,
                29.817532,
                29.742098,
                29.76709,
                29.698315,
                29.742435,
                29.850826,
                29.783348
              ),
              long = c(
                -95.545334,
                -95.40214,-95.383736,
                -95.41044,
                -95.48542,
                -95.35345,
                -95.415306,-95.38407,
                -95.52886,
                -95.35297
              )
            ),
            row.names = c(NA,-10L),
            class = c("tbl_df", "tbl", "data.frame")
          )
        ),
        row.names = 353:362,
        class = "data.frame"
      )

The next set of coordinates comes from downloading the following shapefile:

texas_hd <- state_legislative_districts("TX", house = "lower")

So what I'm looking to do is twofold:

  1. Extract the long/lat coordinates from the texas_hd file

  2. Create a new column in addresses that maps each rows long/lats coordinates to the nearest SLDLST code from texas_hd

So for example, the address of 2016 Dunlavy St, Houston, TX 77006 should be 134 from SLDLST

storing of lat long in a nested tibble is... a cruel and unusual way of recording coordinates :slight_smile:

I suggest transforming your addresses data frame to {sf} data format, by running sf::st_as_sf(); in order to do that properly I needed to pull the lat and long coordinates to a proper columns / as it is necessary to name the columns.

Once the format is aligned I suggest using sf::st_nearest_feature() - it returns index of the nearest item from texas_hd object. The index can then be used for subsetting.


addresses <- addresses %>% 
  mutate(lat = .$longlat$lat,
         long = .$longlat$long) %>% 
  st_as_sf(coords = c("long", "lat"), crs = 4326)


# vector of indexes of nearest SLDLST
nearest <- addresses %>% 
  st_transform(st_crs(texas_hd)) %>% # align coordinate systems
  st_nearest_feature(texas_hd)

# create new column of nearest SLDST in addresses
addresses$nearest_sldst <- texas_hd$SLDLST[nearest]

# a check / second item
addresses$nearest_sldst[2]
# 134, as expected...

I have been playing around with geocoding in R for some time. Here is a blog entry discussing some of the issues with free geocoding engines that may be helpful. Also with R code.

Interesting, thanks. However a US-centric :). Have you tried with Nominatim? Search - Nominatim 4.0.1. No offense, just a question :slight_smile:

Regards,
Grzegorz

Not yet, but I am trying to go through all the geocoding engines I can find systematically, and put together R code for running them and parsing the results. Someday I will publish the results of that effort. Geocoding is one of those things that sounds simple, but turns out to be complex. Even trying to understand what the lat-long point actually represents is fraught. And then there are the street naming and address conventions... one could write a book!

Yep, that's true. I was working with Nominatim for a while few years ago, as we ( I mean, OSM community) were importing the addresses across Poland. That time data wasn't accessible directly, so I have to scrape WMS+WFS servers recognizing the points on WMS tiles and then asking WFS for details :slight_smile:
In term of lat-long location from OSM perspective - it can be entrance to building, it can be centroid of the building geometry, it can be an arbitrary point "on" the building, or it can be a point in nowhere, just because the property got an official address. I found that a bit of buffer added to geometry helps when comparing the location of the addresses.

Regards,
Grzegorz

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.