Help me speed up distances from polygons to other polgons in sf

pdeffebach · March 7, 2019, 11:34pm

All,

I have a set of polygons that are our units of observation. Let's call them our sectors of interest.

I want to find the distance from each sector of interest to the nearest police station. I have polygons of police stations, but don't care about the m x n matrix of distances from every sector to every police station. My strategy is to use st_combine to make one big polygon of all the police stations and then use st_distance to find the distances.

Unfortunately it is very slow, much slower than the equivalent operation using arcpy and slower than I would expect considering it's calling optimized GDAL code.

Here is the code I am using to find the distances

match_polygons_distance_to = function(polygons, shape, idvar) {
    idvar = enquo(idvar)
    # combine all the shapes into one so we don't have a dense matrix. 
    combined_shape = st_combine(shape)
    
    # This is an array of distances with the same length as `polygons`
    # Tolerance is set to 100 to prevent too much searching
    distances = st_distance(polygons, combined_shape, tolerance = 100, by_element = TRUE)

    # Make a dataframe of these to return. Will merge in later
    polygons_df = polygons %>% st_set_geometry(NULL) %>%
        mutate(distance = distances) %>%
        select(!!idvar, distance)

    return(polygons_df)
}

I don't care too much about the actual distances, and would be fine to get distances within 50 or 100 meters of accuracy.

What I imagine a good algorithm would do is just, for each polygon, draw a donut of 50m around it and check if it contains a police station, then do a 100m donut etc. But I'm not sure what methods to call to get this procedure.

Any help is appreciated.

jlacko · March 8, 2019, 6:47am

The example you describe is not exactly reproducible, so allow me to reply in general comments:

distance calculations are greatly speeded up if you replace polygons with their centroids; this can certainly be done with the police stations, not sure about your sectors of interest.
If they are regular in shape and / or small relative to distance to nearest police station then you can use just center and you are great with point to point distance; if not then you are stuck with point to polygon, still much better than polygon to polygon.
The function is sf::st_centroid() and it is vectorized - meaning you can apply it to data frame of polygons and get a data frame of centroids, no looping required.
If you don't really care about actual distances, but rather about the count of police stations within a certain distance of your sector of interest it is easier to do sf::st_buffer() around your sector of interest (works both for points and polygons) and then run sf::st_intersection() with a dataframe of the points of police stations. It will return the stations inside the buffer zone.
As this is a two step process - first draw a buffer around the point / polygon of section of interest and then do the intersection and somehow summarize the results - I usually wrap it in a for cycle.

library(tidyverse)
library(tictoc)
library(sf)

# NC dataset - ships with sf package
nc <- st_transform(crs = "+init=epsg:5070",
                   st_read(system.file("shape/nc.shp", package = "sf"), 
                           quiet = TRUE)) %>%
  select(NAME)

sector <- nc %>% # pick a county, any county..
  filter(NAME == "Mecklenburg")

nc_centroids <- st_centroid(nc)

tic("polygon to polygon")
dist <- st_distance(sector, nc)
toc()

tic("point to polygon")
dist <- st_distance(sector, nc_centroids)
toc()

tic("point to point")
dist <- st_centroid(sector) %>%
  st_distance(nc_centroids)
toc()

tic("count counties within buffer")
buffer_zone <- st_buffer(sector, 50000) # 50 kilometers around Meckleburg county

within_zone <- st_intersection(buffer_zone, nc_centroids) %>% # do the intersection
  nrow() # just count rows (unless you care about names as well)
toc()

pdeffebach · March 8, 2019, 4:18pm

Thank you for the response! And I apologize for not creating a more reproducible example. I should have thought to use the North Carolina dataset. Your feedback was very helpful.

Treating sectors as point: Yes this should be done. The sectors are small enough that I can model them as points.
Treating police stations as points: Yes I can do this too. Most of the shapes I am matching to are small, for example schools and government administrative buildings. There may be some target shapes I can't do this with, such as areas marked as downtowns. But for big enough polygons, with a high likelihood of overlap, the area of the intersection might be more useful than distance to the target polygon. Thankfully area of intersection is very fast so it is relatively costless to add this information as well.
st_buffer is a good idea, but ultimately we want to kick that can down the road, and may change our minds about which cutoffs make the most sense in our context.

Thank you!

pdeffebach · March 8, 2019, 4:40pm

Actually, I have one question for you about this.

I might also need to match polygons (now points) to lines. Do you think it makes sense to turn lines into points separated by 100m? Or are lines a different beast entirely from polygons so that it wouldn't make sense.

If I were to turn lines to points, do you have any recommendations for how it would be done? st_cast doesn't seem right because I am not interested in the nodes that compose each line, but rather 100m equally spaced points along lines.

jlacko · March 8, 2019, 5:09pm

Glad to be of service!

I am unsure what you mean by matching lines to points (I am a person of poor imagination )

st_distance() will work for point to line distances if need be; st_buffer() will turn your line into a fat earthworm like polygon, which can be intersected with your points.

Is this what you had in mind, or am I missing the point?

pdeffebach · March 8, 2019, 5:25pm

I have a map of transit lines. I want to find the distance from each sector to the nearest public transit line.

You described above that getting the centroid of polygons will significantly speed up my calculation (it did!). Using the same logic, it seems that finding distances to lines would also be expensive compared to finding distances to points.

So I was thinking, whats an equivalent operation to centroid that I could do for lines? Since line's don't have an obvious centroid, I could replace the lines with equally-spaced points along the lines. Say, a point every 100m. Then I could find the distance from each sector to these points, and that is a good approximation for the distance to the nearest transit line line.

jlacko · March 11, 2019, 11:15am

The obvious choice of points instead of transit lines would be transit stations; I assume these are unavailable as this is too obvious for you to miss.

I would be in general vary of interpreting distance from a transit line (as opposed to a station) as meaningful. But if you must you can use st_buffer() on the lines and work with them as regular polygons.

pdeffebach · March 12, 2019, 8:38pm

Yes, I am using bus routes which don't have well defined stations.

st_buffer won't work if there is one long straight line, because then it just has one centroid.

A st_fill_points would be useful to have, for polygons or lines to fill points in at some interval.

system · March 19, 2019, 8:38pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.