performing a full_join() on sf objects

Hi all!

I was wondering if there was a more straight-forward (i.e. "integrated") way to do a dplyr::full_join()-type operation on two sf objects.

The st_join() function performs a "spatial join", and the dplyr *_join() functions for sf objects expect x to be an sf object, and y to be a dataframe. What I am really looking for, is a function that merges the two sf objects (kind of like an rbind() operation) while keeping all the data attached to the geometries (i.e. introducing NAs when the data is missing).

Therefore, when joining two A and B objects, the resulting object C should have nrow(C) == nrow(A) + nrow(B), and have as many columns as necessary to contain all the original data (and one single geometry column).

This is an operation that I imagine is pretty common, and I would encounter this task often when dealing with OpenStreetMap data.

See for example this code, that uses a data.table workaround:

# get libraries around Brisbane from OSM
library(osmdata)
lib <- getbb("Brisbane, Queensland") %>% 
  opq() %>% 
  add_osm_feature(key = "amenity", value = "library") %>% 
  osmdata_sf()

# extract relevant data:
# polygons
pol <- lib$osm_polygons
# points
library(dplyr)
pnt <- lib$osm_points %>% 
  # only keep libraries described as a point, not part of polygon
  filter(!is.na(name))

# convert polygons to centroids
library(sf)
cnt <- st_centroid(pol)

# merge points and centroids, keeping all variables
all_libs <- st_as_sf(data.table::rbindlist(list(pnt, cnt), fill = TRUE))

# visualise them
library(tmap)
tmap_mode("view")
tm_shape(all_libs) +
  tm_dots()

The data.table::rbindlist() solution was found in this sf issue: https://github.com/r-spatial/sf/issues/798#issuecomment-405157853
And the general non-spatial join issue is discussed here: https://github.com/r-spatial/sf/issues/239

My example works well, but I was wondering if I was missing something and there actually was a function in sf designed for that kind of operation?

Cheers

My session info:

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C               LC_TIME=en_AU.UTF-8       
 [4] LC_COLLATE=en_AU.UTF-8     LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.8.3       sf_0.8-0          tmap_2.3-1        osmdata_0.1.1.007

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.5   purrr_0.3.2        lattice_0.20-38    htmltools_0.3.6   
 [5] viridisLite_0.3.0  yaml_2.2.0         XML_3.98-1.20      rlang_0.4.1       
 [9] e1071_1.7-2        later_0.8.0        pillar_1.4.2       glue_1.3.1        
[13] DBI_1.0.0          sp_1.3-1           RColorBrewer_1.1-2 stringr_1.4.0     
[17] rgeos_0.5-2        rvest_0.3.4        raster_3.0-7       htmlwidgets_1.3   
[21] leafsync_0.1.0     codetools_0.2-16   httpuv_1.5.2       crosstalk_1.0.0   
[25] curl_4.2           class_7.3-15       Rcpp_1.0.2         KernSmooth_2.23-16
[29] xtable_1.8-4       promises_1.0.1     classInt_0.4-1     lwgeom_0.1-7      
[33] leaflet_2.0.2      jsonlite_1.6       mime_0.7           packrat_0.5.0     
[37] digest_0.6.22      stringi_1.4.3      shiny_1.3.2        tmaptools_2.0-2   
[41] grid_3.6.1         rgdal_1.4-6        tools_3.6.1        magrittr_1.5      
[45] tibble_2.1.3       dichromat_2.0-0    crayon_1.3.4       pkgconfig_2.0.3   
[49] data.table_1.12.4  xml2_1.2.2         lubridate_1.7.4    assertthat_0.2.1  
[53] httr_1.4.1         rstudioapi_0.10    R6_2.4.0           units_0.6-4       
[57] compiler_3.6.1 
1 Like

Interesting problem. Why can't you use dplyr::full_join? Or maybe dplyr::bind_rows. As far as I understand sf objects are just dataframes with a bit of extra metadata.

Hi @woodward

Thans for asking, I should probably be a bit more thorough.

dplyr::full_join()

full_join() expects the right-hand side element to be a dataframe. I get the error:

Error: y should be a data.frame; for spatial joins, use st_join

... which makes sense, as I can't expect dplyr to know how to match rows according to geometries, as there are many ways to do that. It redirects the user to the spatial joins provided by sf, which use the st_join() function along with a predicate function to define how to match geometries. Unfortunately, there isn't a predicate function that just says "all geometries are different, do not try to match them" (but really, there shouldn't be one, as it wouldn't qualify as a "spatial join" anyway).

dplyr::bind_rows()

bind_rows() doesn't merge the two geometry columns; it keeps them separate as "geometry" and "source.geometry", with the following warning messages:

Warning messages:
1: In bind_rows_(x, .id) :
Vectorizing 'sfc_POINT' elements may not preserve their attributes
2: In bind_rows_(x, .id) :
Vectorizing 'sfc_POINT' elements may not preserve their attributes

Not sure if this is to be expected at all?

Maybe you can convert the two sf objects into dataframes, join them, and then convert them back.

As usual, SO comes to the rescue:

base::rbind

Unfortunately, R Base's rbind() will only work with two dataframes that have the same structure, which is not the case here:

base.bind <- rbind(pnt, cnt)

... gives me:

Error in rbind.data.frame(...) :
numbers of columns of arguments do not match

deactivate geometries

Thanks for pointing to that question in StackOverflow, I hadn't seen it before.
However, the end result on SO is not what I am hoping for: the final object contains two separate geometry columns (geometry.x and geometry.y). In my testing, it fails anyway, for some reason:

# joining by all matching variable names
deact <- full_join(as.data.frame(pnt),
                   as.data.frame(cnt))

I get this error:

Error: Can't join on 'geometry' x 'geometry' because of incompatible types (sfc_POINT/sfc / sfc_POINT/sfc)

... even though the geometries are supposed to be deactived? (and they are supposed to be the same type of geometry anyway...)

You could add additional columns so they do have the same structure, then rbind? This is pretty easy using setdiff(names(df1), names(df2)) and mutate_at I think.

I believe the reason you are in general not allowed to do a full join of {sf} objects is maintaining consistent geometry type.

By this I mean preventing things like full_join of points and polygons.

library(sf)
library(dplyr)

# https://stackoverflow.com/questions/49181715/how-to-make-a-data-frame-into-a-simple-features-data-frame
projcrs <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

df1 <- data.frame(place = "London", 
                 lat = 51.5074, lon = 0.1278,
                 population = 9130000,
                 palaces = 1)
df1 <- st_as_sf(x = df1,                         
               coords = c("lon", "lat"),
               crs = projcrs)

df2 <- data.frame(place = "Wellington", 
                  lat = -41.288889, lon = 174.777222,
                  population = 212700,
                  beaches = 10) 
df2 <- st_as_sf(x = df2,                         
                coords = c("lon", "lat"),
                crs = projcrs)

# add missing variables
add1 <- setdiff(names(df2), names(df1))
df1[[add1]] <- NA

add2 <- setdiff(names(df1), names(df2))
df2[[add2]] <- NA

# rbind
rbind(df1, df2)
#> Simple feature collection with 2 features and 4 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 0.1278 ymin: 51.5074 xmax: 0.1278 ymax: 51.5074
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>        place population palaces beaches               geometry
#> 1     London    9130000       1      NA POINT (0.1278 51.5074)
#> 2 Wellington     212700      NA      10 POINT (0.1278 51.5074)

Created on 2019-11-05 by the reprex package (v0.3.0)

1 Like

I didn't see any thing in sf. But I did notice that osmdata objects use the sf geometry format, but don't seem to be sf objects.

I wonder, though, whether it might be possible to mutate the polygon objects to point objects via centroid transformation in place?