Convert sf object to data.frame with region and order for using with geom_poloygon

I love using sf for shapefiles BUT I'm doing some work in a Census data center and need to make some maps where GDAL isn't up to date to be able to use sf. Updating GDAL isn't an option so I want to do the following:

  1. Import shapefiles using sf on my computer
  2. Create data.frames which have group, order, region, and subregion to be used with geom_polygon
  3. Save the data.frames and get it uploaded to the Census data center to use in a script like the one below. I can't use the built-in maps package definition of counties because they have it wrong in VA as shown in this old post: https://gis.stackexchange.com/questions/121896/is-there-a-county-level-maps-dataset-that-includes-independent-cities-in-virgini on SO
library(ggplot2)
library(tibble)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(magrittr)
library(tidyr)
#> 
#> Attaching package: 'tidyr'
#> The following object is masked from 'package:magrittr':
#> 
#>     extract
library(purrr)
#> 
#> Attaching package: 'purrr'
#> The following object is masked from 'package:magrittr':
#> 
#>     set_names
library(stringr)
library(maps)
#> 
#> Attaching package: 'maps'
#> The following object is masked from 'package:purrr':
#> 
#>     map

# https://eriqande.github.io/rep-res-web/lectures/making-maps-with-R.html

data(county.fips) #this is a crosswalk between state/county names and FIPS

#make a dataframe of the polygon information
map_county_df <- map_data("county") %>%
  mutate(polyname=str_c(region, subregion, sep=","))

#Add on FIPS code for later merges
#Some merging was not possible and manually looked up

map_county_df_fips <- map_county_df %>%
  left_join(county.fips, by="polyname") %>%
  mutate(char_fips=case_when(
    !is.na(fips)~str_pad(fips, 5, side="left", pad="0"),
    str_detect(polyname, "okaloosa")~"12091", #manually looked these up
    str_detect(polyname, "st martin")~"22099",
    str_detect(polyname, "currituck")~"37053",
    str_detect(polyname, "galveston")~"48167",
    str_detect(polyname, "accomack")~"51001",
    str_detect(polyname, "pierce")~ "53053",
    str_detect(polyname, "san juan")~ "53055"    
  ))

#Ensure every thing has a FIPS code
map_county_df_fips %>%
  filter(is.na(char_fips)) %>%
  select(region, subregion, polyname)
#> [1] region    subregion polyname 
#> <0 rows> (or 0-length row.names)

#Don't need axes for maps so making this to do that
ditch_the_axes <- theme(
  axis.text = element_blank(),
  axis.line = element_blank(),
  axis.ticks = element_blank(),
  panel.border = element_blank(),
  panel.grid = element_blank(),
  axis.title = element_blank()
)

#In RDC, need to merge on variables and then fill by that thing.
map_county_df_fips %>%
  filter(str_sub(char_fips, 1, 2)=="51") %>%
  ggplot(aes(x = long, y = lat, group = group, fill=subregion)) +
  geom_polygon(colour="black") +
  coord_fixed(1.3)  +
  guides(fill=FALSE) +
  ditch_the_axes

Created on 2019-11-10 by the reprex package (v0.3.0)

I'm guessing I don't understand the question properly, because I don't understand why the following doesn't work

library(ggplot2)
library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
# shapefile from https://catalog.data.gov/dataset/tiger-line-shapefile-2016-state-virginia-current-county-subdivision-state-based
va <- sf::st_read("tl_2016_51_cousub/tl_2016_51_cousub.shp")
#> Reading layer `tl_2016_51_cousub' from data source `/Users/rc/Desktop/tl_2016_51_cousub/tl_2016_51_cousub.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 551 features and 18 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -83.67539 ymin: 36.54085 xmax: -75.16643 ymax: 39.46601
#> epsg (SRID):    4269
#> proj4string:    +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
ggplot() + geom_sf(data = va)

Created on 2019-11-10 by the reprex package (v0.3.0)

I won't be able to use sf in that environment so need an object like the one that comes from using map_data with the group and order not a sf list-column.

I was confused by "Import shapefiles using sf on my computer" into thinking you could. If you can work with the geometry variable of a data frame, would it help to gist a serialized va object?

Ok my goal is to use sf to import shapefiles on my local computer then convert them to a dataframe without the geometry which I can then get put onto the server without the sf capability. Sorry I wasn't clear that I can first use a computer with sf but then need to somehow go from geometry to the group, order, region and subregion for use with geom_polygon. fortify doesn't work as it's already a tidy format

1 Like

Ok, it's finally penetrating my thick skull, partially. It's possible to coerce sf multipolygons to Spatial and SpatialDataFrame objects and I suppose it is possible to work through the S4 object to translate into the target format, but that's when I'd considering provisioning an EC2 instance to drop the GDAL barrier issue.

Wish I had something more useful to offer.

Thanks, I'll look into coercing the objects. The environment that has the restrictions also lacks internet so I can't do anything with the cloud.

1 Like

Here's my solution thanks to @technocrat's tip about coercing the sf object. I will save the data.frame countymap_GEOID using write_rds, move over to the environment with restrictions and be able to plot the data of interest which is associated with the counties.

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(tidyverse)
countysf <- read_sf("tl_2019_us_county")

countysp <- as_Spatial(countysf, IDs=countysf$GEOID)

countymap <- fortify(countysp)
#> Regions defined for each Polygons

length(unique(countymap$id))
#> [1] 3233
nrow(countysf)
#> [1] 3233

# Taking a guess that id is linked with GEOID since they have the same
# number of unique levels
# checking this assumption

IDCW <-countysf %>%
  st_drop_geometry() %>%
  mutate(id=as.character(row_number()))

countymap_GEOID <- countymap %>%
  left_join(IDCW, by="id")

ditch_the_axes <- theme(
  axis.text = element_blank(),
  axis.line = element_blank(),
  axis.ticks = element_blank(),
  panel.border = element_blank(),
  panel.grid = element_blank(),
  axis.title = element_blank()
)

countymap_GEOID %>%
  filter(str_sub(GEOID, 1, 2)=="51") %>% #filtering to VA to test assumption
  ggplot(aes(x = long, y = lat, group = group, fill=GEOID)) +
  geom_polygon(colour="black") +
  coord_fixed(1.3)  +
  guides(fill=FALSE) +
  ditch_the_axes

Created on 2019-11-11 by the reprex package (v0.3.0)

2 Likes

It hurts me to think of the environment that forces upon you such an ugly workaround!
Cruel & unusual punishment indeed :smiley:

Relying on order of polygons can be risky, do consider something like

countymap <- fortify(countysp, region = "char_fips")
print(unique(countymap$id)) # check the format

so that you keep a key (FIPS code) to re-attach the relevant data in a safe manner.

1 Like

I agree it's not a fun environment. Not just the limitations of no updated GDAL but also the fact that we Google so much while coding. I have to step outside the lab when I want to Google something R-related as we can't use devices in the room! An updated solution based on @jlacko pointing out to specify the region

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(tidyverse)
library(gpclib)
#> General Polygon Clipper Library for R (version 1.5-5)
#>  Type 'class ? gpc.poly' for help
library(maptools)
#> Loading required package: sp
#> Checking rgeos availability: FALSE
#>      Note: when rgeos is not available, polygon geometry     computations in maptools depend on gpclib,
#>      which has a restricted licence. It is disabled by default;
#>      to enable gpclib, type gpclibPermit()
gpclibPermit()
#> Warning in gpclibPermit(): support for gpclib will be withdrawn from
#> maptools at the next major release
#> [1] TRUE

countysf <- read_sf("cb_2018_us_county_5m")

countysp <- as(countysf, "Spatial")

countymap <- ggplot2::fortify(countysp, region="GEOID")

# Taking a guess that id is linked with GEOID since they have the same
# number of unique levels
# checking this assumption

IDCW <-countysf %>%
  st_drop_geometry() %>%
  select(GEOID, NAME, STATEFP, COUNTYFP)

countymap_GEOID <- countymap %>%
  rename(GEOID=id) %>%
  left_join(IDCW, by="GEOID") %>%
  as_tibble()

ditch_the_axes <- theme(
  axis.text = element_blank(),
  axis.line = element_blank(),
  axis.ticks = element_blank(),
  panel.border = element_blank(),
  panel.grid = element_blank(),
  axis.title = element_blank()
)

countymap_GEOID %>%
  filter(STATEFP=="51") %>% 
  # In VA, county FIPS 1-199 are counties, 510-840 are independent cities
  # https://en.wikipedia.org/wiki/List_of_cities_and_counties_in_Virginia
  mutate(GeoType=if_else(as.numeric(COUNTYFP) < 200, "County", "Independent City")) %>%
  ggplot(aes(x = long, y = lat, group = group, fill=GeoType)) +
  geom_polygon(colour="black") +
  coord_fixed(1.3)  +
  guides(fill=FALSE) +
  ditch_the_axes


write_rds(countymap_GEOID,
          "../Data/CountyPolygonMaps.rds",
          )

Created on 2019-11-11 by the reprex package (v0.3.0)

1 Like