Geocoding U.S. counties for Locality, State combos without street addresses

Does anyone know a reliable, efficient, and, ideally, free or cheap way to identify the county in which a particular city or town identified by "[locality name], [state abbreviation]" is located?

So far, I've found two ways to geocode U.S. counties, but both of them have problems for my use case.

One way is to use the U.S. Census Bureau's API, which the censusxy package makes easy to do. The problem here is that the Census API requires a street address, so it will not work with town names alone.

Another way is to use the Google Maps API, which ggmap facilitates. It does accept town/state combos without a street address, and if you call geocode(..., output = 'all'), county is one of the bits of information you get back.

Sounds promising, but this approach has two problems for my use case. One is that hitting the Google Maps API costs money, at least after a subsidized trial period, and my nonprofit research program runs on a tight budget.

The second is that the output returned when you run geocode(..., output = 'all') is a byzantine mess of nested lists whose structure varies across inputs, so it's really difficult to extract the desired bits into a flat data frame structure, which is what I need for my Shiny app (see my Stack Overflow question on this aspect here).

So, I'm still looking for something in that sweet spot of reliable, efficient (our data has tens of thousands of records to geocode each week), and affordable. Is there something I'm overlooking?

Maybe try ggplot2::map_data("state"), which draws on the maps package. It gives you the longitude and latitude of the state boundaries, which you then need to inner join with your main states data frame. It gave me states; I think it should give counties too.

@viky, it looks like there is a county map in maps as well but I'm not sure how I could use that shapefile or whatever it is to associate counties with "[town], [state]" strings in my data frame.

Sorry, am I missing something? As you can probably tell, I'm new to the whole geocoding experience.

inner_join(counties, maincounty_df, by = "county")  # maybe? For states I joined on "region"

Yeah, I don't think that's a thing here.

Perhaps contact the people behind http://statsamerica.org/CityCountyFinder/Default.aspx and ask them to make their data available to you.
Perhaps you could even make an R data package out of it to share with others.

Or you could pull the longitude and latitude in R, put them in a data frame, and import that into Tableau.

I've been looking at geocoding off and on for a while, and I have a few notes on other services that might help. I'm all about free.

1 request per second
2500 requests per day
There is an R package for this one

250K queries per month
https://discover.search.hereapi.com/v1/geocode
parcel-centered locations

Another possibility might be data from the Census Bureau site. They may have a state code / county code / city correspondence there. The site is an unholy mess, but it has huge amounts of data.

Thanks, @Ajackson. I just tried opencage_forward() from the opencage package on a sample of my location strings and was impressed with the results, which include both county names and FIPS codes, and the ease of working with them. To get from location strings to a flat table of results, my call looked like this, where locations_unique is a vector of strings representing place names in "[locality], [state abbreviation]" format, with some ambiguous or dirty elements (e.g., "5S Ranch, CA").

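library(dplyr)       # for the %>% pipe
library(purrr)       # for map()
library(data.table)  # for rbindlist()
library(opencage)

opencage_key <- "<my api key>"  # placeholder; substitute your own key

# Geocode a random sample of 5 locations, pull the 'results' tibble out of
# each response, and stack them into one flat table
map(sample(locations_unique, 5), function(x) opencage_forward(x, key = opencage_key)) %>%
  map('results') %>%
  data.table::rbindlist(fill = TRUE)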

The fill = TRUE bit in the last line is because queries can produce results tibbles with different columns.

It's also important to note that you can get more than one result, and thus more than one row in the data frame this produces, for a single query when the geocoder isn't certain which one you're after. It looks like there's a confidence column that could be used to select a single result to prefer when matching rows in this table to rows in another table (e.g., events you're trying to geolocate).
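To make the idea of preferring one result per query concrete, here is a minimal sketch on a toy table. It assumes the flattened results carry a `query` column identifying the input string (an assumption about the output, worth checking against your own results) alongside the `confidence` column mentioned above, and that the highest confidence is the row to keep. The county values and scores are made up for illustration.

```r
library(dplyr)

# Toy stand-in for the flattened geocoder output. The `query` column name is
# an assumption, and the counties and confidence scores are invented.
results <- tibble(
  query      = c("Springfield, IL", "Springfield, IL", "Durham, NC"),
  county     = c("Sangamon County", "Cook County", "Durham County"),
  confidence = c(7, 3, 6)
)

# Keep exactly one row per query: the one with the highest confidence.
# with_ties = FALSE guarantees a single row even when scores are tied.
best_match <- results %>%
  group_by(query) %>%
  slice_max(confidence, n = 1, with_ties = FALSE) %>%
  ungroup()
```

The resulting `best_match` table has one row per input string, so it can be joined back to an events table without duplicating rows.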

I'm not sure of your exact use case, but some cities span multiple counties. The most extreme example is New York City, which is made up of five counties. Where I am in Durham, NC, parts of the city of Durham are in Durham, Wake, and Orange counties. Just a warning that this won't ever be a perfect match.

It's been a little while, but I had some fruitful experience with OpenStreetMap. As I recall it involved formatting a url query to get a JSON which could then be parsed into an R object: it was fairly straightforward. I could unearth the details if there is any interest.
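For readers wondering what such a URL query might look like, here is a rough sketch against the public Nominatim search endpoint, which is one OpenStreetMap geocoder and not necessarily the one used in the post above. The function names `nominatim_url()` and `fetch_county()` are hypothetical, and the parsing step assumes the jsonlite package is installed. The public server asks for no more than one request per second and an identifying User-Agent, so the fetch function takes one as an argument.

```r
# Build a Nominatim search URL for a "[locality], [state]" string.
# addressdetails=1 asks for the address breakdown, which is where a county
# field appears when OpenStreetMap has one for the place.
nominatim_url <- function(place) {
  paste0(
    "https://nominatim.openstreetmap.org/search",
    "?q=", utils::URLencode(place, reserved = TRUE),
    "&format=jsonv2&addressdetails=1&countrycodes=us&limit=1"
  )
}

# Hypothetical helper: fetch the JSON and return the county name, or NA when
# there is no match or no county field. Requires network access and jsonlite.
fetch_county <- function(place, user_agent) {
  con <- url(nominatim_url(place), headers = c("User-Agent" = user_agent))
  on.exit(close(con))
  res <- jsonlite::fromJSON(paste(readLines(con, warn = FALSE), collapse = ""))
  if (!is.data.frame(res)) return(NA_character_)  # empty result set
  county <- res$address$county[1]
  if (is.null(county)) NA_character_ else county
}
```

A call would look like `fetch_county("Cedar Rapids, IA", user_agent = "my-app (me@example.org)")`, with a `Sys.sleep(1)` between calls when looping over many places.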

Thanks, @sdutky. I took a quick look at OpenStreetMap and the R package for accessing it (openstreetmap), and I didn't see how one would use it to add county to a table with place names. If you think I'm overlooking something or have code that shows otherwise, though, please let me know.

A solution which isn't R based is to get a crosswalk of place names and counties here: http://mcdc.missouri.edu/applications/geocorr2014.html

Choose Place (city, town, etc) for source geography and county for target geography. The allocation factor will tell you what percentage of the place is within the county. You can choose to weight by population or physical area (your choice as to what makes most sense in your situation).

Thank you, @StatSteph, that site was really helpful. I just ran a query to get a big .csv with the place name as the input and the county name as the output for all 50 states. Now I've got that table, which I can store on GitHub or in a Google Sheet and can grab from R to match against my data frame. The population weighting for localities that span multiple counties will get a little stale over time, but that's a very minor consideration for my use case.
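Here is a minimal sketch of that matching step on toy data. The column names (`place`, `state`, `county`, `afact`) and the allocation factors are assumptions for illustration, not the actual layout of the export, and real place names may need cleaning first (for example, stripping a suffix such as "city") before they line up with "[locality], [state abbreviation]" strings.

```r
library(dplyr)

# Made-up crosswalk rows; a place that spans counties appears once per county
crosswalk <- tibble(
  place  = c("Durham", "Durham", "Cedar Rapids"),
  state  = c("NC", "NC", "IA"),
  county = c("Durham", "Wake", "Linn"),
  afact  = c(0.95, 0.05, 1.00)   # invented allocation factors
)

events <- tibble(location = c("Cedar Rapids, IA", "Durham, NC", "Nowhere, ZZ"))

# For each place, keep only the county holding the largest share of it
primary_county <- crosswalk %>%
  mutate(location = paste0(place, ", ", state)) %>%
  group_by(location) %>%
  slice_max(afact, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(location, county)

# A left join keeps unmatched events, with NA in the county column
events_with_county <- left_join(events, primary_county, by = "location")
```

Rows that come back with `NA` are the ones worth sending to a geocoder, which keeps API usage to the leftovers.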

Hi,

https://www.geocod.io/ returns full addresses including county names (e.g. I just tried Cedar Rapids, Iowa; return includes county Linn). The service can be used in R via https://github.com/hrbrmstr/rgeocodio (although I haven't tried the latter).

Best, r

I am a huge fan of the {tidygeocoder} package - especially when used against the OSM Nominatim backend. It is fast, cheap, reliable and comes with few strings attached (unlike the Google Terms & Conditions, which are IMHO rather onerous).

As expected, there is a usage limit, though in theory you should be able to avoid it by setting up your own instance of the Nominatim backend (I have not yet heard of anyone actually doing so).

Once you have your locations geocoded, it should in principle be easy to assign each one to a county based on a shapefile (e.g., tigris::counties()) using a general point-in-polygon workflow (i.e., sf::st_join() and the like). Again, volume matters, and with a truly huge number of points and/or polygons you might be better off offloading the operation to a spatial database (I suggest PostGIS).

In my experience, tens of thousands of points are OK on a mid-level laptop; tens of millions might require a specialized approach.
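The point-in-polygon step can be sketched on toy geometry so that it runs without downloads. The two square "counties" and the place coordinates below are invented; in practice the polygons might come from tigris::counties() and the points from a geocoder, and both layers would need to share a coordinate reference system (sf::st_transform() can align them).

```r
library(sf)

# Helper for building a closed square ring (toy geometry only)
square <- function(xmin, ymin, size = 1) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Two adjacent toy "counties" in longitude/latitude coordinates
counties <- st_sf(
  county   = c("West", "East"),
  geometry = st_sfc(square(0, 0), square(1, 0), crs = 4326)
)

# Geocoded places as points; "Gamma" falls outside both polygons
places <- data.frame(
  place = c("Alpha", "Beta", "Gamma"),
  long  = c(0.5, 1.5, 5),
  lat   = c(0.5, 0.5, 5)
)
points <- st_as_sf(places, coords = c("long", "lat"), crs = 4326)

# Left spatial join: each point picks up the county it lies within, or NA
joined <- st_join(points, counties, join = st_within)
```

Because st_join() is a left join by default, points that land in no polygon stay in the output with a missing county, which makes failed matches easy to spot.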
