How to use censusxy to get county name from address?

I would like to get the county name from an address. I only have a one-line address for each firm, and I have 40,000 addresses. I am trying the censusxy package now, but I cannot figure out how to get it to return the county name. Does anyone have experience with that? Thank you! @chris.prener

censusxy::cxy_geocode(x, street = 'street_address', city = 'city', state = 'state', zip = 'postal_code',
            return = 'locations', class = 'dataframe', output = 'simple')

will return variables for longitude and latitude, but not the county name associated with the address or its Census GEOID. Alternatively,

censusxy::cxy_geocode(x, street = 'street_address', city = 'city', state = 'state', zip = 'postal_code',
            return = 'locations', class = 'sf', output = 'simple')

will return a simple features ({sf} package) data frame with a point geometry for each location. Starting from either the ordinary data frame or the sf object, the location data can be projected to the same coordinate reference system as the county polygons loaded with

data(county_laea, package = "tidycensus")

which has the boundary polygons and county GEOID codes for the 3,200 or so county-level units. Those GEOIDs, in turn, can be matched with the corresponding county names from a separate table taken from the Census data tables. There are also many such tables available on the web that were published in connection with COVID analyses.

The next part requires taking the cxy_geocode$state values, which are the two-letter postal codes for the states, and matching them with the corresponding state-level two-digit GEOIDs. (These are usually stored as two-character strings rather than numbers, which preserves leading zeros such as "05".) The county-level GEOIDs begin with the same two-character code, followed by three additional characters.
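As a rough illustration of that matching step, here is a minimal base R sketch. The two-entry state_fips lookup and the short county_geoids vector are illustrative stand-ins only; a real lookup would have to cover every state and county (the fips_codes table shipped with tidycensus is one possible source).

```r
# Illustrative stand-in lookup: postal abbreviation -> two-character state code.
# A real table would cover all states.
state_fips <- c(AR = "05", MO = "29")

# Stand-in for the geocoded addresses; only the state column matters here
addresses <- data.frame(state = c("MO", "AR"), stringsAsFactors = FALSE)

# Look up the state code for each address (kept as character, so "05" survives)
addresses$state_fips <- unname(state_fips[addresses$state])

# A few five-character county GEOIDs standing in for the full county table
county_geoids <- c("05119", "29510", "29189")

# For each address, keep the county GEOIDs that start with its state code
candidates <- lapply(addresses$state_fips, function(s) {
  county_geoids[substr(county_geoids, 1, 2) == s]
})
```

This only narrows each address down to the counties in its state; picking the single county still needs the spatial step or a ZIP-based lookup.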

The final part involves using functions from the {sf} package to assign the sf point objects to a corresponding sf polygon object, which represents a county in the county_laea sf data frame, using sf::st_within() or sf::st_intersects(). Then appropriate joins will bring the county name into the address data frame.
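The point-in-polygon step might look something like the sketch below. It uses two made-up square "counties" and planar toy coordinates so that it runs on its own; the GEOIDs, names, and the geocoded data frame are all hypothetical. With real data, the points would be built from the geocoder's longitude and latitude columns and the polygons would come from a county boundary layer. One thing to check first: as far as I recall, county_laea is meant for mapping and has Alaska and Hawaii shifted, so addresses in those states may need unshifted boundaries.

```r
library(sf)

# Helper that builds a unit square starting at x0 (toy geometry only)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Two hypothetical "counties"; the GEOIDs and names are invented
counties <- st_sf(
  GEOID = c("00001", "00002"),
  NAME = c("Left County", "Right County"),
  geometry = st_sfc(sq(0), sq(1), crs = 3857)
)

# Hypothetical geocoder output; the third point falls outside both polygons
geocoded <- data.frame(id = 1:3, x = c(0.5, 1.5, 5), y = c(0.5, 0.5, 5))
pts <- st_as_sf(geocoded, coords = c("x", "y"), crs = 3857)

# Put the points in the polygons' CRS (a no-op here, needed with real data)
pts <- st_transform(pts, st_crs(counties))

# Left join: each point picks up the attributes of the polygon it lies within
joined <- st_join(pts, counties, join = st_within)

# Drop the geometry to get back an ordinary data frame with GEOID and NAME
result <- st_drop_geometry(joined)
```

Because st_join() does a left join by default, points that fall in no polygon are kept with NA for GEOID and NAME, which makes failed matches easy to find afterwards.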

For 40K records, there are bound to be some malformed addresses that censusxy::cxy_geocode() will fail to match due to common variations in state abbreviations (for example, "ARK" for "AR") or in street names (mistaking "Street" for "Road", or inconsistent abbreviations). Those are difficult to scrub. On the other hand, postal ZIP codes and ZCTAs (after stripping ZIP+4 codes down to five-character codes) have less opportunity for error. Using the Census 2020 ZCTA5 to 2020 County Subdivision Relationship File Layout (tab20_zcta520_cousub20_natl.txt) is also more direct than the spatial approach.
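A minimal sketch of the ZIP-based route follows, in base R. The clean_zip() helper, the column names, and the two-row crosswalk are hypothetical and only illustrate the shape of the lookup; the real relationship file has its own column names, which need to be checked after download. The one structural fact relied on is that a county subdivision GEOID starts with the five-character county GEOID. The subdivision suffixes below are invented.

```r
# Hypothetical helper: reduce ZIP or ZIP+4 input to a five-character code
clean_zip <- function(z) {
  z <- trimws(as.character(z))
  z <- sub("-.*$", "", z)                              # drop a "-1234" suffix
  z <- ifelse(nchar(z) > 5, substr(z, 1, 5), z)        # nine digits, no hyphen
  z <- paste0(strrep("0", pmax(0, 5 - nchar(z))), z)   # restore leading zeros
  z[!grepl("^[0-9]{5}$", z)] <- NA                     # anything else is unusable
  z
}

# Toy crosswalk standing in for the relationship file (suffixes are made up)
crosswalk <- data.frame(
  zcta = c("63103", "72201"),
  cousub_geoid = c("2951099999", "0511999999"),
  stringsAsFactors = FALSE
)

# The first five characters of a county subdivision GEOID are the county GEOID
crosswalk$county_geoid <- substr(crosswalk$cousub_geoid, 1, 5)

# Hypothetical firm records with a mix of ZIP and ZIP+4 values
firms <- data.frame(postal_code = c("63103-1234", "72201"), stringsAsFactors = FALSE)
firms$zcta <- clean_zip(firms$postal_code)

# Left join so that firms without a usable ZIP are kept with NA
merged <- merge(firms, crosswalk[, c("zcta", "county_geoid")], by = "zcta", all.x = TRUE)
```

A ZCTA can overlap more than one county, so with the real file the crosswalk would probably need to be reduced to one row per ZCTA first (for example, the county holding the largest share of its area), or the duplicates handled explicitly after the merge.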
