Seeking Package Beta Testers - Street Address Parsing

I work with American street addresses on a regular basis and have slowly been building up a workflow for parsing them. If you've worked on parsing or standardizing address data, you know that addresses are just standardized enough that they can be parsed, but just messy enough that it is never easy. This is particularly true for uncommon "edge case" addresses.

The "grammar of street addresses" workflow that I've built in response to this challenge has matured to the point where the package, postmastr, is ready for beta testing. If you work with American street addresses regularly and have the time to take the package for a spin, I'd love feedback before I submit it to CRAN. I want to make sure the workflow holds up and can handle whatever addresses get thrown at it. If you run into problems, please submit a bug report so I can address them.

Also, postmastr is only set up for American street addresses right now, but the functions have been built with expansion in mind. If you work with international street addresses and want to contribute, please open a feature request issue and introduce yourself!

To give folks a sense of how the package works, here is a reprex using the package's example data, a set of sushi restaurants in St. Louis, Missouri. The original addresses in the address column are parsed and standardized, and the cleaned data are returned:

> library(postmastr)
> mo <- pm_dictionary(type = "state", filter = "MO", case = c("title", "upper"), locale = "us")
> cities <- pm_append(type = "city",
+                       input = c("Brentwood", "Clayton", "CLAYTON", "Maplewood", 
+                                 "St. Louis", "SAINT LOUIS", "Webster Groves"),
+                       output = c(NA, NA, "Clayton", NA, NA, "St. Louis", NA))
> sushi1 %>%
+   dplyr::filter(name != "Drunken Fish - Ballpark Village") %>%
+   pm_parse(input = "full", address = "address", output = "short", keep_parsed = "limited", 
+          city_dict = cities, state_dict = mo)
# A tibble: 27 x 8
   name                            address                                        visit    pm.address                pm.city   pm.state pm.zip pm.zip4
   <chr>                           <chr>                                          <chr>    <chr>                     <chr>     <chr>    <chr>  <chr>  
 1 BaiKu Sushi Lounge              3407 Olive St, St. Louis, Missouri 63103       3/20/18  3407 Olive St             St. Louis MO       63103  NA     
 2 Blue Ocean Restaurant           6335 Delmar Blvd, St. Louis, MO 63112          10/26/18 6335 Delmar Blvd          St. Louis MO       63112  NA     
 3 Cafe Mochi                      3221 S Grand Boulevard, St. Louis, MO 63118    10/10/18 3221 S Grand Blvd         St. Louis MO       63118  NA     
 4 Drunken Fish - Central West End 1 Maryland Plaza, St. Louis, MO 63108          12/2/18  1 Maryland Plz            St. Louis MO       63108  NA     
 5 I Love Mr Sushi                 9443 Olive Blvd, St. Louis, Missouri 63132     1/1/18   9443 Olive Blvd           St. Louis MO       63132  NA     
 6 Kampai Sushi Bar                4949 W Pine Blvd, St. Louis, MO 63108          2/13/18  4949 W Pine Blvd          St. Louis MO       63108  NA     
 7 Midtown Sushi & Ramen           3674 Forest Park Ave, St. Louis, MO 63108      3/4/18   3674 Forest Park Ave      St. Louis MO       63108  NA     
 8 Mizu Sushi Bar                  1013 Washington Avenue, St. Louis, MO 63101    9/12/18  1013 Washington Ave       St. Louis MO       63101  NA     
 9 Robata Maplewood                7260 Manchester Road, Maplewood, MO 63143      11/1/18  7260 Manchester Rd        Maplewood MO       63143  NA     
10 SanSai Japanese Grill Maplewood 1803 Maplewood Commons Dr, St. Louis, MO 63143 2/14/18  1803 Maplewood Commons Dr St. Louis MO       63143  NA     
# … with 17 more rows
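
For anyone wondering what the city dictionary in the reprex is doing conceptually, here is a minimal base R sketch of dictionary-based standardization. It is not postmastr's implementation. The `standardize_city()` helper and the `city_dict` data frame are hypothetical names made up for illustration, and reading an `NA` output as "this spelling is already the preferred one" is an assumption inferred from the example above.

```r
# Hypothetical lookup table in the spirit of the city dictionary above:
# `input` is a spelling that may appear in the data, `output` is the
# preferred spelling (NA is assumed to mean "already in preferred form").
city_dict <- data.frame(
  input  = c("Clayton", "CLAYTON", "St. Louis", "SAINT LOUIS"),
  output = c(NA, "Clayton", NA, "St. Louis"),
  stringsAsFactors = FALSE
)

# standardize_city() is an illustrative helper, not a postmastr function.
standardize_city <- function(x, dict) {
  # Exact, case-sensitive lookup of each value in the dictionary
  idx <- match(x, dict$input)
  # Preferred spelling for every dictionary row
  preferred <- ifelse(is.na(dict$output), dict$input, dict$output)
  out <- preferred[idx]
  # Values with no dictionary entry are passed through unchanged
  out[is.na(idx)] <- x[is.na(idx)]
  out
}

standardize_city(c("SAINT LOUIS", "CLAYTON", "Clayton", "Kirkwood"), city_dict)
# [1] "St. Louis" "Clayton"   "Clayton"   "Kirkwood"
```

The sketch only covers values that are already isolated in their own vector. Finding the city inside a full address string is a separate step that it does not attempt.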
