I work with American street addresses on a regular basis and have slowly been building up a workflow for parsing them. If you've worked on parsing or standardizing address data, you know that addresses are just standardized enough that they can be parsed, but just messy enough that it is never easy. This is particularly true for uncommon "edge case" addresses.
The "grammar of street addresses" workflow that I've worked on in response to this challenge has matured to the point where the package, postmastr, is ready for beta testing. If you work with American street addresses regularly and have the time to take the package for a spin, I'd love feedback before I submit to CRAN. I want to make sure the workflow works and can handle whatever addresses get thrown at it. If you have feedback, please submit a bug report so I can help address it.
Also, postmastr is only set up for American street addresses right now, but the functions have been built for expansion. If you work with international street addresses and want to contribute, please open a feature request issue and introduce yourself!
To give folks a sense of how the package works, here is a reprex that uses the package's example data, a set of sushi restaurants in St. Louis, Missouri. The original addresses in the address column are parsed and standardized, and the cleaned data are then returned:
> library(postmastr)
>
> # state dictionary limited to Missouri, in title and upper case
> mo <- pm_dictionary(type = "state", filter = "MO", case = c("title", "upper"), locale = "us")
> # city dictionary: each input spelling is paired with a standardized output (NA where none is given)
> cities <- pm_append(type = "city",
+                     input = c("Brentwood", "Clayton", "CLAYTON", "Maplewood",
+                               "St. Louis", "SAINT LOUIS", "Webster Groves"),
+                     output = c(NA, NA, "Clayton", NA, NA, "St. Louis", NA))
>
> # drop one restaurant, then parse the full addresses using both dictionaries
> sushi1 %>%
+   dplyr::filter(name != "Drunken Fish - Ballpark Village") %>%
+   pm_parse(input = "full", address = "address", output = "short", keep_parsed = "limited",
+            city_dict = cities, state_dict = mo)
# A tibble: 27 x 8
   name                            address                                        visit    pm.address                pm.city   pm.state pm.zip pm.zip4
   <chr>                           <chr>                                          <chr>    <chr>                     <chr>     <chr>    <chr>  <chr>
 1 BaiKu Sushi Lounge              3407 Olive St, St. Louis, Missouri 63103       3/20/18  3407 Olive St             St. Louis MO       63103  NA
 2 Blue Ocean Restaurant           6335 Delmar Blvd, St. Louis, MO 63112          10/26/18 6335 Delmar Blvd          St. Louis MO       63112  NA
 3 Cafe Mochi                      3221 S Grand Boulevard, St. Louis, MO 63118    10/10/18 3221 S Grand Blvd         St. Louis MO       63118  NA
 4 Drunken Fish - Central West End 1 Maryland Plaza, St. Louis, MO 63108          12/2/18  1 Maryland Plz            St. Louis MO       63108  NA
 5 I Love Mr Sushi                 9443 Olive Blvd, St. Louis, Missouri 63132     1/1/18   9443 Olive Blvd           St. Louis MO       63132  NA
 6 Kampai Sushi Bar                4949 W Pine Blvd, St. Louis, MO 63108          2/13/18  4949 W Pine Blvd          St. Louis MO       63108  NA
 7 Midtown Sushi & Ramen           3674 Forest Park Ave, St. Louis, MO 63108      3/4/18   3674 Forest Park Ave      St. Louis MO       63108  NA
 8 Mizu Sushi Bar                  1013 Washington Avenue, St. Louis, MO 63101    9/12/18  1013 Washington Ave       St. Louis MO       63101  NA
 9 Robata Maplewood                7260 Manchester Road, Maplewood, MO 63143      11/1/18  7260 Manchester Rd        Maplewood MO       63143  NA
10 SanSai Japanese Grill Maplewood 1803 Maplewood Commons Dr, St. Louis, MO 63143 2/14/18  1803 Maplewood Commons Dr St. Louis MO       63143  NA
# … with 17 more rows