Is there a package or method to set some 'rules' for R to follow when using inner_join?
I have two datasets where the id for some observations is written differently in one than in the other. For example, "Example A Twp" in dataset 1 = "Example A Township" in dataset 2. I would want to set the rule "Twp" == "Township", along with some other rules.
Here is some example data
Setting some sort of equivalence rule would be better for me than simply using str_remove(" twp"), since there are some observations in the data with similar names.
I've always just standardized names like Andresrcs says, maybe keeping the original values in a renamed column such as locality_raw. This is definitely my advice if your use case is locality names like in your example. Replacing all abbreviations with full words standardizes them and flags any vague names. For example, there may be a Berwick Borough and a Berwick City, and there is no way to confidently match just "berwick". This happens annoyingly often in my work.
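To make that concrete, here is a minimal sketch of the "expand abbreviations, then join" idea using dplyr and stringr. The data frames, the column names, and the `rules` lookup are all made up for illustration; the real list of abbreviations would have to come from your own data.

```r
library(dplyr)
library(stringr)

# Hypothetical rules: regex pattern -> full word.
# (?i) makes the match case-insensitive; \\b keeps "Twp" from matching inside a longer word.
rules <- c(
  "(?i)\\btwp\\b\\.?"  = "Township",
  "(?i)\\bboro\\b\\.?" = "Borough"
)

# Trim and collapse whitespace, then apply every rule in turn.
standardize_locality <- function(x) {
  str_replace_all(str_squish(x), rules)
}

# Made-up example data, not the data from the question.
df1 <- tibble(locality = c("Example A Twp", "Berwick Boro"), value1 = 1:2)
df2 <- tibble(
  locality = c("Example A Township", "Berwick Borough", "Berwick City"),
  value2   = c(10, 20, 30)
)

# Keep the original spelling in locality_raw and join on the standardized name.
df1_std <- df1 %>%
  rename(locality_raw = locality) %>%
  mutate(locality = standardize_locality(locality_raw))
df2_std <- df2 %>%
  rename(locality_raw = locality) %>%
  mutate(locality = standardize_locality(locality_raw))

joined <- inner_join(df1_std, df2_std, by = "locality", suffix = c("_1", "_2"))
```

Because the abbreviation is expanded rather than removed, "Berwick Boro" can only match "Berwick Borough" and never "Berwick City". Running anti_join(df1_std, df2_std, by = "locality") afterwards is a quick way to see which names still need a rule.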
However, if you're trying to match fields that likely have typos or incorrect formats, then I'd still expand the abbreviations but follow that up with fuzzy matching.
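One possible way to do the fuzzy step is with edit distance from base R's adist(), sketched below. The street names and the `max_dist` threshold are invented for illustration, and this is only one of several options; the fuzzyjoin package, for instance, provides stringdist_inner_join() for the same kind of task.

```r
# Made-up example data with one typo on the right-hand side.
left  <- data.frame(street = c("Main Street", "Oak Avenue"),
                    stringsAsFactors = FALSE)
right <- data.frame(street = c("Main Stret", "Oak Avenue", "Pine Road"),
                    id = 1:3,
                    stringsAsFactors = FALSE)

# Edit-distance matrix: rows are left names, columns are right names.
d <- adist(tolower(left$street), tolower(right$street))

# For each left name, pick the closest right name and its distance.
best      <- apply(d, 1, which.min)
best_dist <- d[cbind(seq_len(nrow(d)), best)]

# Assumed threshold: accept a match only if it needs at most 2 edits.
max_dist <- 2
keep     <- best_dist <= max_dist

matched <- data.frame(
  street_left  = left$street[keep],
  street_right = right$street[best[keep]],
  id           = right$id[best[keep]],
  dist         = best_dist[keep],
  stringsAsFactors = FALSE
)
```

Keeping the distance column makes it easy to review the non-exact matches by hand, which is worth doing given how easily two distinct names can sit within a couple of edits of each other.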
I've struggled with that problem around street names - rife with misspellings and inconsistencies. I tried brute force - filtering out problems and writing str_replace calls to fix them, and other similar tactics. But it is a bit like trying to stop the tide coming in with a bucket. For my next attempt I plan to try using deep learning to fix the problems - we'll see how well that works.