Hi all:
I have a dataset with 1600 observations. Respondents were asked to write down the regions where they are living. They have spelt those regions in all sorts of creative and different ways, with Capital letters, non-capital, ALL BLOCKS, misspelt, etc.
I would like to assign each of these names to one level of a single factor (there are 20 regions in total). Is there a way to do this in R? The dataset is in Italian, if that is relevant.
Case differences can be handled using functions from stringr. For example, str_to_sentence() converts everything to sentence case. Once the case has been standardized, you can convert the column to a factor.
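A minimal sketch of the case clean-up, assuming the answers sit in a character vector. The vector name regione_raw and its values are made up for illustration, and the str_squish() call to strip stray whitespace is an extra step I am adding, not something you asked for:

```r
library(stringr)

# Hypothetical raw answers: same regions, different case and stray spaces
regione_raw <- c("LOMBARDIA", "lombardia ", "  Lombardia",
                 "toscana", "TOSCANA", "Sicilia")

# Trim/collapse whitespace, then put everything in sentence case
regione_clean <- str_to_sentence(str_squish(regione_raw))

# Identical spellings now collapse into a single factor level each
regione <- factor(regione_clean)
levels(regione)
```

Note that sentence case lowercases everything after the first letter, so hyphenated or multi-word names (Emilia-Romagna, Valle d'Aosta) end up with lowercase second parts. That is fine for grouping, and you can relabel the levels afterwards.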
Misspellings are harder to deal with. You'll probably have to use something like forcats::fct_collapse() to combine the misspelled regions into the correct factor level.
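Here is a rough sketch of what that could look like. The misspellings are invented for the example; with real data you would first inspect levels(regione) and list whatever variants actually turn up:

```r
library(forcats)

# Hypothetical factor after case clean-up, still containing misspellings
regione <- factor(c("Lombardia", "Lombarda", "Lonbardia",
                    "Toscana", "Toscanna", "Sicilia"))

# Map each set of variants onto the correct level;
# levels not mentioned (here "Sicilia") are left unchanged
regione_fixed <- fct_collapse(
  regione,
  Lombardia = c("Lombardia", "Lombarda", "Lonbardia"),
  Toscana   = c("Toscana", "Toscanna")
)

levels(regione_fixed)
```

The mapping has to be written by hand, but with only 20 regions the list of variants should stay manageable.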