Cleaning and fixing a dataset with a text variable containing full sentences and paragraphs

Hello everyone,

I am trying to clean a dataset as part of pre-processing. The dataset has (among others) a text variable in which respondents wrote free text, varying from a single sentence to full paragraphs. I want to clean it by tokenizing the text into words, going through all the unique words, and then correcting all the typos.

My plan was to use:

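library(dplyr)
library(stringr)
df %>%
  mutate(text = text %>%  # `text` stands in for the name of the text column
    str_replace_all("typo1", "fix1") %>%
    str_replace_all("typo2", "fix2"))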

However, is there another way to do this, for example by supplying a data frame with all the typos and their fixes? That way I also get to keep the list of typos and fixes separately. Something with Var1 = Typo, Var2 = Fix.

Example of an instance in the text variable:
"I work 24h per week" and I would like this to turn into "I work 24 hour per week"

Kind regards,
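
One possible approach, as a minimal sketch rather than a definitive answer: `str_replace_all()` from stringr accepts a named character vector as its pattern, where the names are the patterns and the values are the replacements. So the typos and fixes can live in their own data frame and be converted to a named vector when needed. The names `fixes`, `Typo`, `Fix`, and the column `text` are assumptions for illustration, as are the two example rows. Note that the typos are treated as regular expressions, so the "24h" case is written as a pattern that matches an "h" directly after a digit.

```r
library(stringr)

# Hypothetical lookup table: one row per typo (as a regex) and its fix
fixes <- data.frame(
  Typo = c("(?<=\\d)h\\b", "\\bteh\\b"),
  Fix  = c(" hour", "the"),
  stringsAsFactors = FALSE
)

# Named vector: names are the patterns, values are the replacements
replacements <- setNames(fixes$Fix, fixes$Typo)

# Hypothetical data with a text column called `text`
df <- data.frame(
  text = c("I work 24h per week", "teh job is fine"),
  stringsAsFactors = FALSE
)

# All patterns in the lookup table are applied to every element
df$text <- str_replace_all(df$text, replacements)
```

The patterns are applied one after another in the order of the table, so an earlier fix can affect what a later pattern matches. Using `\\b` word boundaries helps avoid replacing a typo that happens to appear inside a longer, correct word.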
