Data cleaning in R

Hello all,

I am relatively new to R and starting on a larger dataframe.

I have a basic question on cleaning my data. Several of my columns contain the same value with slightly different spellings (e.g. "red" and "RED"). Sometimes there are 3-4 variations of the same value.

What is the best way to consolidate all the different spellings into just "red", for instance?

Thank you!

What are the variations? Are they misspellings or just differences in case?

In the latter case, tolower() can help.
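
As a minimal sketch of the case-only fix, assuming a made-up data frame `df` with a character column `colour` (both names are hypothetical), and adding `trimws()` in case stray spaces are also part of the problem:

```r
# Hypothetical example data; `df` and `colour` are assumed names.
df <- data.frame(colour = c("red", "RED", "Red ", " blue", "BLUE"),
                 stringsAsFactors = FALSE)

# Strip leading/trailing whitespace, then convert everything to lower case.
df$colour <- tolower(trimws(df$colour))

# Only the consolidated spellings remain.
unique(df$colour)
```

If the column is a factor rather than character, converting it with `as.character()` first is probably the safer route.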


Thanks for that tip. In some instances, the names of individuals are misspelled.

For misspellings, string distance can help; see ?agrep.
It only gives an estimate and is not 100% reliable, but you can build something that gives you a confidence level for each suspected misspelling. I'll let you look into it.
The stringdist package may also help.
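
A rough sketch of the string-distance idea using only base R (`agrep()` and `adist()`). The names below and the `canonical` list of correct spellings are invented for illustration, and the distance threshold is an assumption you would need to tune on your own data:

```r
# Hypothetical data: `canonical` holds the spellings you trust,
# `observed` holds what actually appears in the column.
canonical <- c("Jonathan", "Catherine")
observed  <- c("Jonathan", "Jonathon", "Catherine", "Catherin", "Cathrine")

# agrep() does approximate matching: which observed values are
# close to "Jonathan"? The threshold of 1 edit is an assumption.
close_to_jonathan <- agrep("Jonathan", observed, max.distance = 1, value = TRUE)

# adist() returns the edit distance between every observed value (rows)
# and every canonical spelling (columns).
d <- adist(observed, canonical)

# For each observed value, pick the closest canonical spelling and keep
# the distance as a rough confidence measure (0 = exact match).
nearest <- apply(d, 1, which.min)
result <- data.frame(
  observed   = observed,
  suggestion = canonical[nearest],
  distance   = d[cbind(seq_along(observed), nearest)],
  stringsAsFactors = FALSE
)
result
```

Rows with a small but non-zero distance are the likely misspellings; it is worth reviewing them by hand before overwriting anything, since two genuinely different names can also be only one or two edits apart.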
