Correcting variable names using stringr

I am attempting to correct a dataset that has multiple different names for the same variable due to differences in capitalization (e.g., pepco, Pepco, PEPCO) using stringr.

So far, I have determined the different names the variables are listed under using the following code:

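library(readr)  # read_tsv()
library(dplyr)  # %>% and distinct()

desktop.path <- "/Users/ryancoffey/Desktop/ElectricityDemand.txt"
ElectricityDemand <- read_tsv(desktop.path)

print(ElectricityDemand)
# List every distinct spelling found in the Subregion column
ElectricityDemand %>%
  distinct(Subregion)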

I am wondering if anyone can help explain how to combine variables that correspond to each other but have "different" names using stringr commands.

I have included a screenshot of the dataset to help (the data file is a .txt file, so I couldn't upload it with this post).

Hi!

To help us help you, could you please prepare a proper reproducible example (reprex) illustrating your issue? Please have a look at this guide to see how to create one:

Hi @rcoffey1015,

It seems like you have values with similar names, not variables. One way to make sure the values for the Subregion variable are stored the same way would be to use dplyr::mutate and stringr...

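library(dplyr)    # %>% and mutate()
library(stringr)  # str_to_title()

# Store a case-normalized copy of Subregion in a new column
ElectricityDemand %>%
  mutate(subregion2 = str_to_title(Subregion))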

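You could replace str_to_title() with str_to_lower(), str_to_upper(), or str_to_sentence(), depending on your preference. Note that the result only sticks if you assign it back, e.g. with ElectricityDemand <- in front of the pipeline.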
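For anyone landing here later, here is a minimal sketch of the same idea on made-up data, since the original file isn't available. The demand data frame and its Demand column are invented for illustration, and the str_trim() step is an extra assumption (stray whitespace) that may not apply to the real data.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for ElectricityDemand
demand <- data.frame(
  Subregion = c("pepco", "Pepco", "PEPCO ", "bge", "BGE"),
  Demand    = c(10, 20, 30, 5, 15),
  stringsAsFactors = FALSE
)

# Trim stray whitespace, then force one capitalization
cleaned <- demand %>%
  mutate(Subregion = str_to_upper(str_trim(Subregion)))

# Check that the variants have collapsed into one value each
cleaned %>%
  distinct(Subregion)

# Rows that used to look like separate subregions now group together
totals <- cleaned %>%
  group_by(Subregion) %>%
  summarise(Demand = sum(Demand))
```

Overwriting Subregion in place (rather than adding subregion2) is just a choice here; keeping the original column around is safer if you want to double-check what changed.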

That worked. Thank you so much!
