Removing characters from column headers

Hello All,

I am trying to work out how to remove a set number of characters from column headers.

The example below shows the column headers in the data frame I am working with. I can't quite figure out how to remove specific characters from them.

For example, I want to remove the "PD." from all of the column headers.

[1] "PD.Patient.Code" "PD.Patient.Sex" "PG.Ethnicity" "PD.Data.Collectn.PHI"
[5] "PD.Patient.State" "PD.Patient.Birthdate" "PD.Pat.Height.Inches" "PD.Pat.Weight.Lbs"
[9] "PD.HIPAA.Sign.Date" "PT.Medical.Rcd.Nbr" "PD.Pat.Phone" "PD.Pat.Alt.Phone"
[13] "PD.Soc.Sec." "PD.Patient.E.Mail" "PD.Patient.Code.1" "PD.Patient.Last.Name"
[17] "PD.Patient.First.Nme" "PD.Patient.Street" "PD.Patient.Street.2" "PD.Patient.City"
[21] "PD.Patient.ZIP.Code" "PD.Patient.Birthdate.1"

Given this scenario, is there anyone that could assist me with this?

One caveat: I would like to do most of my data manipulation using dplyr with the pipe operator.

Due to the nature of this project, I am unfortunately unable to share the file I am actually using.

Any help is greatly appreciated.

Thank you for taking the time to help me with this.

To begin, substitute your own column names for the_headers, for example:

the_headers <- colnames(YOUR_DATA_FRAME)
library(stringr)
# string beginning with "PD" followed by a period "."
# . is a metacharacter, which has to be escaped by \\
target <- "^PD\\."
the_headers <- c(
  "PD.Patient.Code", "PD.Patient.Sex", "PG.Ethnicity", "PD.Data.Collectn.PHI",
  "PD.Patient.State", "PD.Patient.Birthdate", "PD.Pat.Height.Inches", "PD.Pat.Weight.Lbs",
  "PD.HIPAA.Sign.Date", "PT.Medical.Rcd.Nbr", "PD.Pat.Phone", "PD.Pat.Alt.Phone",
  "PD.Soc.Sec.", "PD.Patient.E.Mail", "PD.Patient.Code.1", "PD.Patient.Last.Name",
  "PD.Patient.First.Nme", "PD.Patient.Street", "PD.Patient.Street.2", "PD.Patient.City",
  "PD.Patient.ZIP.Code", "PD.Patient.Birthdate.1"
)
trimmed <- str_remove_all(the_headers, target)
trimmed
#>  [1] "Patient.Code"        "Patient.Sex"         "PG.Ethnicity"       
#>  [4] "Data.Collectn.PHI"   "Patient.State"       "Patient.Birthdate"  
#>  [7] "Pat.Height.Inches"   "Pat.Weight.Lbs"      "HIPAA.Sign.Date"    
#> [10] "PT.Medical.Rcd.Nbr"  "Pat.Phone"           "Pat.Alt.Phone"      
#> [13] "Soc.Sec."            "Patient.E.Mail"      "Patient.Code.1"     
#> [16] "Patient.Last.Name"   "Patient.First.Nme"   "Patient.Street"     
#> [19] "Patient.Street.2"    "Patient.City"        "Patient.ZIP.Code"   
#> [22] "Patient.Birthdate.1"

then

# assign the cleaned names back to the data frame
colnames(YOUR_DATA_FRAME) <- trimmed
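
Since the question mentions dplyr and the pipe, the same pattern can probably be applied in one step with rename_with(). This is only a minimal sketch, assuming dplyr 1.0.0 or later; df is a made-up stand-in with a few of the posted headers, not the real data.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the real data frame, which cannot be shared
df <- data.frame(
  PD.Patient.Code    = c("A1", "B2"),
  PD.Patient.Sex     = c("F", "M"),
  PG.Ethnicity       = c("x", "y"),
  PT.Medical.Rcd.Nbr = c(101, 102)
)

# Strip a leading "PD." from every column name; other prefixes are left alone
renamed <- df %>%
  rename_with(~ str_remove(.x, "^PD\\."))

colnames(renamed)
#> [1] "Patient.Code"       "Patient.Sex"        "PG.Ethnicity"      
#> [4] "PT.Medical.Rcd.Nbr"
```

Because the pattern is anchored with ^, only names that start with "PD." change; "PG.Ethnicity" and "PT.Medical.Rcd.Nbr" keep their prefixes, which matches the output above.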
