I have a data.frame with 2 columns of strings. In the first column, I want to make sure that for each observation, the first character of string is a letter or a number. For example, if it is "* BNY Mellon Cash Reserve", I need to change it into "BNY Mellon Cash Reserve", no space is allowed in the first character.
The second column contains strings of numbers, but some are with parenthesis and other non-number stuff, and I want to only keep the numbers, so I can convert it to numeric values in the future.
Strings are tricky. You can use stringr in tidyverse.
library(tidyverse)
df <- df %>%
mutate(
col1=str_trim(str_replace(col1, "[:punct:]", " ")), # replaces punctuation and trims leading and trailing spaces
col2=str_trim(str_replace(col2, "[[:punct:][:alpha]]", " ")) # replaces punctuation and letters and trims leading and trailing spaces
)
The double backslash is the signal that you're not nesting code with a paren, but rather identifying a character -- ( -- in the string that you wish to remove and replace with nothing -- "".
With respect to your first column, a good solution might be (without a reprex we can't know):
Your_First_Column=str_sub(Your_First_Column, 4, 18) ... where 4 and 18 mark the beginning and end of what you want to keep from the string. Obviously, this won't work if the pattern of the beginning and end of the strings are different.