Hi,
I have a simple table (df) with just 1 column (col_1). Each row is a string of varying length that describes a school kid, such as:
"8YOB that has gone for multiple detentions"
"12 Year old girl that has been diagnosed with vision impairment"
"10YO boy from a single-mom family"
I'd like to pick out the gender from each row and add it as a new column. I know that the gender will always come after the age (which is either a 1- or 2-digit number), and the gender is always stated as "boy", "Boy", "girl", or "Girl". The number of characters that sit between the age and the gender is variable, although it should be fewer than 20.
So, I'd like to identify where the age is in each row, then pick out the first "b", "B", "g", or "G" that appears after the age, and then put that in the new column as either a capital M (for boy) or a capital F (for girl). So far, this is what I have:
pattern = optional(DGT) %R% DGT #this line is my main problem, not sure how to code this
gender = str_match(df$col_1, pattern) #how to convert the gender to capital?
df$Gender = gender
Any help? Thanks.
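One possible way to do the steps described above is sketched below, using a plain regular expression in base R rather than rebus. The pattern looks for one or two digits, skips up to 20 characters that are not "b", "B", "g", or "G", and captures the first such letter. The helper name extract_gender is made up for illustration, and the mapping of b/B to "M" and g/G to "F" is an assumption based on the description; rows with no match come back as NA.

```r
# Sample data copied from the question
df <- data.frame(
  col_1 = c(
    "8YOB that has gone for multiple detentions",
    "12 Year old girl that has been diagnosed with vision impairment",
    "10YO boy from a single-mom family"
  ),
  stringsAsFactors = FALSE
)

# extract_gender is a hypothetical helper, not part of any package.
# Pattern: 1-2 digits (the age), then at most 20 characters that are
# not b/B/g/G, then the first b/B/g/G, which is captured.
extract_gender <- function(x) {
  pattern <- "[0-9]{1,2}[^bBgG]{0,20}([bBgG])"
  hits <- regmatches(x, regexec(pattern, x))
  # Each hit is c(full match, captured letter), or empty if no match
  letter <- vapply(
    hits,
    function(hit) if (length(hit) == 2) hit[2] else NA_character_,
    character(1)
  )
  # Assumed mapping: b/B -> "M", g/G -> "F", anything else -> NA
  ifelse(letter %in% c("b", "B"), "M",
         ifelse(letter %in% c("g", "G"), "F", NA_character_))
}

df$Gender <- extract_gender(df$col_1)
print(df$Gender)  # "M" "F" "M"
```

The same pattern should also work with stringr by taking the second column of str_match(df$col_1, pattern), since that column holds the captured letter. Note that this follows the "first b or g after the age" rule literally, so a stray "g" or "b" in a word between the age and the gender would be picked up instead.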