Hi!
I could use help cleaning some dirty string data. I want to extract the first ID value from each row. The first 9 rows are already exactly what I need; rows 10-21 are the problem.
I've tried using functions like str_extract and str_remove from the stringr package, but I can't figure out a pattern to remove the unwanted strings.
Could someone help me with the stringr function and/or regex pattern that would achieve this?
Thanks in advance,
James
Reprex
structure(list(con_number = c("CON16552", "CON15607", "CON15607",
"CON014592", "CON012146", "CON014085", "SP00012088", "SP00012088",
"SP00012088", "CON016107/CON017440", "CON016107/CON017440", "CON016107/CON017440",
"CON015304 (primary CON#)", "CON014838 (previous CON)", "CON015304 (primary CON#)",
"CON012407 ( Amendment: CON017074)",
"CON012407 ( Amendment: CON017074)",
"CON012407 ( Amendment: CON017074)",
"CON015103 [CON012429 (CON number for this award - this is a supplement)]",
"CON015103 [CON012429 (CON number for this award - this is a supplement)]",
"CON015103 [CON012429 (CON number for this award - this is a supplement)]"
)), row.names = c(NA, -21L), class = c("tbl_df", "tbl", "data.frame"
), na.action = structure(22:24, names = c("22", "23", "24"), class = "omit"))
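
To make the desired result concrete, here is a minimal sketch of the kind of output I'm after. It assumes every value starts with a run of capital letters followed by digits, which holds for the sample above but may not hold for the full data. The short con_number vector is a hand-picked subset of the reprex, one value per format, and first_id is just an illustrative name.

```r
library(stringr)

# One example of each format found in the reprex (subset, for illustration)
con_number <- c(
  "CON16552",
  "SP00012088",
  "CON016107/CON017440",
  "CON015304 (primary CON#)",
  "CON012407 ( Amendment: CON017074)",
  "CON015103 [CON012429 (CON number for this award - this is a supplement)]"
)

# Anchor at the start of the string, then take the leading capital letters
# and the digits that follow. str_extract() returns only the first match,
# so anything after the first ID is ignored.
first_id <- str_extract(con_number, "^[A-Z]+\\d+")

first_id
# "CON16552" "SP00012088" "CON016107" "CON015304" "CON012407" "CON015103"
```

Values that do not begin with letters followed by digits would come back as NA under this pattern, so it would be worth checking for NAs on the real data.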