I have this dataset and I want to delete the red squared portion in all rows. how can I do this? Thank you.
I would recommend using either
stringr::str_remove. If you are looking to remove that text verbatim, something like this should work:
data$WGS.ID <- stringr::str_remove(data$WGS.ID, "_WGS_processed_downsamp.bam.cip")
If you are wanting to remove everything after the initial ID (i.e.
PGDX...), something like this should work:
data$WGS.ID <- stringr::str_remove(data$WGS.ID, "_WGS.+")
@FJCC But I got another problem, on this data I want to remove this red squared portion from all rows, how can I do this?
If you want to remove everything starting at the first underscore, you can to this. I invented a tiny data set for the illustration.
DF <- data.frame(WGS.ID = c("PGDX5881P_WGS_blah", "PGDX5882P_WGS_blah"), OtherColumn = 1:2) DF #> WGS.ID OtherColumn #> 1 PGDX5881P_WGS_blah 1 #> 2 PGDX5882P_WGS_blah 2 DF$WGS.ID <- gsub("([^_]+)_.+", "\\1", DF$WGS.ID) DF #> WGS.ID OtherColumn #> 1 PGDX5881P 1 #> 2 PGDX5882P 2
Created on 2022-03-18 by the reprex package (v2.0.1)
The regular expression
([^_]+)_.+" means "one or more characters that are not an underscore, an underscore, one or more of any character".
([^_]+) means "one or more characters that are not an underscore" and the parentheses define that as the first group, which will be used later.
The _ is simply an underscore.
.+ means "one or more of any character"
The replacement argument of gsub is
\\1, which means "use the first group defined in the regular expression".
@FJCC Thank you very much. It worked. I am very happy.
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
If you have a query related to it or one of the replies, start a new topic and refer back with a link.