My string is a DNA sequence: TATACATGATCGGGTCATAAGCTATAATGGGGCAATAA
I want to extract every substring that runs from ATG to TAA. How should I do this?
For this string there should be two substrings:
1) ATGATCGGGTCATAA
2) ATGGGGCAATAA
I recommend the stringi package. You will probably also need to learn some basics of regular expressions.
library(stringi)
x = "TATACATGATCGGGTCATAAGCTATAATGGGGCAATAA"
stri_extract(x, regex = "ATG.*?TAA")
[1] "ATGATCGGGTCATAA"
# compare without the ? (a.k.a. greedy): the match runs on to the last TAA
stri_extract(x, regex = "ATG.*TAA")
[1] "ATGATCGGGTCATAAGCTATAATGGGGCAATAA"
What if the string contains several such substrings? Would this still work?
Just use:
stri_extract_all(x, regex = "ATG.*?TAA")
[[1]]
[1] "ATGATCGGGTCATAA" "ATGGGGCAATAA"
Is that what you want?
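If the positions of the matches are also of interest, a minimal sketch along the same lines is shown here. It assumes the `stri_locate_all_regex()` function from stringi; the variable names `loc` and `hits` are only illustrative.

```r
library(stringi)

x <- "TATACATGATCGGGTCATAAGCTATAATGGGGCAATAA"

# One row per match, with 1-based "start" and "end" columns
loc <- stri_locate_all_regex(x, "ATG.*?TAA")[[1]]
loc

# Feeding the positions back into stri_sub() recovers the same substrings
hits <- stri_sub(x, from = loc[, "start"], to = loc[, "end"])
hits
```

For this sequence the two matches should span positions 6 to 20 and 27 to 38.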
Also, if I want to stop not only at TAA but at TAG and TGA as well, what should I do?
We could keep going like this forever. I have shown you an approach, and I believe you can learn enough basic regular expressions to solve the problem yourself.
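For reference, one possible way to extend the pattern is regex alternation, `(TAA|TAG|TGA)`. This is only a sketch: the test sequence `y` is made up for illustration, and the second pattern (which steps through the sequence three bases at a time so that the stop codon is in frame with ATG) is an assumption about what may be wanted biologically, not something asked for above.

```r
library(stringi)

# Hypothetical test sequence containing two different stop codons
y <- "CCATGAAATAGGGATGCCCTGATT"

# Lazy match from ATG to the nearest TAA, TAG or TGA
any_stop <- stri_extract_all(y, regex = "ATG.*?(TAA|TAG|TGA)")[[1]]
any_stop

# Variant that only accepts a stop codon in the same reading frame as ATG
in_frame <- stri_extract_all(y, regex = "ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")[[1]]
in_frame
```

On this toy input both patterns should return "ATGAAATAG" and "ATGCCCTGA". They can differ on real sequences, where an out-of-frame stop codon may appear before the in-frame one.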
Thank you so much. I have one more question.
If I have a sequence of length 1000 to 10000 and I want to extract the substring between two given positions, for example from 368 to 897, what approach should I use?
You mean positions?
stri_sub(x, from = 3, to = 5)
[1] "TAC"
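The same call should work for the 368 to 897 case. A small sketch, assuming a randomly generated stand-in sequence (`long_seq` is hypothetical, not real data):

```r
library(stringi)

# Build a stand-in sequence of 1000 random bases
set.seed(1)
long_seq <- stri_paste(sample(c("A", "C", "G", "T"), 1000, replace = TRUE),
                       collapse = "")

# Positions are 1-based and both ends are inclusive
piece <- stri_sub(long_seq, from = 368, to = 897)
stri_length(piece)  # 897 - 368 + 1 = 530

# Equivalent call using a length instead of an end position
piece2 <- stri_sub(long_seq, from = 368, length = 530)
identical(piece, piece2)
```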
It's worth learning the stringi package. There are lots of useful functions.
Thank you, sir! I will learn this package.