How to select substring from a string?

iMayank · June 14, 2018, 5:54am

If my string is a DNA sequence TATACATGATCGGGTCATAAGCTATAATGGGGCAATAA

and I want to extract the substrings that run from ATG to TAA, what should I do?

In this string there are two such substrings:

1) ATGATCGGGTCATAA
2) ATGGGGCAATAA

RobertForum · June 14, 2018, 6:03am

I recommend the stringi package, and you will probably need to learn some basics of regular expressions.

library(stringi)
x = "TATACATGATCGGGTCATAAGCTATAATGGGGCAATAA"
stri_extract(x, regex = "ATG.*?TAA")
[1] "ATGATCGGGTCATAA"
# compare without ? (a.k.a. greedy)
stri_extract(x, regex = "ATG.*TAA")
[1] "ATGATCGGGTCATAAGCTATAATGGGGCAATAA"

iMayank · June 14, 2018, 6:20am

If a string contains multiple such substrings, will this still work?

RobertForum · June 14, 2018, 6:52am

Just use:

stri_extract_all(x, regex = "ATG.*?TAA")
[[1]]
[1] "ATGATCGGGTCATAA" "ATGGGGCAATAA"

Is that what you want?

iMayank · June 14, 2018, 6:53am

Also, if I want to stop not only at TAA but also at TAG and TGA, what should I do?

RobertForum · June 14, 2018, 7:03am

We could go on like this all day. I showed you an approach, and I believe you can learn enough basic regular expressions to solve the problem yourself.
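
For readers who want a starting point, here is a minimal sketch, assuming that regex alternation is the technique being hinted at. The non-capturing group `(?:TAA|TAG|TGA)` lets the lazy match end at whichever of the three stop codons comes first. The second sequence `y` and the names `stop_pattern`, `hits_x` and `hits_y` are made up for illustration and do not come from the thread. Note that this is plain pattern matching and does not respect codon reading frames.

```r
library(stringi)

x <- "TATACATGATCGGGTCATAAGCTATAATGGGGCAATAA"
y <- "CCATGAAATAGGGATGCCCTGA"  # hypothetical sequence containing TAG and TGA

# Lazy match from ATG up to the first TAA, TAG or TGA that follows it
stop_pattern <- "ATG.*?(?:TAA|TAG|TGA)"

hits_x <- stri_extract_all(x, regex = stop_pattern)[[1]]
hits_y <- stri_extract_all(y, regex = stop_pattern)[[1]]

print(hits_x)  # "ATGATCGGGTCATAA" "ATGGGGCAATAA"
print(hits_y)  # "ATGAAATAG" "ATGCCCTGA"
```

In `y`, the TGA that overlaps the first ATG is not treated as a stop, because the pattern consumes ATG before it starts looking for a stop codon.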

iMayank · June 14, 2018, 7:17am

Thank you so much. I have one more question.
If I have a string of length 1000 to 10000 and want to extract the substring between two particular positions, for example from 368 to 897, what approach should I use?

RobertForum · June 14, 2018, 7:39am

You mean positions?

stri_sub(x, from = 3, to = 5)
[1] "TAC"

It's worth learning the stringi package. There are lots of useful functions.
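
The same call should carry over to the positions asked about. Below is a small sketch, assuming a randomly generated sequence of length 1000 stands in for the real data; `long_seq` and `piece` are illustrative names only. Both `from` and `to` are inclusive and 1-based, so positions 368 to 897 give 897 - 368 + 1 = 530 characters.

```r
library(stringi)

# Hypothetical stand-in for a real sequence: 1000 random bases
set.seed(1)
long_seq <- paste(sample(c("A", "C", "G", "T"), 1000, replace = TRUE),
                  collapse = "")

# Extract positions 368 through 897 (inclusive, 1-based)
piece <- stri_sub(long_seq, from = 368, to = 897)

print(stri_length(piece))                              # 530
print(identical(piece, substr(long_seq, 368, 897)))    # TRUE: base R substr() agrees
```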

iMayank · June 14, 2018, 7:40am

Thank you, sir! I will learn this package.