Pairwise alignment error message

homework

#1

Hello, I am trying to align two sequences using the command
pairwiseAlignment(pattern = Seq2, subject = Seq1)
but I get an error message:
Error in .Call2("XStringSet_align_pairwiseAlignment", pattern, subject, :
key 0 not in lookup table


#2

Hi @Nabi1! Welcome!

I'm afraid you'll need to supply some more info in order for helpers to be able to understand your problem. Most error messages are difficult to understand on their own without seeing all of the code that they depend on (not just the one line that threw the error), and it's not clear from the above code what package(s) you're using.

Can you try making a self-contained reproducible example that demonstrates your problem? This should be complete, runnable code with all the necessary library() calls and some sample data that demonstrates your problem. For more specific info on how to do that, see the link! :grin: You might also want to take a look at this: FAQ: Tips for writing R-related questions
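
As a rough illustration, a self-contained example for this question might look something like the sketch below. The two short sequences are made-up stand-ins so that nobody needs your file, and it assumes Biostrings is installed from Bioconductor.

```r
# Every package the code depends on, loaded explicitly
library(Biostrings)  # assumption: installed from Bioconductor

# Small inline sample data instead of a file on your machine (made-up values)
Seq1 <- "ATGATAACCCTGACCTACCGC"
Seq2 <- "ATGCCCAAGACGCAATCTGCC"

# The exact call that fails for you
pairwiseAlignment(pattern = Seq2, subject = Seq1)

# R version, operating system and package versions, which often matter
sessionInfo()
```

Pasting the full error output together with the sessionInfo() result usually gives helpers enough to go on.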


#3

Thank you.
I am trying to align two sequences, which I call Seq1 and Seq2.
I used the libraries: library(Biostrings), library(seqinr)

#Read prok.fasta into a new variable
prokaryotes <- read.fasta(file = "prok.fasta", seqtype = "DNA")

#Read first sequence
Seq1 <- as.character(prokaryotes[[1]])
Seq1 <- paste(Seq1, collapse = "")

#Read second sequence
Seq2 <- as.character(prokaryotes[[2]])
Seq2 <- paste(Seq2, collapse = "")

#Align Seq2 and Seq1 using default libraries
pairwiseAlignment(pattern=Seq2, subject=Seq1)

#Error message
Error in .Call2("XStringSet_align_pairwiseAlignment", pattern, subject,  : 
  key 0 not in lookup table

Below are my sequences as saved in my working directory; in R they are converted to strings as shown above.

ATGATAACCCTGACCTACCGCATCGAAACGCCCGGCAGCGTCGAGACGATGGCGGACAAGATCGCCAGCG
ACCAGTCGACCGGAACCTTCGTGCCGGTTCCCGGCGAGACGGAAGAGCTGAAATCGCGCGTCGCCGCCCG
GGTTTTGGCGATCCGCCCGCTCGAGAATGCGCGCCATCCGACCTGGCCCGAGTCCGCGCCCGACACGCTG
CTCCACCGCGCCGACGTCGACATTGCCTTCCCTCTGGAGGCGATCGGCACAGATCTCTCGGCGCTGATGA
CCATCGCGATCGGCGGCGTCTATTCGATCAAGGGCATGACCGGCATCCGCATCGTCGACATGAAGCTGCC
CGAAGCTTTCCGGAGCGCCCATCCCGGGCCGCAATTCGGCATAGCGGGCAGCCGCCGCCTCACCGGCGTC
GAGGGCCGCCCGATCATCGGCACGATCGTCAAGCCGGCACTGGGGCTGAGGCCGCACGAGACGGCGGAAC
TCGTCGGCGAATTGATTGGGTCGGGCGTCGACTTCATCAAGGACGATGAGAAGCTGATGAGCCCGGCCTA
TTCGCCGCTCAAGGAGCGCGTCGCCGCGATCATGCCGCGCATTCTCGATCACGAGCAGAAGACCGGCAAG
AAGGTCATGTATGCCTTCGGCATCTCGCATGCCGATCCCGACGAGATGATGCGCAACCACGATATCGTCG
CTGCGGCCGGCGGCAATTGCGCCGTCGTCAATATCAATTCGATCGGCTTCGGCGGCATGAGCTTCCTGCG
CAAGCGCTCCAGCCTGGTGCTGCATGCGCATCGCAACGGCTGGGATGTGCTGACGCGCGATCCGGGCGCC
GGCATGGATTTCAAGGTCTATCAGCAGTTCTGGCGGCTGCTCGGCGTCGACCAGTTCCAGATCAACGGCA
TCAGAATCAAATATTGGGAGCCGGACGAGAGCTTCGTCTCTTCCTTCAAGGCCGTCAGCACGCCGCTCTT
CGATGCCGCCGATTGCCCGCTTCCGGTCGCGGGCTCCGGCCAGTGGGGCGGGCAGGCGCCGGAGACCTAC
GAGCGCACCGGCCGCACCATCGATCTTCTCTATCTCTGCGGCGGCGGCATCGTCAGCCATCCCGGCGGTC
CTGCTGCCGGCGTGCGCGCCGTGCAGCAGGCCTGGCAGGCGGCGGTCGCCGGCATTCCGCTGGAGGTCTA
TGCCAAGGATCATCCGGAGCTTGCCGCCTCGATTGCCAAATTCAGCGACGGCAAGGGCGCGTGA

Synechococcus

ATGCCCAAGACGCAATCTGCCGCAGGCTATAAGGCCGGGGTGAAGGACTACAAACTCACCTATTACACCC
CCGATTACACCCCCAAAGACACTGACCTGCTGGCGGCTTTCCGCTTCAGCCCTCAGCCGGGTGTCCCTGC
TGACGAAGCTGGTGCGGCGATCGCGGCTGAATCTTCGACCGGTACCTGGACCACCGTGTGGACCGACTTG
CTGACCGACATGGATCGGTACAAAGGCAAGTGCTACCACATCGAGCCGGTGCAAGGCGAAGAGAACTCCT
ACTTTGCGTTCATCGCTTACCCGCTCGACCTGTTTGAAGAAGGGTCGGTCACCAACATCCTGACCTCGAT
CGTCGGTAACGTGTTTGGCTTCAAAGCTATCCGTTCGCTGCGTCTGGAAGACATCCGCTTCCCCGTCGCC
TTGGTCAAAACCTTCCAAGGTCCTCCCCACGGTATCCAAGTCGAGCGCGACCTGCTGAACAAGTACGGCC
GTCCGATGCTGGGTTGCACGATCAAACCAAAACTCGGTCTGTCGGCGAAAAACTACGGTCGTGCCGTCTA
CGAATGTCTGCGCGGCGGTCTGGACTTCACCAAAGACGACGAAAACATCAACTCGCAGCCGTTCCAACGC
TGGCGCGATCGCTTCCTGTTTGTGGCTGATGCAATCCACAAATCGCAAGCAGAAACCGGTGAAATCAAAG
GTCACTACCTGAACGTGACCGCGCCGACCTGCGAAGAAATGATGAAACGGGCTGAGTTCGCTAAAGAACT
CGGCATGCCGATCATCATGCATGACTTCTTGACGGCTGGTTTCACCGCCAACACCACCTTGGCAAAATGG
TGCCGCGACAACGGCGTCCTGCTGCACATCCACCGTGCAATGCACGCGGTGATCGACCGTCAGCGTAACC
ACGGGATTCACTTCCGTGTCTTGGCCAAGTGTTTGCGTCTGTCCGGTGGTGACCACCTCCACTCCGGCAC
CGTCGTCGGCAAACTGGAAGGCGACAAAGCTTCGACCTTGGGCTTTGTTGACTTGATGCGCGAAGACCAC
ATCGAAGCTGACCGCAGCCGTGGGGTCTTCTTCACCCAAGATTGGGCGTCGATGCCGGGCGTGCTGCCGG
TTGCTTCCGGTGGTATCCACGTGTGGCACATGCCCGCACTGGTGGAAATCTTCGGTGATGACTCCGTTCT
CCAGTTCGGTGGCGGCACCTTGGGTCACCCCTGGGGTAATGCTCCTGGTGCAACCGCGAACCGTGTTGCC
TTGGAAGCTTGCGTCCAAGCTCGGAACGAAGGTCGCGACCTCTACCGTGAAGGCGGCGACATCCTTCGTG
AAGCTGGCAAGTGGTCGCCTGAACTGGCTGCTGCCCTCGACCTCTGGAAAGAGATCAAGTTCGAATTCGA
AACGATGGACAAGCTCTAA

How do I resolve this?


#4

From the documentation for pairwiseAlignment (http://web.mit.edu/~r/current/arch/i386_linux26/lib/R/library/Biostrings/html/pairwiseAlignment.html), pattern and subject need to be an XString object (http://web.mit.edu/~r/current/arch/i386_linux26/lib/R/library/Biostrings/html/XString-class.html). It's hard to know for sure whether this will work without seeing your input data, but from looking at your sequences, I assume they're DNA, so you might be able to convert your sequences into XString objects using:

library(Biostrings)
Seq1 <- DNAString(prokaryotes[[1]])
Seq2 <- DNAString(prokaryotes[[2]])

pairwiseAlignment(pattern = Seq2, subject = Seq1)

From the documentation, it appears there are a lot of optional arguments you can pass to pairwiseAlignment that affect analysis of your sequences, which I won't pretend to understand, but might be pertinent so be sure to have a read.


#5

@mrblobby. Thank you so much mrblobby.
I used the suggestion you gave but it still did not work out. I tried to convert Seq1 and Seq2 to DNAString by doing this:

library(Biostrings) 
Seq1 <;- DNAString(prokaryotes[[1]]) 
Seq2 <- DNAString(prokaryotes[[2]])

It gave another error message this time

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘XString’ for signature ‘"SeqFastadna"’

#6

Well firstly there's a typo when you define Seq1 (which should throw up an error?) but I'm guessing the error you see here is probably because DNAString does not know how to deal with the data you load via read.fasta. Is there a reason you use read.fasta rather than something like read_csv? It would really help if you could provide some sample data otherwise I'm just guessing.

Maybe try the following after loading the data with read.fasta:

Seq1 = paste(prokaryotes[[1]], collapse="")
Seq1 <- DNAString(Seq1)

edit: from having a Google, it looks like you can load fasta format files using the Biostrings package: http://web.mit.edu/~r/current/arch/i386_linux26/lib/R/library/Biostrings/html/XStringSet-io.html. This might be your best bet.
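
A minimal sketch of that route, in case it helps. It assumes Biostrings is installed from Bioconductor and uses a small temporary FASTA file with made-up records as a stand-in for prok.fasta. In newer Bioconductor releases pairwiseAlignment() has been moved to the pwalign package, so library(pwalign) may be needed as well.

```r
library(Biostrings)  # assumption: installed from Bioconductor

# Stand-in for prok.fasta: two short made-up records in a temporary file
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">seq_a", "ATGATAACCCTGACCTACCGC",
             ">seq_b", "ATGCCCAAGACGCAATCTGCC"), fasta_path)

# readDNAStringSet() returns a DNAStringSet with one element per FASTA record
prokaryotes <- readDNAStringSet(fasta_path)

# [[ ]] extracts a single DNAString, so no paste() or as.character() is needed
Seq1 <- prokaryotes[[1]]
Seq2 <- prokaryotes[[2]]

pairwiseAlignment(pattern = Seq2, subject = Seq1)
```

With the real data, replacing fasta_path with "prok.fasta" should be the only change needed, and seqinr is no longer involved at all.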


#7

Thank you mrblobby. I got the first error message again.

The file is a fasta file and I am not allowed to upload it here. Maybe I will just give the link to the web page.

I want to align the first two sequences using the command
pairwiseAlignment(pattern=seq2, subject=seq1)
only. Later I will be modifying to see how other factors affect the alignment.
It is actually a course in which I am learning to use R. So far everything was okay but at this point I got stuck.

These are the steps I followed
#Download the prok.fasta file into my working directory

Libraries

library(Biostrings)
library(seqinr)

prokaryotes <- read.fasta(file = "prok.fasta", seqtype = "DNA")

#Split first two sequences into individual sequences
seq1 <- as.character(prokaryotes[[1]])
seq1 <- paste(seq1, collapse = "")

seq2 <- as.character(prokaryotes[[2]])
seq2 <- paste(seq2, collapse = "")

#aligned the two sequences using the default libraries
pairwiseAlignment(pattern=seq2, subject=seq1)

I got the two sequences as strings, but pairwiseAlignment still gives the error.


#8

Hi @Nabi1, did you see my previous response? What happened when you tried to incorporate my suggestions? I'm feeling disinclined to just tell you the answer if the purpose of this is to learn your way around R as well.

If you follow the advice in my previous reply and copy it for Seq2 you should find a solution that works;
I've just tried it with your data and pairwiseAlignment returns a score. If you're still having issues after trying the above, paste the code that you've used in a reply here (please read the FAQ that jcblum posted if you're unsure how to do that) and I'll have a go at trying to figure it out with you :slight_smile:


#9

:smile: I tried your suggestion but it did not work out. I do not know whether the cause is the version of RStudio I am using, which is 1.1.463, or my operating system, which is Windows.


#10

Hi @Nabi1, without a reproducible example of your code I cannot help you. If you think it might be because of R or RStudio, posting your code will still help, because then we can rule out that the problem is caused by something incorrect there. Again, see jcblum's post above on how to produce a good reproducible example; specifically this bit:


#11

@mrblobby. Thank you so much for your support. It was a version problem. The 3.1 version did the trick.
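
For anyone landing here with the same error, the snippet below is one way to see which versions are in play before and after updating. It is only a sketch: the thread does not say which component "3.1" refers to, and the Biostrings check is guarded because the package may not be installed.

```r
# Version of R itself; the RStudio IDE has its own, separate version number
R.version.string

# Version of an installed package, checked only if the package is present
if (requireNamespace("Biostrings", quietly = TRUE)) {
  print(packageVersion("Biostrings"))
}

# R version, operating system and attached packages in one report
sessionInfo()
```

Including this output in a question lets helpers spot an outdated R or package version early.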


#12

If your question's been answered (even if by you), would you mind choosing a solution? (See FAQ below for how).

Having questions checked as resolved makes it a bit easier to navigate the site visually and see which threads still need help.

Thanks


#13

@Mara, thank you for your advice.

