Hi, I am a newbie in deep learning, so I apologize if this question is too elementary.
I was going through this article.
What I want is to study the effect of attention in the context of neural machine translation. So, taking the model in this article as a baseline, I wanted to add an attention layer between the encoder and decoder layers.
Then I want to compare the two models, similar to the idea on this site.
But the attention mechanism used here has the constraint that the input and output sequences must have the same length.
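To show what I mean, here is a rough NumPy sketch of the kind of cross-attention I have in mind (dot-product, Luong-style); the shapes and variable names are just my own assumptions, but the point is that the encoder length and decoder length can differ:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: source and target sequence lengths differ (Tx != Ty).
Tx, Ty, d = 7, 5, 16                      # source length, target length, hidden size
rng = np.random.default_rng(0)
enc_outputs = rng.normal(size=(Tx, d))    # one encoder state per source token
dec_states = rng.normal(size=(Ty, d))     # one decoder state per target step

# Dot-product cross-attention between decoder states and encoder outputs:
scores = dec_states @ enc_outputs.T       # (Ty, Tx) alignment scores
weights = softmax(scores, axis=-1)        # each row sums to 1 over source tokens
context = weights @ enc_outputs           # (Ty, d) context vector per target step

print(weights.shape, context.shape)       # (5, 7) (5, 16)
```

So in principle the context vectors line up with the decoder steps, not the encoder steps, which is why I don't understand where the same-length constraint comes from in the baseline, or how to work around it.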
Any idea on how to implement this, in R or Python, would be appreciated.