Comparing Two Encoder-Decoder Models for Machine Translation - One with and One without an Attention Mechanism



Hi, I am a newbie in deep learning, so I apologize if the question is too elementary.

I was going through this article.

What I want is to study the effect of attention in neural machine translation context.

So, taking the model in this article as a baseline, I wanted to add an attention layer between the encoder and decoder layers.

Then I want to compare the two models, similar to the idea in this site.

But the attention mechanism used here has the constraint that the input and output sequences must have the same length.

Any ideas on how to implement this, in R or Python, would be appreciated.



There's a TensorFlow notebook implementing this:

We are in the process of making this available from R very soon, but in the meantime you could take a look at/use the Python implementation if you wanted 🙂

The notebook implements Bahdanau attention, but you could easily replace that with another (similar) algorithm.
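For reference, Bahdanau (additive) attention can be sketched as a small Keras layer like the one below. This is a minimal illustration of the mechanism, not the notebook's exact code; the layer and variable names (`W1`, `W2`, `V`) are my own choices.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive (Bahdanau-style) attention: score each encoder output
    against the current decoder state, then form a weighted context."""

    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # projects the decoder state
        self.W2 = tf.keras.layers.Dense(units)  # projects the encoder outputs
        self.V = tf.keras.layers.Dense(1)       # collapses to a scalar score

    def call(self, query, values):
        # query:  decoder hidden state, shape (batch, hidden)
        # values: encoder outputs,      shape (batch, src_len, hidden)
        query_with_time_axis = tf.expand_dims(query, 1)
        # score shape: (batch, src_len, 1)
        score = self.V(tf.nn.tanh(self.W1(query_with_time_axis) + self.W2(values)))
        # normalize over the source positions
        attention_weights = tf.nn.softmax(score, axis=1)
        # weighted sum of encoder outputs -> (batch, hidden)
        context_vector = tf.reduce_sum(attention_weights * values, axis=1)
        return context_vector, attention_weights
```

Because the attention weights are computed over the encoder outputs at every decoding step, the source and target sequences do not need to have the same length.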



I am aware of this and have already gone through it.

As I understand it, this implementation uses a GRU (not an LSTM, unlike the article) in both the encoder and the decoder (implemented with Bahdanau attention). But there is no implementation of a basic decoder.
Since I want to study the effect of attention, I think I will need another decoder (without attention), train the two decoders separately (with the same encoder), and then compare the final results of the two models.

But the problem is that I am not really comfortable with TensorFlow. I am using Keras since I don't really need (at least, not yet) much control over the network.

So, can you please help me implement a vanilla decoder? Or can you point me to some references (suitable for beginners) for learning TensorFlow? I am aware of this book, but I have yet to start it.



If you can wait a little, we're just working on the R version of the notebook.

Otherwise, I'd take the Python notebook and create a slightly modified version of it, removing the attention logic from the Decoder class (in its call method).
That should give you two versions you can compare directly.
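With the attention logic stripped out, the decoder reduces to an embedding, a GRU, and an output projection, as in the sketch below. This is an illustrative rewrite under my own naming (`VanillaDecoder`), not the notebook's actual class; the point is that the GRU is conditioned only on the previous target token and the carried hidden state, with no context vector.

```python
import tensorflow as tf

class VanillaDecoder(tf.keras.Model):
    """Decoder with the attention logic removed: the GRU sees only the
    previous target token's embedding and the carried hidden state."""

    def __init__(self, vocab_size, embedding_dim, dec_units):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(dec_units,
                                       return_sequences=True,
                                       return_state=True)
        self.fc = tf.keras.layers.Dense(vocab_size)  # projects to vocab logits

    def call(self, x, hidden):
        # x:      previous target token ids, shape (batch, 1)
        # hidden: carried decoder state,     shape (batch, dec_units)
        x = self.embedding(x)                          # (batch, 1, embedding_dim)
        output, state = self.gru(x, initial_state=hidden)
        output = tf.reshape(output, (-1, output.shape[2]))  # (batch, dec_units)
        logits = self.fc(output)                       # (batch, vocab_size)
        return logits, state
```

At the start of decoding, `hidden` would be initialized from the encoder's final state (so the encoder still conditions the decoder, just without per-step attention over all encoder outputs).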


I'd appreciate it if you could modify the Python notebook to add another decoder without attention. That would also help me understand the code.

And, since you're already working on the R version, could you please include both versions of the decoder there? I think that would be helpful for beginners.