Comparing Two Encoder-Decoder Models for Machine Translation - One with and One without an Attention Mechanism



Hi, I am a newbie in deep learning, so I apologize if the question is too elementary.

I was going through this article.

What I want is to study the effect of attention in neural machine translation context.

So, taking the model in this article as a baseline, I wanted to add an attention layer between the encoder and decoder layers.

Then I want to compare the two models, similar to the idea in this site.

But the attention mechanism used here has the constraint that the input and output sequences must have the same length.

Any ideas on how to implement this, in R or Python, would be appreciated.



There's a TensorFlow notebook implementing this:

We are in the process of making this available from R very soon, but in the meantime you could take a look at/use the Python implementation if you wanted 🙂

The notebook implements Bahdanau attention, but you could easily replace that with another (similar) algorithm.
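For reference, Bahdanau (additive) attention can be sketched as a small Keras layer like the one below. This is a minimal illustration of the mechanism, not the notebook's exact code; the layer and variable names (`W1`, `W2`, `V`) are my own choices.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive (Bahdanau-style) attention: score each encoder output
    against the current decoder state, then form a weighted context."""

    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # projects the decoder state
        self.W2 = tf.keras.layers.Dense(units)  # projects the encoder outputs
        self.V = tf.keras.layers.Dense(1)       # collapses to a scalar score

    def call(self, query, values):
        # query:  decoder hidden state, shape (batch, hidden)
        # values: encoder outputs,      shape (batch, src_len, hidden)
        query_with_time_axis = tf.expand_dims(query, 1)
        # score shape: (batch, src_len, 1)
        score = self.V(tf.nn.tanh(self.W1(query_with_time_axis) + self.W2(values)))
        # normalize over the source positions
        attention_weights = tf.nn.softmax(score, axis=1)
        # weighted sum of encoder outputs -> (batch, hidden)
        context_vector = tf.reduce_sum(attention_weights * values, axis=1)
        return context_vector, attention_weights
```

Because the attention weights are computed over the encoder outputs at every decoding step, the source and target sequences do not need to have the same length.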



I am aware of this and have already gone through it.

As I understand it, this implementation uses a GRU (not an LSTM, unlike the article) in both the encoder and the decoder (implemented with Bahdanau attention). But there is no implementation of a basic decoder.
Since I want to study the effect of attention, I think I will need another decoder (without attention), train the two decoders separately (with the same encoder), and then compare the final results of the two models.

But the problem is that I am not really comfortable with TensorFlow. I am using Keras since I don't really need (at least, not yet) much control over the network.

So, can you please help me implement a vanilla decoder? Or can you point me to some references (suitable for beginners) for learning TensorFlow? I am aware of this book, but I have yet to start it.



If you can wait a little, we're just working on the R version of the notebook.

Otherwise, I'd take the Python notebook and create a slightly modified version of it, removing the attention logic from the Decoder class (in its call method).
That should give you two versions you can compare directly.
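With the attention logic stripped out, the decoder reduces to an embedding, a GRU, and an output projection, as in the sketch below. This is an illustrative rewrite under my own naming (`VanillaDecoder`), not the notebook's actual class; the point is that the GRU is conditioned only on the previous target token and the carried hidden state, with no context vector.

```python
import tensorflow as tf

class VanillaDecoder(tf.keras.Model):
    """Decoder with the attention logic removed: the GRU sees only the
    previous target token's embedding and the carried hidden state."""

    def __init__(self, vocab_size, embedding_dim, dec_units):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(dec_units,
                                       return_sequences=True,
                                       return_state=True)
        self.fc = tf.keras.layers.Dense(vocab_size)  # projects to vocab logits

    def call(self, x, hidden):
        # x:      previous target token ids, shape (batch, 1)
        # hidden: carried decoder state,     shape (batch, dec_units)
        x = self.embedding(x)                          # (batch, 1, embedding_dim)
        output, state = self.gru(x, initial_state=hidden)
        output = tf.reshape(output, (-1, output.shape[2]))  # (batch, dec_units)
        logits = self.fc(output)                       # (batch, vocab_size)
        return logits, state
```

At the start of decoding, `hidden` would be initialized from the encoder's final state (so the encoder still conditions the decoder, just without per-step attention over all encoder outputs).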


I'd appreciate it if you could modify the Python notebook to add another decoder without attention. That would also help me understand the code.

And, since you're already working on the R version, could you please include both versions of the decoder there? I think that would be helpful for beginners.