Comparing two Encoder-Decoder Models for Machine Translation - one with and one without an Attention Mechanism

Hi, I am a newbie in deep learning, so apologies if the question is too elementary.

I was going through this article.

What I want is to study the effect of attention in neural machine translation context.

So, taking the model in this article as the baseline, I wanted to add an attention layer between the encoder and the decoder.

Then I want to compare the two models, similar to the idea on this site.

But the attention mechanism used here has the constraint that the input and output sequences must be of the same length.

Any idea on how to implement this, in R or Python, would be appreciated.



There's a TensorFlow notebook implementing this:

We are in the process of making this available from R very soon, but in the meantime you could take a look at/use the Python implementation if you want :slight_smile:

The notebook implements Bahdanau attention, but you could easily replace it with another (similar) mechanism.
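For reference, the core of Bahdanau (additive) attention is just a small score/softmax/context computation. Here is a minimal NumPy sketch of a single decoding step; the weight names `W1`, `W2`, and `v` are illustrative, not taken from the notebook:

```python
import numpy as np

def bahdanau_attention(enc_outputs, dec_hidden, W1, W2, v):
    """One step of Bahdanau (additive) attention.

    enc_outputs: (T, units)  encoder hidden states, one per source timestep
    dec_hidden:  (units,)    current decoder hidden state
    W1, W2:      (units, attn_units) learned projection matrices (illustrative)
    v:           (attn_units,)       learned scoring vector (illustrative)
    """
    # Additive score: v^T tanh(W1 * h_enc + W2 * h_dec), one score per timestep
    score = np.tanh(enc_outputs @ W1 + dec_hidden @ W2) @ v   # (T,)
    # Softmax over source timesteps (numerically stabilized)
    weights = np.exp(score - score.max())
    weights /= weights.sum()                                   # (T,), sums to 1
    # Context vector: attention-weighted sum of encoder states
    context = weights @ enc_outputs                            # (units,)
    return context, weights
```

Swapping in Luong-style (multiplicative) attention would only change the `score` line; the softmax and context computation stay the same.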


I am aware of this and have already gone through this.

As I understand it, this implementation uses a GRU (not an LSTM, unlike this one) in both the encoder and the decoder (implemented with Bahdanau attention). But there is no implementation of a basic decoder.
As I want to study the effect of attention, I think I will need another decoder (without attention), train the two decoders separately (with the same encoder), and then compare the final results of the two models.

But the problem is that I am not really comfortable with TensorFlow. I am using Keras because I don't really need (at least, not yet) much control over the network.

So, can you please help me implement a vanilla decoder? Or can you point me to some beginner-friendly references for learning TensorFlow? I am aware of this book, but I have yet to start it.


If you can wait a little, we're working on the R version of the notebook right now.

Otherwise, I'd take the Python notebook and create a slightly modified version of it, just removing the attention logic from the Decoder class (in the call method).
That would give you two versions you can compare directly.
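To make that concrete: with the attention computation removed, the decoder reduces to embedding, GRU, and output projection. This is a hypothetical minimal sketch of such a stripped-down decoder in Keras, not the notebook's exact code; the class name and layer sizes are illustrative:

```python
import tensorflow as tf

class VanillaDecoder(tf.keras.Model):
    """A plain GRU decoder: like the notebook's Decoder, but with the
    attention/context-vector logic removed from call()."""

    def __init__(self, vocab_size, embedding_dim, dec_units):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(dec_units,
                                       return_sequences=True,
                                       return_state=True)
        self.fc = tf.keras.layers.Dense(vocab_size)

    def call(self, x, hidden):
        # x: (batch, 1) previous target token ids
        # hidden: (batch, dec_units) previous decoder state
        #         (initialized from the encoder's final state)
        x = self.embedding(x)                            # (batch, 1, embedding_dim)
        output, state = self.gru(x, initial_state=hidden)
        logits = self.fc(tf.reshape(output, (-1, output.shape[2])))  # (batch, vocab_size)
        return logits, state
```

Because this decoder takes the same inputs (previous token, previous state) and returns the same outputs (logits, state) as the attention version minus the attention weights, it should slot into the notebook's existing training loop with only minor changes.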

Has the R version of the Python Notebook been completed? If yes, would you please provide a link?

Hey, are you looking for this?

I haven't been able to improve my knowledge of deep learning (there was a lot of coursework), so I couldn't solve the problem. If you're able to create two decoders (with and without the attention mechanism) and share the implementations, I would highly appreciate it.