An attentional NMT model in Dynet

  • Users starred: 27
  • Users forked: 3
  • Users watching: 27
  • Updated at: 2019-05-12 11:14:22

Attentional Seq2Sqeq model in DyNet


You will need dynet to run this code. See the installation instructions here

On top of this, you will also require


You will also need perl to run the evaluation script.


You can test your installation by running

python -c config/test_config.yaml -e train

The --config_file/-c argument specifies the path to a .yaml file containing the options, see config/test_config.yaml for an example.

You can use the --env/-e argument to specify a subset of parameters you wish to use (for instance to differentiate between training and testing). Again, see config/test_config.yaml for an example.

Alternatively, you can set all the parameters via the command line, run

python -h

For more details on the available parameters

The script folder contains a bash script to launch training & testing. Beware that the default config uses the GPU.


The current model uses LSTMs (dynet.VanillaLSTM) for the encoder(s) and decoder, as well as MLP attention.

More specifically given an input sentence


and an output sentence


The model is trained to minimize the conditional log-likelihood


Where the probability is computed using an encoder decoder network :

  1. Encoder: with the default parameter, this encodes the source sentence with a bidirectional LSTM


  1. Attention : the attention scores are computed based on the encodings and the previous decoder output.


  1. decoding : this uses an LSTM as well.



The data is from the IWSLT2016 workshop for German-English translation, separated into a training set, validation set (the dev2010 set from IWSLT), and test set (the tst2010 set). There is an additional blind test set for which translations are not provided.


With the configuration stored in config/best_config.yaml, a BLEU score of 27.26 is attained on the test set.

Here are some samples (randomly selected, edited some spacing for clarity):

Target Hypothesis
i didn't mention the skin of my beloved fish, which was delicious -- and i don't like fish skin ; i don't like it seared, i don't like it crispy. i don't mention the skin of my beloved fish that was delicious, and i don't like a UNK. i don't like it. i don't like you.
we will be as good at whatever we do as the greatest people in the world. we 're going to be so good at doing whatever the most significant people in the world.
i actually am. that 's me even.
tremendously challenging. a great challenge.
if you 're counting on it for 100 percent, you need an incredible miracle battery. if you want to support 100 percent of it, you need an incredible UNK.


Thanks to

  • Graham Neubig for the course project from which this repo stemmed.
  • Chunting Zhou for some much needed help.

More info

Check this tutorial for more info on sequence to sequence model.

This project was written in Dynet, a cool dynamic framework for deep learning.

Hit me up at pmichel1[at] for any question (or open an issue on github).