Introduction

This is a PyTorch implementation of a sequence to sequence learning toolkit for the i-machine-think project. This repository is a fork of the pytorch-seq2seq library developed by IBM, but has substantially diverged from it after heavy development. For the original implementation, visit https://github.com/IBM/pytorch-seq2seq.

Requirements

This library runs with PyTorch 0.3.0. We refer you to the PyTorch website for instructions on installing the right version for your environment. To install additional requirements (including numpy and torchtext), run:

pip install -r requirements.txt

Quickstart

There are three command-line tools available:

  • train_model.py
  • evaluate.py
  • infer.py

Training

The script train_model.py can be used to train a new model, resume the training of an existing model from a checkpoint, or retrain an existing model from a checkpoint. E.g. to train a model from scratch:

  # Train a simple model with embedding size 128 and hidden size 256
  python train_model.py --train $train_path --dev $dev_path --output_dir $expt_dir --embedding_size 128 --hidden_size 256 --rnn_cell gru --epoch 20

Several options are available from the command line, including changing the optimizer and batch size, enabling attention or bidirectionality, and using teacher forcing. For a complete overview, use the help function of the script, as shown below.
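For instance, the following prints all supported options and their defaults (this assumes the script exposes the standard argparse --help flag):

  # Print all supported command-line options and their defaults
  python train_model.py --help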

Evaluation and inference

The scripts infer.py and evaluate.py can be used to run an existing model (loaded from a checkpoint) in inference mode and to evaluate a model on a test set, respectively. E.g.:

  # Use the model stored in $checkpoint_path in inference mode
  python infer.py --checkpoint_path $checkpoint_path

  # Evaluate a trained model stored in $checkpoint_path
  python evaluate.py --checkpoint_path $checkpoint_path --test_data $test_data

Example script

The script example.sh illustrates the usage of all three tools: it uses the toy data from the test directory (containing a ‘reverse’ dataset, in which the target for any sequence of numbers is that sequence reversed), trains a model on this data using train_model.py, evaluates the model using evaluate.py, and then runs infer.py to generate outputs.
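The whole pipeline can be run in one go by invoking the script directly (assuming a POSIX shell is available):

  # Train, evaluate and run inference on the toy 'reverse' dataset
  sh example.sh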

Once training is complete, you will be prompted to enter a new sequence to translate, and the model will print out its prediction (use Ctrl-C to terminate). Try the example below!

Input:  1 3 5 7 9
Expected output: 9 7 5 3 1 EOS

Checkpoints

During training, the top k models (currently k = 5) are stored in folders named using the model's accuracy and loss on the development set. Each folder contains the model, the source and target vocabularies, and the trainer states.
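For illustration, a checkpoint directory might look roughly as follows; the folder and file names below are hypothetical, and the exact naming scheme is determined by the library:

  $expt_dir/
    0.98_0.05_acc_loss/      # one of the top-k checkpoints, named after dev accuracy and loss
      model_checkpoint       # the trained model
      input_vocab            # source vocabulary
      output_vocab           # target vocabulary
      trainer_states         # trainer state needed to resume training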

Contributing

We welcome pull requests for the library. Please run both the unit tests and the integration test before committing:

python -m unittest discover
sh integration_test.sh