MANY: Open Source Machine Translation System Combination


Loïc Barrault

Description

MANY is machine translation (MT) system combination software whose architecture is shown in the following picture:


The combination can be decomposed into three steps


The decoder can be expressed as a classical log-linear model:

where λi is the weight of the feature function hi.  
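
The formula itself did not survive in this copy of the page. As a reminder only, a classical log-linear decoder is commonly written as below. The symbols \hat{W} (the selected hypothesis) and W (a candidate word sequence) are assumptions introduced here, and the exact form used by MANY may differ, for instance by taking the feature functions in the log domain.

```latex
\hat{W} = \operatorname*{arg\,max}_{W} \sum_{i} \lambda_i \, h_i(W)
```
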
Feature functions used:
  • The language model (LM) probability.
  • The system prior, corresponding to the probability of choosing a system as backbone.
  • The word scores: currently, each word has a score equal to the prior of the system that proposed it.
  • The word-length penalty of the word sequence.
  • The null penalty, corresponding to the number of null arcs (or epsilon arcs) crossed to obtain the hypothesis.
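
The way these features combine into a single hypothesis score can be sketched as follows. This is a minimal illustration, not MANY's actual code: the Hypothesis class, the feature names, the use of log-domain priors, and the weight values are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    """Hypothetical container for one path through the combined lattice."""
    words: List[str]          # output tokens (null arcs emit no token)
    lm_logprob: float         # LM log-probability, assumed computed elsewhere
    backbone_prior: float     # prior of the system used as backbone
    word_priors: List[float]  # prior of the system that proposed each word
    null_arcs: int            # number of null (epsilon) arcs crossed


def features(h: Hypothesis) -> Dict[str, float]:
    """Map a hypothesis to the five feature values h_i (log domain assumed)."""
    return {
        "lm": h.lm_logprob,
        "prior": math.log(h.backbone_prior),
        "word": sum(math.log(p) for p in h.word_priors),
        "length": float(len(h.words)),
        "null": float(h.null_arcs),
    }


def score(h: Hypothesis, weights: Dict[str, float]) -> float:
    """Log-linear score: weighted sum of the feature values."""
    f = features(h)
    return sum(weights[name] * f[name] for name in weights)


def best(hyps: List[Hypothesis], weights: Dict[str, float]) -> Hypothesis:
    """Return the highest-scoring hypothesis."""
    return max(hyps, key=lambda h: score(h, weights))


# Illustrative weights only; in practice they would be tuned on held-out data.
weights = {"lm": 1.0, "prior": 0.5, "word": 0.5, "length": 0.1, "null": -0.3}
h1 = Hypothesis(["the", "cat", "sleeps"], -4.0, 0.5, [0.5, 0.5, 0.5], 0)
h2 = Hypothesis(["the", "cat"], -6.0, 0.5, [0.5, 0.5], 1)
print(best([h1, h2], weights).words)
```

In this sketch a negative weight on the null feature discourages paths that skip many words, while a positive weight on the length feature counteracts the LM's preference for short outputs.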


Downloads

v1 (current version), 12/07/09: first version, with confusion network generation and a token-pass decoder. [MANY SVN] (Google Code)


Related Work

TERp website: http://www.umiacs.umd.edu/~snover/terp/
SPHINX4 website: http://cmusphinx.sourceforge.net/sphinx4/
SRILM website: http://www.speech.sri.com/projects/srilm/
