MANY : Open Source Machine Translation System Combination
MANY is open-source software for machine translation (MT) system combination. Its architecture is shown in the following figure:
The combination can be decomposed into three steps:
- The 1-best hypotheses from all M systems are aligned to build M confusion networks (CNs), one for each system taken as the backbone.
- All CNs are connected into a single lattice. The first node of each CN is connected to a unique start node, with an arc probability equal to the prior probability assigned to the corresponding backbone. The final node of each CN is connected to a single end node with an arc probability of one.
- A token pass decoder is used along with a language model to decode the resulting lattice and generate the best hypothesis.
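The second step can be illustrated with a minimal Python sketch. It is not MANY's implementation: the data layout (a confusion network as a list of word-to-probability slots), the `<eps>` label, and the function name `connect_networks` are assumptions made for this example only.

```python
from typing import Dict, List, Tuple

EPS = "<eps>"  # label assumed here for null (epsilon) arcs

# Assumed layout: a confusion network is a list of slots,
# and each slot maps a word to its probability.
ConfusionNetwork = List[Dict[str, float]]
Arc = Tuple[int, int, str, float]  # (source node, target node, label, probability)


def connect_networks(
    cns: List[ConfusionNetwork], priors: List[float]
) -> Tuple[List[Arc], int, int]:
    """Merge confusion networks into one lattice with a unique start and end node."""
    if len(cns) != len(priors):
        raise ValueError("one prior per confusion network is required")
    arcs: List[Arc] = []
    start = 0
    next_node = 1
    last_nodes: List[int] = []
    for cn, prior in zip(cns, priors):
        first = next_node
        next_node += 1
        # Arc from the shared start node, weighted by the backbone prior.
        arcs.append((start, first, EPS, prior))
        current = first
        for slot in cn:
            target = next_node
            next_node += 1
            # One arc per competing word in this slot.
            for word, prob in slot.items():
                arcs.append((current, target, word, prob))
            current = target
        last_nodes.append(current)
    end = next_node
    # Every network ends in the shared end node with probability one.
    for node in last_nodes:
        arcs.append((node, end, EPS, 1.0))
    return arcs, start, end


# Two toy confusion networks, one per backbone system.
cn_a = [{"the": 0.7, "a": 0.3}, {"cat": 1.0}]
cn_b = [{"a": 0.6, "the": 0.4}, {"cat": 0.5, EPS: 0.5}]
arcs, start, end = connect_networks([cn_a, cn_b], [0.6, 0.4])
for arc in arcs:
    print(arc)
```

The decoder then only has to search one graph: any path from the start node to the end node selects a backbone first and then one word (or a null arc) per slot.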
The decoder can be expressed as a classical log-linear model:
$$\hat{W} = \arg\max_{W} \sum_{i} \lambda_i \log h_i(W)$$
where $\lambda_i$ is the weight of the feature function $h_i$.
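As a rough illustration of log-linear scoring in general, the sketch below ranks candidate hypotheses by a weighted sum of log feature values. The function names, feature names, feature values, and weights are all made up for the example and are not taken from MANY.

```python
import math
from typing import Dict, List, Tuple


def log_linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the logs of the feature values of one hypothesis."""
    return sum(weights[name] * math.log(value) for name, value in features.items())


def best_hypothesis(
    candidates: List[Tuple[str, Dict[str, float]]], weights: Dict[str, float]
) -> str:
    """Return the candidate text with the highest log-linear score."""
    return max(candidates, key=lambda item: log_linear_score(item[1], weights))[0]


# Hypothetical candidates with two positive-valued features each.
candidates = [
    ("the cat sat", {"lm": 0.02, "prior": 0.6}),
    ("a cat sat", {"lm": 0.01, "prior": 0.4}),
]
weights = {"lm": 1.0, "prior": 0.5}
best = best_hypothesis(candidates, weights)
print(best)
```

In a real decoder the maximisation runs over lattice paths rather than an explicit candidate list, and the weights are normally tuned on development data.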
Feature functions used:
- The LM probability
- The system prior, corresponding to the probability of choosing a system as backbone.
- The word scores: currently, each word has a score equal to the prior of the system that proposed it.
- The word-length penalty of the word sequence.
- The null-penalty corresponding to the number of null-arcs (or epsilon arcs) crossed to obtain the hypothesis.
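The five features listed above could be computed for one lattice path roughly as follows. This is a sketch under stated assumptions, not MANY's code: the `Arc` type, the `path_features` function, the `<eps>` label, and the toy language model are hypothetical. It also assumes that probabilities are returned as log values, that penalties are raw counts, and that null arcs carry no word score.

```python
import math
from typing import Callable, Dict, List, NamedTuple

EPS = "<eps>"  # label assumed here for null (epsilon) arcs


class Arc(NamedTuple):
    word: str          # word on the arc, or EPS for a null arc
    word_score: float  # prior of the system that proposed the word


def path_features(
    path: List[Arc],
    backbone_prior: float,
    lm_logprob: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Compute the feature values of one path through the lattice."""
    words = [arc.word for arc in path if arc.word != EPS]
    return {
        # Language model log-probability of the word sequence.
        "lm": lm_logprob(words),
        # Log prior of the system chosen as backbone.
        "prior": math.log(backbone_prior),
        # Sum of log word scores (assumption: null arcs are skipped).
        "word_scores": sum(
            math.log(arc.word_score) for arc in path if arc.word != EPS
        ),
        # Word-length penalty: number of real words in the hypothesis.
        "length": float(len(words)),
        # Null penalty: number of null arcs crossed.
        "nulls": float(sum(1 for arc in path if arc.word == EPS)),
    }


def toy_lm(words: List[str]) -> float:
    """Toy stand-in for a language model: every word has probability 0.1."""
    return len(words) * math.log(0.1)


path = [Arc("the", 0.6), Arc(EPS, 0.4), Arc("cat", 0.6)]
features = path_features(path, backbone_prior=0.6, lm_logprob=toy_lm)
print(features)
```

Multiplying each value by its weight and summing gives the score that the decoder compares across paths.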
- v1 (current version): First version with confusion network generation and token pass decoder.
MANY SVN (Google Code)