MANY is an MT system combination software whose architecture is described in the following picture:
The combination can be decomposed into three steps:
- 1-best hypotheses from all M systems are aligned in order to build M confusion networks (one for each system considered as backbone).
- All CNs are connected into a single lattice. The first node of each CN is connected to a unique start node with a probability equal to the prior probability assigned to the corresponding backbone. The final nodes are connected to a single final node with an arc probability of one.
- A token pass decoder is used along with a language model to decode the resulting lattice and generate the best hypothesis.
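The second step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not MANY's actual implementation: the representation of a confusion network as a list of slots (each mapping a word to a probability), the `<eps>`, `<start>` and `<end>` labels, and the function name `connect_networks` are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

# Assumed representation: a confusion network (CN) is a list of slots, and
# each slot maps a word (or "<eps>" for a null arc) to its probability.
ConfusionNetwork = List[Dict[str, float]]
# An arc is (source node, target node, label, probability).
Arc = Tuple[int, int, str, float]


def connect_networks(
    cns: List[ConfusionNetwork], priors: List[float]
) -> Tuple[List[Arc], int, int]:
    """Merge several CNs into one lattice with a shared start and end node."""
    if len(cns) != len(priors):
        raise ValueError("one prior per confusion network is required")
    start, next_node = 0, 1
    arcs: List[Arc] = []
    last_nodes: List[int] = []
    for cn, prior in zip(cns, priors):
        # Entry arc: carries the prior of the system used as backbone.
        first = next_node
        next_node += 1
        arcs.append((start, first, "<start>", prior))
        current = first
        for slot in cn:
            target = next_node
            next_node += 1
            for word, prob in slot.items():
                arcs.append((current, target, word, prob))
            current = target
        last_nodes.append(current)
    end = next_node
    # Exit arcs: probability one, so they leave every path score unchanged.
    for node in last_nodes:
        arcs.append((node, end, "<end>", 1.0))
    return arcs, start, end


# Two toy CNs, one per backbone system, with made-up probabilities.
cn_a = [{"the": 0.7, "a": 0.3}, {"cat": 1.0}]
cn_b = [{"a": 0.6, "<eps>": 0.4}, {"cat": 0.5, "cats": 0.5}]
arcs, start, end = connect_networks([cn_a, cn_b], priors=[0.6, 0.4])
```

Every path from `start` to `end` in the resulting arc list stays inside a single CN, so choosing a path also chooses a backbone, and the entry arc contributes that backbone's prior.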
The decoder can be expressed as a classical log-linear model:
$$W^* = \arg\max_W \sum_i \lambda_i \log h_i(W)$$
where $\lambda_i$ is the weight of the feature function $h_i$.
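To make the log-linear scoring concrete, here is a small Python sketch. It assumes that every feature value is already expressed in the log domain, and the feature names, weights and candidate values are invented for the illustration; they are not taken from MANY.

```python
from typing import Dict


def log_linear_score(log_features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of log-domain feature values for one hypothesis."""
    return sum(weights[name] * value for name, value in log_features.items())


def best_hypothesis(candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> str:
    """Return the candidate with the highest log-linear score."""
    return max(candidates, key=lambda hyp: log_linear_score(candidates[hyp], weights))


# Hypothetical weights and log-domain feature values.
weights = {"lm": 1.0, "prior": 0.5}
candidates = {
    "the cat": {"lm": -2.0, "prior": -0.5},
    "a cat": {"lm": -2.5, "prior": -0.9},
}
best = best_hypothesis(candidates, weights)
```

In a real decoder the maximisation runs over lattice paths rather than an explicit candidate list, but the score attached to each complete path has this form.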
Feature functions used:
- The language model (LM) probability.
- The system prior, corresponding to the probability of choosing a system as backbone.
- The word scores: currently, each word has a score equal to the prior of the system which proposed it.
- The word-length penalty of the word sequence.
- The null-penalty corresponding to the number of null-arcs (or epsilon arcs) crossed to obtain the hypothesis.
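The feature functions listed above could be computed for one lattice path roughly as follows. This is a sketch under assumptions, not the code used in MANY: a path is taken to be a list of `(label, score)` pairs where `score` is the prior of the system that proposed the word, `<eps>` marks a null arc, null arcs are assumed to contribute no word score, and `toy_lm_logprob` is a hypothetical stand-in for a real language model.

```python
import math
from typing import Callable, Dict, List, Tuple

EPS = "<eps>"  # assumed marker for a null (epsilon) arc


def extract_features(
    path: List[Tuple[str, float]],
    backbone_prior: float,
    lm_logprob: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Compute log-domain feature values for one path through the lattice."""
    words = [label for label, _ in path if label != EPS]
    return {
        # Language model log-probability of the word sequence.
        "lm": lm_logprob(words),
        # Log of the prior of the system chosen as backbone.
        "prior": math.log(backbone_prior),
        # Sum of log word scores (prior of the system proposing each word).
        "word_score": sum(math.log(s) for label, s in path if label != EPS),
        # Word-length penalty: number of real words in the hypothesis.
        "word_penalty": float(len(words)),
        # Null penalty: number of epsilon arcs crossed.
        "null_penalty": float(sum(1 for label, _ in path if label == EPS)),
    }


def toy_lm_logprob(words: List[str]) -> float:
    """Hypothetical LM stand-in: a flat cost of -1.0 per word."""
    return -1.0 * len(words)


path = [("the", 0.6), (EPS, 0.4), ("cat", 0.6)]
features = extract_features(path, backbone_prior=0.6, lm_logprob=toy_lm_logprob)
```

The two penalties are kept as raw counts here so that their weights can reward or discourage longer outputs and null arcs directly.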
Downloads
| v1 (current version) | 12/07/09 | First version with Confusion Network generation and Token Pass decoder. | [MANY SVN] (google code) |
Related Work
Loïc BARRAULT