The MANY MT System Combination Tool at the ML4HMT Workshop Shared Task

Patrik Lambert
Seminar abstract:
Hybrid Machine Translation (HMT) has received some interest recently, with two workshops dedicated to the topic (18-19 November 2011): LIHMT (International Workshop on Using Linguistic Information for Hybrid Machine Translation) and the co-located ML4HMT (Shared Task on Applying Machine Learning techniques to optimising the division of labour in Hybrid MT). The aim of the LIHMT workshop is to promote corpus-based methods and technologies that combine resources and algorithms from the three general approaches to MT: rule-based (RBMT), example-based (EBMT) and statistical (SMT). One line of research in this area is the combination of outputs from RBMT, EBMT and SMT systems. Along this line, the objective of the ML4HMT shared task is to investigate whether MT system combination techniques can benefit from extra information (linguistically motivated, decoding and runtime) supplied by the different systems involved. As a baseline, the ML4HMT shared task organizers considered the combination of plain text outputs (with no extra information) using state-of-the-art open-source system combination tools, namely MANY [Barrault, 2010] and CMU-MEMT [Heafield & Lavie, 2010]. This talk, which will also be given at ML4HMT, will cover three points: a presentation of the MANY MT system combination tool, the results obtained for the shared task baseline, and some suggestions on how extra information from the different systems could usefully be introduced.