Injected Linguistic Tags to Improve Phrase Based SMT

Waleed Oransa
IBM - Egypt (Cairo)
Seminar abstract:
Statistical machine translation (SMT) has proven to give good results between languages that are highly similar in morphology and grammar. However, SMT still needs improvement when translating between languages that differ in morphology and syntactic structure, especially between morphologically poor and morphologically rich languages such as English and Arabic. <br/> This seminar presents the Injected Linguistic Tags approach, which improves phrase-based statistical machine translation (PBSMT). The approach has been applied to English-to-Arabic translation, but the Injected Tags (ITs) approach is language independent and can be used with any language pair. Its application to the English-Arabic pair with a state-of-the-art PBSMT system is presented. The approach provides a method to enrich and expand the SMT parallel corpus, extending the system's capabilities and vocabulary. The proposed approach has been evaluated, and its results have been compared with those of several online MT services. It improved translation quality, with an increase of at least 13% in BLEU score. The experiments indicate that the results achieved by this approach are significant enhancements over PBSMT. Furthermore, the experiments show an increase in noun/verb gender-number agreement in the translated text for the translation system that uses the proposed approach. </p>