This workshop aims to reduce the gap, both theoretical and practical, between information retrieval and machine translation research and applications. Although the two fields have already contributed to each other instrumentally, much work remains to be done to frame them solidly within a common ground of knowledge, from both the procedural and the paradigmatic perspectives. From the practical application perspective, on the other hand, it is patently evident that the merging of these two disciplines is advancing at a much more rapid pace by leveraging the largest source of multilingual information ever available: the web.
Information retrieval and machine translation are complex problems that have evolved into very active research fields during the last twenty years. Although they originated from clearly different problems, which have historically been approached by research communities with relatively different scientific traditions, both are interdisciplinary areas that draw on several common disciplines, such as statistics, linguistics, psychology, and computer science, among others. Additionally, both involve many specific sub-problems and subtasks for which a large arsenal of specialized techniques and methods has been developed and continues to be developed.
Exploiting synergies between information retrieval and machine translation is not in itself a novel idea, as there are already some examples that illustrate this kind of interaction between the two areas of research. Consider, for instance, the classical examples of using machine translation in cross-language information retrieval for query expansion and translation, as well as using information retrieval techniques for collecting corpora to train statistical machine translation systems or for ranking the outputs of a given translation system or collection of systems.
Nevertheless, synergies between these two disciplines can be pursued further and from very different perspectives. By a thought experiment of abstraction, either of these two fields can be considered to naturally embrace the other. Consider, for instance, machine translation as a subtype of information retrieval system in which the source sentence to be translated is the “query” and the output translated sentence is the “retrieved document”. Similarly, information retrieval can be thought of as an instantiation of the machine translation problem in which the given query should be “translated” into a relevant retrieved document. At the other end of the scale, at the concrete level of implementation, many specific synergies can be discovered within the sub-problems each discipline has to deal with, such as, to mention just a few, automatic alignment, word sense disambiguation, multiword extraction, search/decoding, ranking, relevance estimation, language modelling, domain adaptation, and user feedback processing.
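As a purely illustrative sketch of the first abstraction (translation viewed as retrieval), the toy example below treats a source sentence as a query against a small translation memory and returns the target side of the best-matching entry. The memory contents, the function names, and the choice of Jaccard word overlap as the relevance score are all assumptions made for this illustration; they are not taken from any particular system.

```python
# Toy "translation as retrieval": the source sentence is the query,
# and the retrieved "document" is the stored translation of the
# most similar source sentence. All data and names are hypothetical.

def tokenize(text):
    """Lowercase whitespace tokenization into a set of word types."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical translation memory: (source sentence, target sentence) pairs.
TRANSLATION_MEMORY = [
    ("the house is small", "la casa es pequeña"),
    ("the house is big", "la casa es grande"),
    ("where is the station", "dónde está la estación"),
]


def translate_by_retrieval(source_sentence, memory=TRANSLATION_MEMORY):
    """Return (target, score) for the memory entry closest to the query."""
    query = tokenize(source_sentence)
    best_source, best_target = max(
        memory, key=lambda pair: jaccard(query, tokenize(pair[0]))
    )
    return best_target, jaccard(query, tokenize(best_source))


if __name__ == "__main__":
    target, score = translate_by_retrieval("the house is very small")
    print(target, round(score, 2))
```

The same skeleton reads the other way round as well: replacing the memory with a document collection and the stored translation with the document itself turns the function into a minimal ranked-retrieval step, which is the sense in which a query is “translated” into a relevant document.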
As far as we know, no explicit effort has been made in the past to identify and discuss possible areas of cooperation, interaction, and integration between the two fields. The main objective of this workshop is to define and identify possible synergies among different specific subtasks in the information retrieval and machine translation domains.
There are two expected outcomes for the proposed workshop. The first is the generation of a roadmap of interesting research problems and questions related to the synergistic use of information retrieval and machine translation technologies in the context of the multilingual World Wide Web. The second is the creation of a Special Interest Group to focus on the definition, evaluation, and follow-up of the different activities derived from that roadmap.
We solicit contributions including but not limited to the following topics: