Workshop on Applying Machine Learning techniques to optimising the division of labour in Hybrid MT (ML4HMT)

FIRST CALL FOR PAPERS

Machine Translation Summit XIII (MT Summit XIII)

URL: http://www.dfki.de/ml4hmt/

Workshop Purpose and Theme

The workshop will explore how sophisticated machine-learning techniques can best support the design of Hybrid MT systems. A further important objective of the workshop is to build bridges between the MT and ML communities in order to explore the choice space for Hybrid MT systematically and jointly.

Workshop Programme

The workshop will open with an invited talk (speaker TBA), followed by two technical paper sessions and a challenge or shared task session, and will conclude with a discussion panel.

Topics of Interest of the Technical Papers

Topics of interest include, but are not limited to:

  • use of Machine Learning techniques in the combination/hybridization of Machine Translation systems
  • using richer linguistic information in phrase-based SMT (e.g. in factored models or hierarchical SMT)
  • using phrases from different types of MT in e.g. phrase-based SMT
  • system combination approaches, either parallel in multi-engine MT (MEMT) or sequential in statistical post-editing (SPMT)
  • learning resources (e.g. transfer rules, transduction grammars) for probabilistic rule-based MT

All contributions will be published in the workshop proceedings.

Shared Task Description

The "Shared Task on Optimising the Division of Labour in Hybrid MT " is an effort to trigger systematic investigation on improving state-of-the-art Hybrid MT, using advanced machine learning (ML) methodologies. Participants are requested to build Hybrid/System Combination systems by combining the output of several systems of different types, which is provided by the organizers.

The main focus of the shared task is to answer the following question:

Can Hybrid/System Combination MT techniques benefit from extra information (linguistically motivated, decoding, and runtime information) provided by the different systems involved?

  • Data: Participants are given a bilingual development set, aligned at the sentence level. Each "bilingual sentence" contains:

    • the source sentence,
    • the target (reference) sentence and
    • the corresponding output translations from 5 different systems, based on different MT approaches (Apertium, Ramírez-Sánchez, 2006; Joshua, Li et al., 2009; Lucy, Alonso and Thurmair, 2003; MaTrEx, Penkale et al., 2010; Metis, Vandeghinste et al., 2006). The output has been annotated with system-internal information deriving from the translation process of each of the systems (see below); a hypothetical sketch of such a per-sentence record is given after this list.
  • Baseline: As a baseline we consider state-of-the-art open-source system-combination systems, such as MANY (Barrault, 2010) and CMU-MEMT (Heafield & Lavie, 2010).

  • Challenge: Participants are challenged to build an MT mechanism that improves over the baseline by making effective use of the system-specific MT output. They can either provide solutions based on an open-source system or develop their own mechanisms. The following guidelines apply:

    • The language direction will be Spanish to English.
    • The development set can be used for tuning the systems during the development phase. Final submissions must include translation output on a test set, which will be made available one week before the submission deadline.
    • If language or reordering models are needed, they can be built on the WMT News Commentary data (http://www.statmt.org/wmt11/).
    • Participants can also make use of additional linguistic analysis tools if their systems require them, but they must declare this explicitly upon submission, so that their entries are judged as "unconstrained" systems.
  • Evaluation: The system output will be judged via peer-based human evaluation. During the evaluation phase, participants will be requested to rank the system outputs of other participants through a web-based interface (Appraise; Federmann, 2010). Automatic metrics (BLEU; Papineni et al., 2002) will also be used; a minimal BLEU scoring sketch is given after this list.

  • System description: Shared-task participants will be invited to submit short papers (4-6 pages) describing their systems or their evaluation metrics (see the instructions under Submissions).
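
The following is a hypothetical illustration of the kind of per-sentence record described under "Data" above. The actual release format is defined by the organizers and may differ; the class and field names below are placeholder assumptions, given in Python only to make the described structure concrete.

    # Hypothetical representation of one development-set record (not the official format).
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class SystemOutput:
        system: str          # e.g. "Apertium", "Joshua", "Lucy", "MaTrEx", "Metis"
        translation: str     # this system's English output for the sentence
        annotations: Dict[str, str] = field(default_factory=dict)  # system-internal information

    @dataclass
    class BilingualSentence:
        source: str                                                 # Spanish source sentence
        reference: str                                               # English reference translation
        outputs: List[SystemOutput] = field(default_factory=list)    # one entry per participating system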
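
As a rough illustration of the automatic part of the evaluation mentioned above, the sketch below computes corpus-level BLEU (Papineni et al., 2002) with NLTK. The file names and the whitespace tokenization are assumptions made only for this example; the official scoring setup may differ.

    # Minimal corpus-level BLEU sketch: one hypothesis and one reference per line.
    from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

    with open("combined.out", encoding="utf-8") as f:    # combined-system output (assumed file name)
        hypotheses = [line.split() for line in f]
    with open("reference.en", encoding="utf-8") as f:    # reference translations (assumed file name)
        references = [[line.split()] for line in f]      # corpus_bleu expects a list of reference lists

    score = corpus_bleu(references, hypotheses,
                        smoothing_function=SmoothingFunction().method1)
    print("BLEU: %.4f" % score)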

Important Dates

  • May 20th: Release of data for the challenge
  • July 20th: Paper submissions due / challenge results due
  • August 10th: Author notification / release of challenge evaluation results
  • August 19th: Final versions due

Submissions

Technical papers and system description papers should follow the main conference formatting requirements (http://mt.xmu.edu.cn/mtsummit/SubmitPapers.html#). To submit contributions, please follow the instructions on the workshop's submission website (EasyChair): https://www.easychair.org/account/signin.cgi?conf=ml4hmt.

The contributions will undergo a double-blind review by members of the programme committee. Please address queries to ml4hmt@easychair.org.