5. Defining evaluation corpora and tools
Little could be done here so far. We do not yet have any patent corpora, and the mathematicians have yet to collect their word problems. We did receive a small museum text corpus from Gothenburg (approx. 25,000 words in Swedish, and a set of 9 short passages translated into English, presumably by non-native speakers).
We have translated parts of this corpus both manually and with MT to produce test material for BLEU evaluation. A pilot study comparing BLEU scores on this material with a manual error analysis is under way.
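To make concrete what a BLEU score on such test material measures, here is a minimal sketch of corpus-level BLEU in Python. It is not the implementation used in the pilot or in any particular evaluation tool. It assumes whitespace-tokenized text, exactly one reference translation per MT output, and no smoothing, and the function names `ngrams` and `corpus_bleu` are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one reference per hypothesis, no smoothing.

    hypotheses, references: parallel lists of token lists
    (assumption: tokens come from simple whitespace splitting).
    Returns a score between 0 and 1.
    """
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # n-grams in the hypotheses, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # unsmoothed BLEU is zero when any order has no match

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```

Calling `corpus_bleu([mt_output.split()], [manual_translation.split()])` scores a single segment. Because the measure only counts surface n-gram overlap, it cannot say what kind of error lowered the score, which is the information a manual error analysis adds.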
A small test GF grammar for a sample of the corpus has been written (link). It has helped make the requirements on grammar-ontology interoperability more concrete (see below).
We have also downloaded the usual EU multilingual corpora to our test platform (hippu.csc.fi).
We have found time to install an evaluation platform, to collect and test standard translation quality evaluation tools, to develop the forthcoming MOLTO lexicon tools, to learn GF, and to develop ideas about the ontology-to-grammar interface. The IQmt evaluation platform was tested on a small sample of machine-translated and human-translated text (English into Finnish); see https://kitwiki.csc.fi/twiki/bin/view/MOLTO/EvaluationCookbook.
UHEL also took part in the MOLTO phrasebook task, a demo for translating tourist phrases between 14 European languages: Bulgarian, Catalan, Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Polish, Romanian, Spanish, and Swedish. This experiment presents one way to evaluate the effort required for adding new language versions (more on this below).
We organize the rest of the paper by work package (WP), in this order: the front end, comprising the translation tool, the use cases, and the associated lingware (ontologies and grammars); and the back end, the translation system (WPs 2, 4, 5). We also try to form an idea of what the WPs are currently working on, to see how they construe their tasks. Information about this (at least the task titles) was found on the MOLTO website.