WP9 User requirements and evaluation - M39
Summary of progress
Owing to the schedule of other work packages, the actual evaluation work started in spring 2013. Some of the evaluations were carried out within work packages: for instance, the patent cases (WP7) were evaluated with automatic evaluation metrics, and the semantic multilingual wiki (WP11) was evaluated internally for usability. WP9's contribution to the project is translation quality evaluation with native or near-native speakers.
In the evaluations, human evaluators were presented with translations produced by MOLTO tools alongside translations by other MT systems (Google, Bing, Systran), and chose the most adequate one, either to post-edit or to accept as is. From these results we calculated error rates as well as the percentage of cases in which the evaluators preferred MOLTO translations over the other systems. The results vary between languages and use cases, but in general, both the automatic evaluation metrics and the evaluators' preference percentages suggest that the MOLTO method fares better in the chosen domains.
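The preference percentages described above can be sketched as follows. This is a hypothetical illustration, not the project's actual evaluation scripts; the judgment data and system names are invented for the example.

```python
# Hypothetical sketch: computing the share of segments in which evaluators
# preferred each system's translation. The judgment list is invented data.
from collections import Counter

# Each entry records which system's translation the evaluator chose.
judgments = ["molto", "google", "molto", "bing", "molto", "systran", "molto"]

counts = Counter(judgments)
total = sum(counts.values())
preference = {system: count / total for system, count in counts.items()}

print(f"MOLTO preferred in {preference['molto']:.0%} of judgments")
```

The same tally, restricted to segments accepted without post-editing, would give a rough error rate per system.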
During the evaluations, some errors were detected, and the grammars in question were sent back for correction. The time and effort needed to fix the languages with the poorest results is another factor favoring MOLTO tools: a systematic fix in the grammars corrects all instances of an erroneous construction.
Some methodological issues concerning the qualitative evaluation were raised during the project, especially in the evaluation of the Phrasebook. MOLTO's goal has been automatically produced publishable quality, yet the evaluation results have been less than perfect. This does not mean that the translations are incorrect, but simply that there are many ways to say the same thing, and an evaluation method that compares an edit distance to a reference does not capture the whole picture. This discrepancy between human perceptions of quality and post-editing operations is discussed in the project deliverable, and has been the topic of two contributions by Maarit Koponen: a paper, in the M31-M39 period, at the AMTA 2012 Workshop on Post-editing Technology and Practice, and a presentation at the XI Symposium on Translation and Interpreting: Technology and Translation in Turku, Finland.
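The limitation of reference-based edit distance noted above can be illustrated with a small sketch. The sentence pair is invented, and the word-level Levenshtein distance below is only one of the possible distance measures; the point is that an adequate paraphrase still incurs a nonzero distance to the reference.

```python
# Illustrative sketch of the methodological point: a reference-based edit
# distance penalizes a perfectly adequate paraphrase. Sentences are invented.
def edit_distance(a, b):
    """Word-level Levenshtein distance via dynamic programming."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # deleting i words
    for j in range(n + 1):
        dp[0][j] = j          # inserting j words
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,       # deletion
                           dp[i][j - 1] + 1,       # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[m][n]

reference = "this hotel has no vacant rooms".split()
candidate = "there are no free rooms in this hotel".split()  # equally adequate

print(edit_distance(reference, candidate))  # nonzero despite being adequate
```

A metric built on this distance would score the candidate as erroneous even though a human evaluator might accept it as is, which is exactly the gap between post-editing operations and perceived quality discussed above.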
Highlights
- Deliverable D9.2 published
- Development of methodology of evaluating limited domain publishing quality systems
Deviations from Annex I
N/A