WP5 Statistical and robust translation - M30
Summary of progress
Milestone MS8 (Translation tool complete) was achieved in M30; for WP5 this meant having a complete system integrating the grammar and SMT. Although the system is already available on Gothenburg's server, we are still working on improvements.
Work during the fifth period focused on four of the six tasks of the workpackage:
5.3 Robust Parsing: First efforts are under way to integrate the robust parsing work from previous semesters into the hybrid systems. The work is in progress, and the eventual goal is to use GF's robust parsing to handle the chunks instead of relying on Genia.
5.4 Baseline systems: The French GF grammar has been refined to improve performance. The German grammar has been written from scratch and is now comparable to the French one.
5.5 Hybrid Models: The new grammars have been integrated into the final hybrid system. Different versions of the previous hybrids are now available. In particular, a new system assigns different probabilities to the GF translations according to the confidence in obtaining them. This information can also be used in the development step of the statistical system. A one-click system has been developed with the most promising hybrid system; it will be updated with new hybrids whenever we obtain better translation performance.
5.6 Systems evaluation: A wider evaluation of the baseline systems has been carried out by adding syntactic and semantic metrics. The comparison with external translation systems such as Google and Bing has also been redone to reflect the improvements these systems made during the last year. A comparison with Pluto has been carried out as well. However, we realized that, since we share some data, our test sets may be part of their training data. We plan to use confidence estimation measures so that we can test on other patents for which none of us has translations, such as American patents.
Highlights
A GF grammar for patents has been developed for German and improved for French
A hybrid system with the new grammars has been evaluated, and a new one that takes into account probabilities for the GF translations has been built
The work on robust parsing has resulted in two submissions to the Coling 2012 conference
A one-click system for the hybrid translator has been built and is now available as a shell command on the server in Gothenburg. Partners wishing to test the system should contact UGOT to obtain access to the server.
Deviations from Annex I
There are no deviations from Annex I: at M30 the workpackage has produced a hybrid system for patent translation for English-to-French and English-to-German. For the opposite directions we use SMT as a fallback.
However, we plan to continue the work on hybrid systems by improving the current German translator and the integration with robust parsing. As a result, D5.3 will be postponed until January, when the new hybrid systems will also be finished and ready for evaluation within WP9.