Baseline systems
ID:
5.4
Workpackage:
Statistical and Robust Translation
Task leader:
cristina.españa
Relevant Deliverables:
- Description and evaluation of the combination prototypes
- WP5 final report: statistical and robust MT
Dependencies:
- Grammars for the patent domain
- Machine Translation Systems
- Parallel corpus compilation in Patents domain
- Patent Corpora
- Patents Retrieval System
Status:
Ongoing
Development of three baseline systems:
- Raw GF translator
- Raw SMT translator
- Naïve combination of both (fallback)
M16. UGOT + UPC (Corresponding to MS5, M18)
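A minimal sketch of how the naïve fallback combination could work, assuming the GF translator signals a failed parse by returning nothing and the SMT translator always produces some output. The function names, the type aliases, and the toy translators are hypothetical and do not come from the GF or SMT toolkits.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces, not taken from any real toolkit:
# the GF translator returns None when it cannot cover the sentence,
# the SMT translator always returns some translation.
GFTranslator = Callable[[str], Optional[str]]
SMTTranslator = Callable[[str], str]


def translate_with_fallback(
    sentence: str, gf: GFTranslator, smt: SMTTranslator
) -> Tuple[str, str]:
    """Return (translation, system_used), preferring GF and falling back to SMT."""
    gf_output = gf(sentence)
    if gf_output is not None:
        return gf_output, "GF"
    return smt(sentence), "SMT"


# Toy stand-ins for the two raw baselines, for illustration only.
def toy_gf(sentence: str) -> Optional[str]:
    known = {"the device comprises a valve": "le dispositif comprend une vanne"}
    return known.get(sentence)


def toy_smt(sentence: str) -> str:
    return f"<smt:{sentence}>"


covered = translate_with_fallback("the device comprises a valve", toy_gf, toy_smt)
uncovered = translate_with_fallback("an unseen sentence", toy_gf, toy_smt)
```

With this design the combined system can only differ from raw SMT on sentences that GF covers in full.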
Comments
Could semantic annotations from patents help translation?
Ontotext is able to annotate patents with semantic information. Could we use this information in SMT?
As a first step, we'll try to translate the annotations from source to target. Later, we'll investigate how to incorporate semantic information into the translation process.
First thoughts on the baselines
We have developed the SMT baseline, which works as expected. The GF baseline is an elaborate system that already makes use of some statistical components, so in a sense it is already a hybrid translation system, though one that lacks coverage in open domains.
Its coverage on the patents test set is 7% of full sentences, so a naïve combination of GF and SMT is going to be very similar to SMT alone. Other approaches are needed for hybridisation.
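To make the arithmetic behind this concrete, here is a small sketch that computes full-sentence coverage from per-sentence flags. The 100-sentence test set is a hypothetical example sized to match the 7% figure; it is not the actual patents test set, and the function name is an assumption.

```python
from typing import Sequence


def full_sentence_coverage(covered_flags: Sequence[bool]) -> float:
    """Fraction of test sentences that the grammar-based system translated in full."""
    if not covered_flags:
        raise ValueError("the test set must contain at least one sentence")
    return sum(covered_flags) / len(covered_flags)


# Hypothetical test set: 7 of 100 sentences are fully covered.
flags = [True] * 7 + [False] * 93

coverage = full_sentence_coverage(flags)
# Under a fallback combination, every uncovered sentence is translated by SMT.
smt_share = 1.0 - coverage

print(f"Full-sentence coverage: {coverage:.0%}")
print(f"Sentences left to SMT: {smt_share:.0%}")
```

Under these assumptions the combined output is identical to raw SMT on all but the covered fraction of sentences, which is why corpus-level results would stay close to SMT alone.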