Machine Translation Systems
22 Mar 2010
Europe/Vienna
ID:
7.6
Workpackage:
Case Study: Patents
Assignees:
aarne.ranta
cristina.españa
lluis.marquez
meritxell.gonzalez
ramona.enache
Relevant Deliverables:
Patent Case Study Final Report
Patent MT and Retrieval Prototype
Patent MT and Retrieval Prototype Beta
Dependencies:
Grammars for the patent domain
Status:
Completed
Timeframe:
Jan 2012 - Dec 2012
Completed on:
11 January 2013
Contact @UPC: Lluis and Cristina
DEPENDENCIES:
Patent abstracts and claims are translated using the baseline of the hybrid system.
Comments
Input Encoding
After the translation was completed, we detected an encoding bug in the data that degrades the translation of compounds, which is one of the strong points of our system.
The solution has been to re-encode all the text as UTF-8 and retrain the baseline system.
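The re-encoding step can be sketched as follows. This is a minimal illustration, not the script actually used: the original encoding of the affected data is not stated here, so the Latin-1 fallback and the function name to_utf8 are assumptions.

```python
def to_utf8(raw: bytes, fallback: str = "latin-1") -> bytes:
    """Return `raw` re-encoded as UTF-8.

    Bytes that already decode as UTF-8 pass through unchanged.
    Anything else is decoded with `fallback` (an assumed legacy
    encoding) and then encoded as UTF-8.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode(fallback)
    return text.encode("utf-8")


# Example: a German compound stored as Latin-1 bytes.
legacy = "K\u00fchlmittelpumpe".encode("latin-1")
clean = to_utf8(legacy)
```

Running every training and test file through the same function before retraining keeps the corpus in one encoding, so a compound is not split or mismatched because of stray bytes.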
Utrecht meeting notes
The patent documents were translated using the SMT baseline system. These documents were later annotated to be added to the retrieval databases.
To improve retrieval accuracy, we have changed the annotation approach. The documents are now annotated first and then translated, keeping the semantic annotations in the target language.
Also, a subset of the documents will be translated with the best version of the hybrid system.
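One possible way to carry annotations through translation is sketched below. It is only an illustration under stated assumptions: the inline <label>...</label> markup, the function name translate_keeping_annotations, and the segment-by-segment strategy are all hypothetical, and the translate argument stands in for whatever MT system is used.

```python
import re
from typing import Callable

# Hypothetical inline annotation markup: <label>annotated text</label>
TAG = re.compile(r"<(?P<label>\w+)>(?P<text>.*?)</(?P=label)>")


def translate_keeping_annotations(source: str,
                                  translate: Callable[[str], str]) -> str:
    """Translate `source` piece by piece, re-wrapping each annotated
    span in its original label so the annotation survives translation."""
    out = []
    pos = 0
    for m in TAG.finditer(source):
        if m.start() > pos:
            # Plain text between annotations.
            out.append(translate(source[pos:m.start()]))
        label = m.group("label")
        out.append(f"<{label}>{translate(m.group('text'))}</{label}>")
        pos = m.end()
    if pos < len(source):
        # Trailing plain text after the last annotation.
        out.append(translate(source[pos:]))
    return "".join(out)


# Usage with a stand-in "translator" (upper-casing) for demonstration.
demo = translate_keeping_annotations("a <chem>b c</chem> d", str.upper)
```

Translating each segment in isolation loses sentence context, so a real system would more likely pass the markup through the decoder; the sketch only shows that the labels end up around the corresponding target-language text.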