Machine Translation Systems

22 Mar 2010
Europe/Vienna
ID: 7.6
Workpackage: Case Study: Patents
Assignees: aarne.ranta, cristina.españa, lluis.marquez, meritxell.gonzalez, ramona.enache
Relevant Deliverables: Patent Case Study Final Report; Patent MT and Retrieval Prototype; Patent MT and Retrieval Prototype Beta
Status: Completed
Timeframe: Jan 2012 - Dec 2012
Completed on: 11 January 2013

Contact @UPC: Lluis and Cristina

DEPENDENCIES:

  • Tasks 2 and 3.
  • WP5. A baseline version of the WP5 system will be integrated into the prototype.

Patent abstracts and claims are translated using the baseline version of the hybrid system.

Comments

Input Encoding

After completing the translation, we detected a subtle bug in the data that degrades the translation quality of compounds, which are one of the strong points of our system.

The fix was to re-encode all text as UTF-8 and retrain the baseline system.
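A minimal sketch of such a re-encoding step follows, assuming the original files were in one of a few common legacy encodings. The note does not say which encodings the data used, so the candidate list and the helper names `to_utf8` and `convert_file` are hypothetical. The sketch also applies Unicode NFC normalisation, which is an addition not mentioned in the note: it makes a precomposed "ü" and "u" followed by a combining diaeresis identical, so that the same compound is not split into two different vocabulary entries.

```python
import unicodedata

# Hypothetical candidate source encodings, tried in order. Latin-1 comes last
# because it accepts every byte sequence and therefore never fails (it may
# still mis-decode text that was really in another encoding).
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def to_utf8(raw: bytes) -> str:
    """Decode raw bytes with the first candidate encoding that succeeds,
    then normalise the result to Unicode NFC."""
    for enc in CANDIDATE_ENCODINGS:
        try:
            text = raw.decode(enc)
            break
        except UnicodeDecodeError:
            continue
    else:  # only reached if every candidate fails
        raise ValueError("could not decode input with any candidate encoding")
    # NFC merges decomposed sequences (base letter + combining mark)
    # into their precomposed form.
    return unicodedata.normalize("NFC", text)


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a corpus file as bytes and write it back out as UTF-8."""
    with open(src_path, "rb") as f:
        text = to_utf8(f.read())
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write(text)
```

Running every training and input file through the same conversion before retraining keeps the training data and the text to be translated in one consistent representation.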

Utrecht meeting notes

The patent documents were translated using the SMT baseline system. The translated documents were then annotated for inclusion in the retrieval databases.

To improve retrieval accuracy, we have changed the annotation approach: the documents are now annotated first and then translated, preserving the semantic annotations in the target language.
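The note does not describe how the annotations survive translation. One common technique is placeholder masking: each annotated span is replaced by an opaque token before translation, and after translation the token is swapped back for the annotation wrapped around the translated term. The sketch below illustrates that technique only; it is not necessarily what the project implemented. The inline format `<ann id="...">...</ann>` and the callables `translate_sentence` and `translate_term` (stand-ins for the MT system and a term translator) are all hypothetical.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical inline annotation format: <ann id="CONCEPT">surface text</ann>
ANN_RE = re.compile(r'<ann id="([^"]+)">(.*?)</ann>')

Spans = Dict[str, Tuple[str, str]]  # placeholder -> (annotation id, surface text)


def protect(annotated: str) -> Tuple[str, Spans]:
    """Replace each annotated span with a placeholder token that the
    translation step is expected to copy through unchanged."""
    spans: Spans = {}

    def repl(m: "re.Match[str]") -> str:
        key = f"__ANN{len(spans)}__"
        spans[key] = (m.group(1), m.group(2))
        return key

    return ANN_RE.sub(repl, annotated), spans


def restore(translated: str, spans: Spans,
            translate_term: Callable[[str], str]) -> str:
    """Put each annotation back around the translation of its surface text."""
    for key, (ann_id, surface) in spans.items():
        tagged = f'<ann id="{ann_id}">{translate_term(surface)}</ann>'
        translated = translated.replace(key, tagged)
    return translated


def translate_annotated(annotated: str,
                        translate_sentence: Callable[[str], str],
                        translate_term: Callable[[str], str]) -> str:
    """Annotate-then-translate: mask, translate, restore."""
    masked, spans = protect(annotated)
    return restore(translate_sentence(masked), spans, translate_term)
```

With toy English-to-Spanish translators, `translate_annotated('The <ann id="C1">valve</ann> comprises a <ann id="C2">spring</ann>.', ...)` keeps both concept identifiers attached to the translated terms. The approach depends on the MT system leaving the placeholder tokens intact, which would need to be checked against the real system.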

Also, a subset of the documents will be translated with the best version of the hybrid system.