Case Study: Patents

December 2010 - February 2012

Use of resources

Node      Budgeted  Period 1  Period 2 (est)    Period 3 (est)
UGOT      12        0         RE:2.4, AS:2.5    X
UPC       15        0         7.5               X
Ontotext  15        0         X                 X


The objectives are to

  • (i) create a commercially viable prototype of a system for Machine Translation (MT) and Retrieval of patents in the bio-medical and pharmaceutical domains,
  • (ii) allow translation of patent abstracts and claims in at least 3 languages, and
  • (iii) expose several cross-language retrieval paradigms on top of these translations.

Description of work

The work will start with the provision of user requirements (WP9) and the preparation of a parallel patent corpus (EPO) to fuel the training of statistical MT (UPC). In parallel, UGOT will work on grammars covering the domain and subsequently, together with UPC, apply the hybrid MT (WP2, WP5) to abstracts and claims. Ontotext will provide a semantic infrastructure loaded with existing structured data sets (WP4) from the patent domain (IPC, patent ontology, bio-medical and pharmaceutical knowledge bases, e.g. LLD). Based on the use case requirements, Ontotext will build a prototype (D7.1, D7.2) exposing multiple cross-lingual retrieval paradigms and MT of patent sections. Accuracy will be evaluated regularly by both automatic (e.g. BLEU scoring) and human-based (e.g. TAUS) means (WP9).
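
To make the automatic evaluation step more concrete, the following is a minimal sketch of sentence-level BLEU scoring against a single reference translation. It is an illustration only: the plan does not say which BLEU implementation or tokenisation the project will use. The function names (ngrams, bleu), the whitespace tokenisation, the single-reference setup, the absence of smoothing, and the sample sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference (hypothetical helper).

    Assumes whitespace tokenisation; returns 0.0 as soon as any
    n-gram order has no match, since no smoothing is applied.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Modified precision: clip each candidate n-gram count
        # by the number of times it occurs in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: penalise candidates shorter than the reference.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


# Made-up example sentences, not taken from the patent corpus.
reference = "the compound inhibits this enzyme"
candidate = "the compound inhibits the enzyme"
score = bleu(candidate, reference, max_n=2)
```

In practice, BLEU is usually reported at corpus level, with n-gram counts aggregated over all sentences before the precisions are computed, so an established toolkit would be preferable to this sketch for the regular evaluations.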