Case Study: Patents

ID: 7
Leader: meritxell.gonzalez
Workplan wiki: WP7: Case Study Patents
Timeline: December 2010 - February 2012

Use of resources

Node      Budgeted  Period 1  Period 2 (est)  Period 3 (est)
UGOT      12        0         RE:2.4, AS:2.5  X
UPC       15        0         7.5             X
Ontotext  15        0         X               X

Objectives

The objectives are to

  • (i) create a commercially viable prototype of a system for Machine Translation (MT) and Retrieval of patents in the bio-medical and pharmaceutical domains,
  • (ii) allow translation of patent abstracts and claims in at least 3 languages, and
  • (iii) expose several cross-language retrieval paradigms on top of them.

Description of work

The work will start with the provision of user requirements (WP9) and the preparation of a parallel patent corpus (EPO) to train the statistical MT system (UPC). In parallel, UGOT will work on grammars covering the domain and subsequently, together with UPC, apply the hybrid MT (WP2, WP5) to abstracts and claims. Ontotext will provide the semantic infrastructure, loaded with existing structured data sets (WP4) from the patent domain (IPC, patent ontology, bio-medical and pharmaceutical knowledge bases, e.g. LLD). Based on the use case requirements, Ontotext will build a prototype (D7.1, D7.2) exposing multiple cross-lingual retrieval paradigms and MT of patent sections. The accuracy will be regularly evaluated through both automatic (e.g. BLEU scoring) and human-based (e.g. TAUS) means (WP9).

Tasks

ID   Task                                Status     Timeframe
7.1  User Requirements                   Completed  May 2011 - Oct 2011
7.2  Patent Corpora                      Completed  Jun 2011 - Oct 2012
7.3  Grammars for the patent domain      Ongoing    Jan 2011 - Nov 2012
7.4  Ontologies and Document Indexation  Ongoing    Jun 2011 - Oct 2012
7.5  Patents Retrieval System            Completed  Jun 2011 - Dec 2012
7.6  Machine Translation Systems         Completed  Jan 2012 - Dec 2012
7.7  Prototype (User Interface)          Completed  Jun 2011 - Sep 2012
7.8  Evaluations                         Planned