Explore and build translation engines specialised for patent translation
Integrate the translations into the patent retrieval system
IPC classification A61P
Specific therapeutic activity of chemical compounds or medical preparations.
Abstracts and claims
Claims are written in a legalistic style with a highly specific chemical vocabulary, full of compound names.
The use according to claim 7, wherein said cancer diseases comprise bladder, lung, mamma, melanoma and prostate carcinomas.
A compound according to claim 1 wherein it is (2S)-2-[(4S)-4-(2,2-difluorovinyl)-2-oxopyrrolidinyl]butanamide.
The pharmaceutical composition according to claim 1 or 2, wherein said platinum anticancer agent is selected from at least one of the complexes having structures of: ** IMAGE **.
Table here
Different processing for the translation engine and the retrieval system
Common step: tokenising
Main difference: mark-up and semantic annotations
Diagram + Example
8-difluoro-2- [ 3-fluoro-4 - [ ( L-lysyl ) amino ] phenyl ] -7-methyl-4H-1-benzopyran-4-one
vs.
8-difluoro-2-[3-fluoro-4-[(L-lysyl)amino]phenyl]-7-methyl-4H-1-benzopyran-4-one
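The contrast above can be sketched in a few lines of Python. This is only an illustration, not the project's tokeniser: `tokenise_generic` and `rejoin_compound` are hypothetical names, and the bracket rule is an assumption that approximates, but does not exactly reproduce, the tokenised form shown above.

```python
import re

# Brackets and parentheses are the characters a generic tokeniser tends to split off.
BRACKETS = re.compile(r"([\[\]()])")

def tokenise_generic(text):
    """Generic tokenisation (assumed rule): every bracket becomes its own token."""
    return BRACKETS.sub(r" \1 ", text).split()

def rejoin_compound(tokenised):
    """Restore a single compound name by dropping all whitespace inside it."""
    return re.sub(r"\s+", "", tokenised)

compound = "8-difluoro-2-[3-fluoro-4-[(L-lysyl)amino]phenyl]-7-methyl-4H-1-benzopyran-4-one"
tokens = tokenise_generic(compound)
print(tokens)
print(rejoin_compound(" ".join(tokens)) == compound)
```

`rejoin_compound` is only safe on a span already known to be one compound name; applied to a whole sentence it would also glue ordinary words together.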
Plot with SMT, GF => HYBRID
Berkeley parser for German
Similarity is computed as the overlap of the linguistic elements in the reference and the candidate.
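A minimal sketch of such an overlap score, assuming the linguistic elements are given as lists (words, PoS tags, chunk labels, ...) and that overlap means multiset intersection over multiset union. The exact formula used by the evaluation metrics may differ; `overlap` is a hypothetical helper.

```python
from collections import Counter

def overlap(reference, candidate):
    """Multiset overlap between the elements of a reference and a candidate."""
    ref, cand = Counter(reference), Counter(candidate)
    shared = sum((ref & cand).values())   # elements present in both, with multiplicity
    total = sum((ref | cand).values())    # elements present in either
    return shared / total if total else 0.0

ref = "the use of claim".split()
cand = "the use according to claim".split()
print(overlap(ref, cand))  # 3 shared elements out of 6 distinct ones
```

The same function works at any linguistic level: pass lemmas for a lexical score, or PoS tags or parse labels for a syntactic one.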
En2Fr & En2De results
Also other language pairs?
Lexicon
Grammar
Word-to-word GIZA alignments are not enough
Solution adopted:
Split compounds, word-to-word mapping, join afterwards
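The split / map / join idea can be sketched as follows. Everything here is a toy assumption: the vocabulary, the word-to-word lexicon and the function names are hypothetical, and the splitter ignores German linking elements (e.g. the Fugen-s).

```python
def split_compound(word, vocab):
    """Split `word` into known parts (longest prefix first); return [word] if no split exists."""
    def helper(s):
        if not s:
            return []
        for i in range(len(s), 0, -1):
            if s[:i] in vocab:
                rest = helper(s[i:])
                if rest is not None:
                    return [s[:i]] + rest
        return None  # no full segmentation with the known parts
    word = word.lower()
    return helper(word) or [word]

def translate_compound(word, lexicon):
    """Split, map each part word-to-word, then join the translations into one entry."""
    parts = split_compound(word, lexicon.keys())
    return " ".join(lexicon.get(p, p) for p in parts)

# Toy word-to-word lexicon (assumed, not extracted from the corpus).
toy_lexicon = {"krebs": "cancer", "erkrankungen": "diseases"}
print(split_compound("Krebserkrankungen", toy_lexicon.keys()))
print(translate_compound("Krebserkrankungen", toy_lexicon))
```

The joined result gives one lexicon entry pairing the whole compound with a multiword translation, which a purely word-to-word alignment cannot express.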
RAMONA something here, please
Construction?
Sizes and sources for static, safe, unsafe, parse, noparse
RAMONA, please
NPs and AdvPs are mapped into GF categories and linearised
VPs, RelPs and AdjPs are linked to an NP in order to be linearised
Disambiguation of multiple linearisations by frequency counts in the corpus
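Disambiguation by frequency counts can be sketched as picking the candidate linearisation seen most often in the target-side corpus. The counts below are invented for illustration, and `pick_linearisation` is a hypothetical helper, not part of GF.

```python
from collections import Counter

def pick_linearisation(candidates, corpus_counts):
    """Return the candidate with the highest corpus count (first one wins on ties)."""
    return max(candidates, key=lambda c: corpus_counts[c])

# Toy counts, assumed; a Counter returns 0 for unseen strings.
counts = Counter({"composition pharmaceutique": 120, "composition pharmaceutiques": 2})
options = ["composition pharmaceutiques", "composition pharmaceutique"]
print(pick_linearisation(options, counts))
```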
I need to choose only the representative systems (3?)
Nominalisation
Gerund translated into infinitive + preposition (+ article)
Relative clauses
Gerund and participle clauses are not common in German
They are replaced by a relative clause during chunking
As before I need to choose only the representative systems (3?)
Evaluation with lexical and syntactic metrics
1008 fragments from the MAREC test set
...
Pre-process and cleaning
The use of claim 23 , wherein the amount of said composition is from 100 mg to 800 mg of ibuprofen .
the use of claim 2 3 wherein the amount of said composition is from 1 0 0 mg to 8 0 0 mg of ibuprofen
Pre-process necessary for parsing
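The cleaning shown in the example above (lowercasing, dropping punctuation tokens, splitting numbers into digits) can be reproduced with a short sketch. The rules are inferred from that single example, so the real pre-processing may do more; `clean_fragment` is a hypothetical name.

```python
import re

def clean_fragment(text):
    """Lowercase, drop punctuation-only tokens and split numbers into single digits."""
    tokens = []
    for tok in text.lower().split():
        if tok.isdigit():
            tokens.extend(tok)              # "23" -> "2", "3"
        elif re.fullmatch(r"[^\w\s]+", tok):
            continue                        # punctuation-only token
        else:
            tokens.append(tok)
    return " ".join(tokens)

raw = ("The use of claim 23 , wherein the amount of said composition "
       "is from 100 mg to 800 mg of ibuprofen .")
print(clean_fragment(raw))
```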
Use of generic resources (parseEng, DictEng, parseGer, DictGer) and domain lexicons (ExtraLex, ExtraLexGer)
MAREC test set, 1008 fragments
Cleaning: 537 fragments
Properly linearised: 98 fragments
Evaluation with lexical and syntactic metrics
GF grammar with SMT built lexicon and disambiguation by frequency counts
Robust parsing with statistical models for searching the space and for disambiguation
Additional SMT decoding on top of GF and SMT to choose the best translation options
Hard Integration -- GF phrases are forced to appear -- SMT complements -- top SMT reorders
Soft Integration -- GF and SMT phrases interact -- top SMT reorders and chooses the best option -- LM plays an important role in choosing
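The difference between the two integration modes can be illustrated with a toy selection procedure. This is a strong simplification under stated assumptions: there is no reordering, the segmentation is given, and `lm_score` stands in for a real language model. All names and data are hypothetical and do not reflect the actual decoder configuration.

```python
def hard_integration(segments, gf_phrases, smt_options):
    """GF translations are forced where available; SMT fills in the rest."""
    output = []
    for seg in segments:
        if seg in gf_phrases:
            output.append(gf_phrases[seg])
        else:
            # best SMT option by its (toy) translation score
            output.append(max(smt_options[seg], key=lambda o: o[1])[0])
    return " ".join(output)

def soft_integration(segments, gf_phrases, smt_options, lm_score):
    """GF and SMT options compete; the language model picks among them."""
    output = []
    for seg in segments:
        candidates = [t for t, _ in smt_options.get(seg, [])]
        if seg in gf_phrases:
            candidates.append(gf_phrases[seg])
        output.append(max(candidates, key=lambda c: lm_score(" ".join(output + [c]))))
    return " ".join(output)

# Toy data, assumed for illustration only.
segs = ["the use", "according to claim 1"]
gf = {"the use": "l'utilisation"}
smt = {"the use": [("l'usage", 0.6), ("l'utilisation", 0.4)],
       "according to claim 1": [("selon la revendication 1", 0.9)]}
toy_lm = lambda s: 1.0 if s.startswith("l'usage") else 0.0

print(hard_integration(segs, gf, smt))
print(soft_integration(segs, gf, smt, toy_lm))
```

With the same inputs, the hard mode keeps the GF phrase regardless of the scores, while the soft mode lets the (toy) language model override it.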
Integration only at decoding time: either Soft or Hard, it is applied on the test set
MERT with GF: the final decoder weights are also obtained with the integration applied on the development set
Characteristics and options
Table with the best systems
Table with the best systems
Experiment definition
JUSSI, after the evaluation
Table?
JUSSI, after the evaluation
JUSSI, after the evaluation
One-click system
Offline translation in the retrieval system
Translation tools?
Webservice?
csmisc14:hybrid cristina$ perl H1PTrad.pl
Usage: perl H1PTrad.pl -v # -m [runtime|unsafe|demo] [src2trg]
-v: verbosity [0,1,2]
-m: mode [runtime|unsafe|demo]
input: file to translate
src2trg: language pair
Ex: perl H1PTrad.pl -v 1 -m demo /Users/systems/input/patsA61P.test.en en2fr
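For batch use, the command line above could be assembled from Python. This sketch only builds the argument list. It assumes the argument order of the "Ex:" line (options, input file, language pair) and the option values listed in the usage message; `build_command` is a hypothetical wrapper, not part of the tool.

```python
MODES = ("runtime", "unsafe", "demo")

def build_command(input_file, pair, verbosity=1, mode="demo"):
    """Build the argument list for H1PTrad.pl, validating the documented options."""
    if verbosity not in (0, 1, 2):
        raise ValueError("verbosity must be 0, 1 or 2")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    return ["perl", "H1PTrad.pl", "-v", str(verbosity), "-m", mode, input_file, pair]

cmd = build_command("/Users/systems/input/patsA61P.test.en", "en2fr")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where the script is installed.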
Language | Documents | Claims | Descriptions | Abstracts |
---|---|---|---|---|
English | 4,485 | 62,638 | 3,832 | 2,518 |
German | 2,047 | 32,007 | 192 | 80 |
French | 2,011 | 31,487 | 130 | 44 |
```
The use of a compound of the formula:
or isomers i.e. geometric, optical, entianomeric, diasteriomeric, epimeric, stereoisomeric, tautomeric, conformational, or anomeric forms, salts, solvates and chemically protected forms thereof, in the preparation of a medicament for inhibiting the activity of PARP, wherein:
A and B together represent a fused aromatic ring, optionally substituted with one or more substituent groups selected from halo, nitro, hydroxy, ether, thiol, thioether, amino, C$_{1-7}$ alkyl, C$_{3-20}$ heterocyclyl and C$_{5-20}$ aryl;
R C is -CH$_2$-R L , where R L is a C$_{5-20}$ aryl group, optionally substituted with one or more substituent groups selected from C$_{1-7}$ alkyl, C$_{5-20}$ aryl, C$_{3-20}$ heterocyclyl, halo, hydroxy, ether, nitro, cyano, acyl, carboxy, ester, amido, amino, sulfonamido, acylamido, ureido, acyloxy, thiol, thioether, sulfoxide and sulfone; and
R N is hydrogen.
```
The interface is available in EN, DE and FR
Something here?
screenshot if integrated