Contacting PLuTO (Patent Language Translations Online)

Today I attended a talk by John Tinsley from the Pluto project. It was quite interesting, and it was the first time I had heard about their results!

I'll summarise it here because he said some things about their system, and especially about the evaluation, that could also be useful for us.

They have a huge amount of data, so much that they cannot use all of it in their web service. By the way, the service looks nice, although I haven't used it yet.

iptranslator.com http://www.cngl.ie/pluto-mt/

Pluto uses Matrex (an EBMT system) as its main translator. Combining it with translation memories has not been very successful because their sentences do not share similar fragments (that's not true in our case!). There is a specific translator for every language, and the amount of preprocessing seems to be considerable. I cannot compare our results with theirs directly, but they are better than Google Patents and Systran, in much the same way that we are better than Google (the standard engine) and Bing.

They translate every kind of patent and have a translator for each of the 8 big IPC groups in each of the 7 languages they deal with. The evaluation is done on a test set of 1000 sentences (not fragments, as far as I understood) for each of the 8 groups (A, B, C, D, E, F, G, H). We are dealing with a very specific domain, so we only translate A61P patents, a subgroup within A. I could ask him for their test set, but we would get worse results because our system has not been trained on the whole domain. What we can do is create a user account and run their system on our test set, but it's probably better to contact them first.

As for the evaluation, they only use BLEU and METEOR for the automatic part, but they have already done quite a lot of work on manual evaluation. They selected 100 sentences from the 1000 used before and evaluated adequacy and a ranking (Pluto vs. Google Patents vs. Systran). For 67% of the sentences Pluto is the best system (in En-Fr translation), and its adequacy is close to 4 on a scale from 0 to 5. A very good system!
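If we want to report the same two numbers for our own system, the tally is simple. Here is a minimal sketch of how it could be done. The record layout, the function name and the system names are my own assumptions, and the toy judgements are made up; I don't know how the Pluto team actually stores or aggregates theirs.

```python
from collections import Counter
from statistics import mean


def summarise_manual_eval(judgements):
    """Summarise manual judgements (hypothetical record layout).

    Each judgement is a dict with:
      "ranking":  list of system names, best first (ties are not handled)
      "adequacy": dict mapping system name -> adequacy score from 0 to 5
    Returns (share of first places per system, mean adequacy per system).
    """
    n = len(judgements)
    # Count how often each system was ranked first.
    first_places = Counter(j["ranking"][0] for j in judgements)
    best_share = {system: count / n for system, count in first_places.items()}
    # Average the adequacy scores per system over all sentences.
    systems = judgements[0]["adequacy"].keys()
    mean_adequacy = {
        s: mean(j["adequacy"][s] for j in judgements) for s in systems
    }
    return best_share, mean_adequacy


# Made-up toy data for three placeholder systems, only to show the shape.
judgements = [
    {"ranking": ["sys_a", "sys_b", "sys_c"], "adequacy": {"sys_a": 4, "sys_b": 3, "sys_c": 2}},
    {"ranking": ["sys_a", "sys_c", "sys_b"], "adequacy": {"sys_a": 5, "sys_b": 2, "sys_c": 3}},
    {"ranking": ["sys_b", "sys_a", "sys_c"], "adequacy": {"sys_a": 3, "sys_b": 4, "sys_c": 1}},
    {"ranking": ["sys_a", "sys_b", "sys_c"], "adequacy": {"sys_a": 4, "sys_b": 3, "sys_c": 2}},
]

best_share, mean_adequacy = summarise_manual_eval(judgements)
print(best_share)
print(mean_adequacy)
```

A system that is never ranked first simply does not appear in the first dictionary. With real data we would also have to decide what to do with ties and with several annotators per sentence.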

They have also run a usability experiment that we can take some inspiration from; it's quite easy. They give the user the description of an invention and 10 machine-translated patents related to it. Only 5 of them are relevant to the invention, and for every patent the user has to say whether it is relevant, not relevant, or whether they cannot tell. In addition, the user is asked for a short opinion on the translation.
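If we copy this experiment, scoring one user's answers could look like the sketch below. I don't know how they scored it in the talk, so the answer labels, the function name and the choice to count "cannot say" separately from wrong answers are all my assumptions, and the answers are invented.

```python
def score_usability(gold_relevant, answers):
    """Score one user's relevance judgements (hypothetical scheme).

    gold_relevant: set of patent ids that really are relevant
    answers: dict mapping patent id -> "relevant", "not relevant" or "cannot say"
    """
    correct = wrong = undecided = 0
    for patent_id, answer in answers.items():
        if answer == "cannot say":
            undecided += 1
        elif (answer == "relevant") == (patent_id in gold_relevant):
            correct += 1
        else:
            wrong += 1
    return {
        "correct": correct,
        "wrong": wrong,
        "undecided": undecided,
        # Undecided answers count against accuracy in this sketch.
        "accuracy": correct / len(answers),
    }


# Invented example: 10 translated patents, the first 5 are the relevant ones.
gold_relevant = {"p1", "p2", "p3", "p4", "p5"}
answers = {
    "p1": "relevant", "p2": "relevant", "p3": "cannot say",
    "p4": "relevant", "p5": "not relevant", "p6": "not relevant",
    "p7": "not relevant", "p8": "relevant", "p9": "not relevant",
    "p10": "cannot say",
}

result = score_usability(gold_relevant, answers)
print(result)
```

A high share of "cannot say" would probably say more about the translation quality than the wrong answers do, which is why it is kept as a separate count here.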