From the Corpora-List "Release: 23M German-English parallel sentences from patent text"

Submitted by cristina.españa on 6 March, 2013 - 15:50

Institut für Computerlinguistik -- Universität Heidelberg

We are happy to announce the release of a parallel corpus of patent text for the German-English language pair. The corpus has been constructed from EPO, WIPO and USPTO patent documents extracted from the MAREC collection and contains 23 million sentence pairs from all patent text sections.

All sentences are labeled with metadata: patent document ID, patent family, patent classification, and publication date.

The corpus is distributed under a Creative Commons License. For more information and download, please see http://www.cl.uni-heidelberg.de/statnlpgroup/pattr

Regards, Katharina Wäschle

--
Institut für Computerlinguistik
Universität Heidelberg
Im Neuenheimer Feld 325, D-69120 Heidelberg
http://www.cl.uni-heidelberg.de/~waeschle