Baseline systems

ID: 5.4
Task leader: cristina.españa
Assignees: cristina.españa, lluis.marquez, ramona.enache
Status: Ongoing

Development of three baseline systems:

  • Raw GF translator
  • Raw SMT translator
  • Naïve combination of both (fallback)

M16. UGOT + UPC (Corresponding to MS5, M18)

Comments

Could semantic annotations from patents help translation?

Ontotext is able to annotate patents with semantic information. Could we use this information in SMT translation?

As a first step, we'll try to translate the annotations from source to target. Later we'll investigate how to incorporate semantic information into the translation process.

First thoughts on the baselines

We have developed the SMT baseline, and it works as expected. The GF baseline is an elaborate system that already makes use of some statistical components, so in some sense it is already a hybrid translation system, but one that lacks coverage in open domains.

On the patents test set, GF covers only 7% of full sentences, so a naïve combination of GF and SMT is going to be very similar to SMT alone. Other approaches are needed for hybridisation.
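
To make the fallback idea concrete, here is a minimal sketch of a naïve combination. It is not the project's actual code: the names fallback_translate and gf_share are hypothetical, and it assumes the GF translator returns None whenever it cannot handle the full sentence, while the SMT translator always returns some output.

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumed interfaces (hypothetical, for illustration only):
# the GF side returns None when it cannot translate the full sentence;
# the SMT side always returns a translation.
GFTranslator = Callable[[str], Optional[str]]
SMTTranslator = Callable[[str], str]


def fallback_translate(
    sentence: str, gf: GFTranslator, smt: SMTTranslator
) -> Tuple[str, str]:
    """Return (translation, system), preferring GF and falling back to SMT."""
    gf_output = gf(sentence)
    if gf_output is not None:
        return gf_output, "gf"
    return smt(sentence), "smt"


def gf_share(
    sentences: Sequence[str], gf: GFTranslator, smt: SMTTranslator
) -> float:
    """Fraction of sentences whose final translation comes from GF."""
    if not sentences:
        return 0.0
    used_gf = sum(
        1 for s in sentences if fallback_translate(s, gf, smt)[1] == "gf"
    )
    return used_gf / len(sentences)
```

Under these assumptions, a GF share of about 0.07 on the patents test set means that roughly 93% of the combined output is exactly the SMT output, which is why the naïve combination can be expected to score close to SMT alone.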