WP3 - MOLTO CAT tools

The MOLTO CAT scenario is designed to serve a translation community that carries out translation projects using the MOLTO tools as an additional CAT tool. Community members are assigned roles in the translation management system (TMS), and a member's role determines what that member may do. In the MOLTO demonstration system, the TMS is GlobalSight. The TMS manages the resources of a project. The resources include

documents
grammars
translation memories
term collections

A MOLTO CAT translation project is composed of a collection of resources and a community of actors playing different roles in the project. One actor can hold more than one role.

The roles include

project manager (rights to manage the resources and the workflow)
editor (source competence in domain, domain expert, authority to edit the source)
translator (bilingual competence in domain, not necessarily domain expert)
revisor (target language competence in domain, domain expert)
ontologist (competence to extend the domain ontology)
terminologist (bilingual or target competence in the domain)
grammarian (competence to extend domain GF grammar)

The TMS manages the project workflow, that is, it routes documents between the actors through the different steps. The actions available to each role include

project manager:
- create users
- assign roles to users
- create a translation project
- prepare resources for a translation project
- plan the workflow
- assign actors to actions

editor:
- split the source into constrained and unconstrained sections
- indicate allowed deviations from the constrained language source, or authorize new ones

translator:
- translate unconstrained sections using CAT tools (including SMT proposals from translation memory)
- translate constrained language sections using MOLTO
- propose a term to fill a lexical gap
- create grammar extension request

ontologist:
- find or create missing concept
- create grammar extension request
- create terminology extension request

terminologist:
- find or define equivalents to a new concept

grammarian and revisor:
- carry out grammar extension
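
The assignment of actions to roles listed above could be represented as a simple permission table. The following is a minimal Python sketch covering a subset of the roles. It is not the GlobalSight data model: the names `ROLE_ACTIONS`, `Actor` and `may_perform` are hypothetical, and the action labels are abbreviated from the list above.

```python
from dataclasses import dataclass, field

# Hypothetical permission table; labels abbreviated from the action list.
ROLE_ACTIONS = {
    "project manager": {"create users", "assign roles", "create project",
                        "prepare resources", "plan workflow", "assign actors"},
    "translator": {"translate unconstrained", "translate constrained",
                   "propose term", "request grammar extension"},
    "ontologist": {"find or create concept", "request grammar extension",
                   "request terminology extension"},
    "terminologist": {"define equivalents"},
}


@dataclass
class Actor:
    name: str
    roles: set = field(default_factory=set)  # one actor may hold several roles


def may_perform(actor: Actor, action: str) -> bool:
    """Return True if any of the actor's roles permits the action."""
    return any(action in ROLE_ACTIONS.get(role, set()) for role in actor.roles)
```

An actor holding both the translator and terminologist roles would pass the check for "propose term" and "define equivalents", but not for "create users".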

The typical envisaged workflow is as follows. A translator in a multilingual translation project works on a structured multipart document, some of whose parts are marked as amenable to translation with the MOLTO editor. The rest is translated with traditional CAT tools. A subsection suitable for MOLTO translation is opened in the MOLTO translation editor. The appropriate GF grammar and terminology are specified in the project resources. If the section falls entirely within the fragment covered by the grammar, it should parse and translate correctly without translator intervention. This is the default if the MOLTO-marked section has been created in scenario A. However, until the domain grammar has been fully tested for blind translation in all target languages, a target-language translator or revisor must check that the target text is correct.

If the grammar coverage is not complete, the translation editor shows some parts of the section marked as untranslatable.

In the easy case, the coverage problem can be fixed by a conservative paraphrase or, if the translator's brief permits pre-editing, by a more creative rewrite of the section source to bring it under the coverage of the MOLTO grammar. The original source and its paraphrase get stored in the translation memory as an instance of source rewrite, and will be available for other translators as a model solution of the coverage problem. If a rewrite is not possible, the next move depends on the workflow.

If the translator's brief is only to produce a complete translation into the target language in a bilingual project, the translator simply translates the part not covered by MOLTO using traditional CAT tools. The out-of-coverage segment is marked in the translation memory as a manually translated segment of a MOLTO section. Such segments can be collected and sent as non-coverage tickets to the project's terminology and grammar management.
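
Collecting such segments from the translation memory could be sketched as follows. This is an illustration only: the `Segment` record and its two flags are hypothetical and do not reflect the actual translation memory format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    source: str
    target: str
    molto_section: bool        # segment lies in a MOLTO-marked section
    manually_translated: bool  # translated with traditional CAT tools


def non_coverage_tickets(memory: List[Segment]) -> List[str]:
    """Collect the sources of MOLTO-section segments translated by hand."""
    return [s.source for s in memory
            if s.molto_section and s.manually_translated]
```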

The task may be to extend MOLTO translation to a language whose coverage in the given domain is not complete.
1. In the case of a simple out-of-vocabulary term or concept belonging to a category known to the grammar, the MOLTO equivalents editor can be used to extend the concrete and/or abstract vocabulary of the grammar. If a concept with a matching GF category and verbalizations is found in an existing MOLTO term ontology, the missing term can be added to the translation project's GF grammar extension module. It then becomes immediately available to further MOLTO translation in the project and is subsequently included in the project ontology.
2. If a candidate term is found using some non-authoritative lexical source, the candidate term gets added as a term candidate to the relevant domain for community approval. That is, the translation unit containing the proposed candidate concept/term in its abstract/concrete grammar context is saved in translation memory and sent to the terminology management platform for terminology checking and approval.
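
The two cases above amount to a small routing decision. A minimal sketch, assuming hypothetical names: `TermProposal`, `route_term`, and two plain lists standing in for the grammar extension module and the approval queue. It does not reflect the actual MOLTO equivalents editor or the terminology management platform.

```python
from dataclasses import dataclass


@dataclass
class TermProposal:
    concept: str         # abstract-syntax concept the term verbalizes
    language: str        # target language code
    term: str            # proposed verbalization
    source: str          # where the term was found
    authoritative: bool  # True if found in an existing MOLTO term ontology


def route_term(proposal, grammar_extension, approval_queue):
    """Case 1: a term from an authoritative ontology goes straight into the
    project's grammar extension. Case 2: any other candidate is queued for
    community approval."""
    if proposal.authoritative:
        grammar_extension.append(proposal)
        return "added to grammar extension"
    approval_queue.append(proposal)
    return "sent for approval"
```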

The task may be to develop a master text or pilot translation in preparation for a subsequent multilingual translation project (pre-editing). A gap in the MOLTO coverage can arise when the special domain section subject to MOLTO translation has not been authored in the semantic wiki but has, for instance, been generated from a database or merged from texts in more than one subdomain. In this case, it is worth spending more effort on extending the coverage of the MOLTO grammar to the source before proceeding to multilingual translation.
1. In the case of out-of-vocabulary terms or concepts, the grammar can be extended through the translation editor as above.
2. In more complex cases needing grammar extension, the translator just creates a model translation and submits it back to the ontology/grammar editing workflow. The model translation is saved in translation memory and can be used in regression testing against the edited grammar.
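
Using stored model translations as regression tests could look roughly like this. The sketch assumes a hypothetical `translate` callable that wraps the edited grammar and returns `None` when the source does not parse, and a list of (source, expected target) pairs taken from the translation memory. Neither is an actual MOLTO or GF interface.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def regression_failures(
    translate: Callable[[str], Optional[str]],
    model_translations: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (source, expected, actual) for every model translation that the
    edited grammar does not reproduce exactly."""
    failures = []
    for source, expected in model_translations:
        actual = translate(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures
```

An empty result means the edited grammar reproduces all stored model translations. Exact string comparison is a deliberately strict choice here; a real workflow might accept several valid target variants.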

The MOLTO translation editor

As indicated in the MOLTO CAT system design, the MOLTO translation editor is integrated as a plugin into the translation management system alongside more traditional CAT editors. The MOLTO CAT scenario places the following requirements on the editor and its integration into the TMS.

editor
- the MOLTO translation editor parser can pick out from the source the parts it can translate and indicate what it lacks for the parts that do not translate.
- the GF back end is able to include proposed extensions into the grammar.

The development of the translation editor to satisfy these requirements is taken over by UGOT, as it is closely coupled to the ongoing development of the GF robust parsing and grammar extension services.

integration
- the TMS environment is able to extract from structured source text the parts that are to be translated with MOLTO.
- the editor has access to a term/ontology manager to propose terms/concepts to fill the indicated gaps and to submit new proposals for approval.

These requirements remain the responsibility of UHEL.

Term ontology management with TermFactory

The TermFactory term management specification and query/editing API is a Tomcat Axis2 web service API for querying, editing, and storing small RDF/OWL ontologies representing concepts and the multilingual expressions/terms associated with them. TermFactory contains a term ontology schema that follows professional terminology standards, but the tools can also be used to edit any RDF/OWL ontology through an XHTML representation of RDF. The XHTML representation is highly configurable. It can be parametrized for the presentation layout (concept oriented, lemma oriented), filtered for content, and even localized with another TF term ontology so that the names of properties and classes shown to the user are chosen from the localization ontology. The term ontology editor is a pluggable JavaScript editor that is offered as a standalone Tomcat servlet as well as a MediaWiki extension. A simpler tabular editor exists for the common task of adding equivalents in different languages to an existing ontology term.

TermFactory is to be integrated with the MOLTO KRI over the JMS transport interface provided in the KRI. Besides the Ontotext repositories, TermFactory also talks to Jena RDB and triple set repositories. TermFactory user management is planned to happen through the GlobalSight API.

WP3 Evaluation

The GlobalSight translation management system forms a platform for testing the MOLTO TT scenario, which combines traditional CAT tools with the MOLTO translation editor. The best dataset for testing the full MOLTO CAT scenario should be the patents, since that use case already uses hybrid methods and produces translations with less than 100% coverage. To have a complete use case of the mixed scenario, a pure GF grammar for chemical compounds could be applied to translate chemical compound definitions in the patent text.

The MOLTO CAT review workflow will be used to manage translation quality evaluation of the multilingual translations produced in the other use cases. This exercise in itself also serves to test the usability of MOLTO scenario B.