2. Translation scenarios

GF is an interlingual translation system geared toward multilingual generation. As a proof of concept, GF demos generate text in dozens of languages at once from a tiny grammar. Extended to a more realistic case, this scenario could have a native-language editor producing text for more or less immediate multilingual distribution, for instance on a multilingual website. For this scenario to work, the translations must be acceptable as they are, without revision by a native speaker of each target language.

The GF approach as such best suits an authoring/pre-editing scenario, in which the original author or an authorised content editor can choose or edit the original message so that it conforms to a domain-specific constrained-language grammar, which GF is then expected to translate blindly and reliably into a number of languages. In real-life situations, the text to translate is likely to be at least partially new to the system, and no guarantee can be given that the translation is correct in all the generated languages. It is specifically such extensions of the original MOLTO "fridge magnet" demo scenario that this document tries to address.

The current professional human translation scenario is quite different. It is a post-editing scenario. The roles of author (client) and translator are separated. The translator has quite restricted authority over the target text and almost none over the original, aside from correcting obvious errors. The translation process is normally bilingual, because the translation is created, or at least chosen, by a human, and human translators rarely have professional competence in more than one or two languages besides their native language.

The preferred professional translation direction is from a second language into the translator's native language, because for humans, language generation is more demanding than language understanding. In this direction, the translator can exploit external resources to understand the source text and use her native competence to check the quality of the target text. Even in this case, a native subject expert is usually needed to check the translation. The reviser need not know the source language or have the source text at hand.

The extended MOLTO translation scenarios considered here fall between these two extremes. We may still assume that the translator has some authority over the text to produce, i.e. she is the author or is authorised to adapt the text to better satisfy the constraints of the translation grammar. The MOLTO pre-editor/translator should be native or at least fluent in the source language, and familiar with the domain or at least its special language, in order to know how the message can be (para)phrased. Thus the extended MOLTO scenario retains an element of constrained-language authoring or pre-editing.

But we may need to relax the blind generation assumption. Although the GF engine may give warnings or suggestions when it is unsure or knows that the translation fails, there are likely to be cases where the translation is technically correct but inadequate for human consumption. A native revision is then needed for one or more target languages. As in the human translation case, the author/translator can at best serve as informant for one or a few target languages. For the rest, the translation needs to be distributed to a pool of revisers.

In real life, a translation has to go out even if GF fails, so there must be a way to override GF with human translations. If the translation were a one-off affair, that could end the process. However, in many real-life scenarios the same or very similar texts will come up for (re)translation, and in that case the results of the revisions should be fed back into the translation cycle, to avoid making the same errors twice. In other words, we should make the MOLTO translation system, consisting of GF and its human users, an adaptive whole. This is the most demanding part to design here. Pre-editing MT has not been very successful in the past, probably in part because not enough attention has been given to practical concerns like these.
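The feedback loop described above can be illustrated with a minimal Python sketch. It is not part of GF or of any MOLTO API: the class AdaptiveTranslator, its methods, and the stand-in toy_engine are hypothetical names introduced only for illustration, and the sketch assumes exact-match lookup of whole segments, which is a much simpler policy than a real system would need.

```python
from typing import Callable, Dict, Tuple

# Hypothetical lookup key: (source segment, target language code).
Key = Tuple[str, str]


class AdaptiveTranslator:
    """Consults stored human revisions before falling back to the engine."""

    def __init__(self, engine: Callable[[str, str], str]) -> None:
        # `engine` stands in for a call into GF; its signature is an assumption.
        self.engine = engine
        self.revisions: Dict[Key, str] = {}

    def translate(self, source: str, lang: str) -> str:
        # A stored human revision overrides the engine output.
        key = (source, lang)
        if key in self.revisions:
            return self.revisions[key]
        return self.engine(source, lang)

    def revise(self, source: str, lang: str, corrected: str) -> None:
        # Feed a reviser's correction back so the same error is not repeated.
        self.revisions[(source, lang)] = corrected


def toy_engine(source: str, lang: str) -> str:
    # Placeholder for machine translation: it only tags the source text.
    return f"[{lang}] {source}"


translator = AdaptiveTranslator(toy_engine)
first = translator.translate("the milk is cold", "de")
translator.revise("the milk is cold", "de", "die Milch ist kalt")
second = translator.translate("the milk is cold", "de")
```

Here the first call returns the engine output, while the second returns the reviser's correction. Revisions are stored per target language, so a correction for one language leaves the others untouched.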

Translation industry standards

As document automation progresses, professional translation is merging into localization, that is, the adaptation of software to a new locale (language and culture). Translation used to differ from localization in that translators were not expected to worry about formats or the document lifecycle. Texts were shipped to translators as raw text and returned as such. In an intermediate phase, a specialized localization industry developed to multilingualize software while preserving the source format. More recently, with multi-channel publishing and document toolchains, there is again a push to separate form from content. The localization industry's solution to these conflicting pressures is to separate content from form in a reversible fashion. Localization formats and tools like GNU gettext and XLIFF make provisions for extracting the translatable text from a document in a way that allows embedding the target text in the same document form.

The current GF translation engine as such is neutral about the format of the text it receives, but the existing resource grammars expect text to come in raw form. It should be technically possible to include document formatting in GF parsing and generation, and if suitably restricted, that might be the most efficient solution for the translation of inline tags. For the rest, however, it seems best to take advantage of the content extraction technologies that already exist in the translation industry. We propose to use XLIFF as the MOLTO translatable document format in the extended API.
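As an illustration of reversible content extraction, the following Python sketch pulls the source segments out of a small XLIFF 1.2 document and writes translations back into the same structure, using only the standard library. The sample document, the function names extract_sources and insert_targets, and the placeholder "translation" step are assumptions made for this example; they are not the MOLTO API. The sketch also ignores inline tags inside segments.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}

# Hypothetical sample document with two translatable units.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="page.html" source-language="en" target-language="de" datatype="html">
    <body>
      <trans-unit id="t1"><source>the milk is cold</source></trans-unit>
      <trans-unit id="t2"><source>the cheese is fresh</source></trans-unit>
    </body>
  </file>
</xliff>"""


def extract_sources(root):
    """Map each trans-unit id to its source text (inline tags are ignored)."""
    return {
        unit.get("id"): unit.findtext("x:source", default="", namespaces=NS)
        for unit in root.iterfind(".//x:trans-unit", NS)
    }


def insert_targets(root, translations):
    """Write translations into <target> elements, creating them if missing."""
    for unit in root.iterfind(".//x:trans-unit", NS):
        text = translations.get(unit.get("id"))
        if text is None:
            continue  # leave untranslated units as they are
        target = unit.find("x:target", NS)
        if target is None:
            target = ET.SubElement(unit, f"{{{XLIFF_NS}}}target")
        target.text = text


root = ET.fromstring(SAMPLE)
sources = extract_sources(root)
# Placeholder for the translation step: it only tags the source text.
translations = {uid: f"[de] {text}" for uid, text in sources.items()}
insert_targets(root, translations)
targets = [u.findtext("x:target", namespaces=NS)
           for u in root.iterfind(".//x:trans-unit", NS)]
```

Because the target text is stored next to the source inside each trans-unit, the file element's reference to the original document is left untouched, which is what allows a tool to merge the translation back into the original format later.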

XLIFF is one of the standards of the OASIS OAXAL reference model, which also draws on LISA standards. As of 28 February 2011, the Localization Industry Standards Association (LISA) is insolvent, but the LISA standards continue to be used by the industry. The OASIS Open Architecture for XML Authoring and Localization (OAXAL) reference model comprises the following open standards:

Unicode
XLIFF: OASIS XML Localization Interchange File Format
SRX: LISA Segmentation Rules Exchange
TMX: LISA Translation Memory Exchange
GMX/V: LISA Word and Character Count Standard
W3C ITS: Internationalization Tag Set
xml:tm: LISA XML based text memory

Translation tools survey

For the extended scenario, we may add other industry-standard computer-assisted translation (CAT) tools for MOLTO translators to use besides the core list above. There is a plethora of packages for CAT and translation project management/automation, both commercial and open source. It seems best to borrow from existing open source packages that comply with translation industry standards, instead of reinventing the wheel. Examples of CAT packages are:

OmegaT http://www.omegat.org/
OpenTM2 http://www.opentm2.org/
OpenTMS http://www.opentms.de/?q=en/node/27
TinyTM http://tinytm.sourceforge.net/
SwordFish http://www.maxprograms.com/products/swordfish.html
Heartsome http://www.heartsome.net/EN/home.html

SwordFish and Heartsome are commercial. Examples of translation project management and workflow automation packages are:

ProjectOpen http://www.project-open.com/en/modules/index.php
GlobalSight http://www.kmworld.com/Articles/News/News/Open-source-translation-management-52166.aspx
SDL WorldServer http://www.sdl.com/en/language-technology/products/translation-management/worldserver/default.asp
Across v5 http://www.across.net/en/index.aspx
XTM http://www.xtm-intl.com/

Of the systems listed above, ProjectOpen and GlobalSight are open source; the rest are commercial.

From existing open source projects we can pick the features generally expected from translation memory (TM, http://en.wikipedia.org/wiki/Translation_memory), CAT (http://en.wikipedia.org/wiki/Computer-assisted_translation), and translation project management software. Some commercial systems also have open interfaces, e.g. Across (http://en.wikipedia.org/wiki/Across_Systems). Here are some translation tools listings from the Web.

For comparisons, see e.g. Wikipedia.
