2. Translation scenarios

inari.listenmaa

Multilingual Online Translation

Translation industry standards

As document automation progresses, professional translation is merging into localization, that is, the adaptation of software to a new locale (language and culture). Translation used to differ from localization in that translators were not expected to worry about formats or the document lifecycle. Texts were shipped to translators as raw text and returned as such. In an intermediate phase, a specialized localization industry developed to multilingualize software while preserving the source format. More recently, with multichannel publishing and document toolchains, there is again a push to separate form from content. The localization industry's solution to these conflicting pressures is to separate content from form in a reversible fashion. Localization formats and tools like GNU gettext and XLIFF make provisions for extracting the translatable text from a document in a way that allows embedding the target text in the same document form.
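The reversible separation of content from form can be illustrated with a minimal sketch. It is not how gettext or XLIFF tools are implemented; the function names `extract` and `merge`, the `%%n%%` placeholder syntax, and the tag-based notion of "form" are all assumptions made for illustration.

```python
import re

# Text nodes are taken to be the runs of characters that sit between
# a closing '>' and the next opening '<' (an illustrative simplification).
TEXT_NODE = re.compile(r"(?<=>)[^<>]+(?=<)")
PLACEHOLDER = re.compile(r"%%(\d+)%%")


def extract(document):
    """Return (skeleton, segments): the form with placeholders, and the content."""
    segments = []

    def stash(match):
        text = match.group(0)
        if not text.strip():
            return text  # leave whitespace-only runs in the skeleton
        segments.append(text)
        return f"%%{len(segments) - 1}%%"

    skeleton = TEXT_NODE.sub(stash, document)
    return skeleton, segments


def merge(skeleton, translations):
    """Embed translated segments back into the original document form."""
    return PLACEHOLDER.sub(lambda m: translations[int(m.group(1))], skeleton)


if __name__ == "__main__":
    skeleton, segments = extract("<p>Hello <b>world</b></p>")
    print(skeleton)
    print(segments)
    print(merge(skeleton, ["Hei ", "maailma"]))
```

The translator only ever sees the list of segments, while the skeleton keeps the formatting so that the target text can be embedded in the same document form.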

The current GF translation engine as such is neutral about the format of the text it receives, but the existing resource grammars expect text to come in raw form. It should be technically possible to include document formatting in GF parsing and generation, and if suitably restricted, that might be the most efficient solution for the translation of inline tags. For the rest, however, it seems best to take advantage of the existing content extraction technologies in the translation industry. We propose to use XLIFF as the MOLTO translatable document format in the extended API.
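As a rough sketch of what handling XLIFF might involve, the following Python fragment reads the source segments of a small XLIFF 1.2 document and writes a target for each one. The sample document, the function name `translate_xliff`, and the toy lookup table are assumptions made for illustration; in particular, the `translate` callback merely stands in for a translation engine and is not the GF or MOLTO API.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize without an ns0: prefix

# Hypothetical sample input: two raw-text segments extracted from a document.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="hello.txt" source-language="en" target-language="fi"
        datatype="plaintext">
    <body>
      <trans-unit id="1"><source>Hello world</source></trans-unit>
      <trans-unit id="2"><source>Good morning</source></trans-unit>
    </body>
  </file>
</xliff>"""

# Toy lookup table standing in for a real translation engine (assumption).
TOY_TRANSLATIONS = {
    "Hello world": "Hei maailma",
    "Good morning": "Hyvää huomenta",
}


def translate_xliff(xliff_text, translate):
    """Fill the <target> of every <trans-unit> using the translate callback."""
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        source = unit.find(f"{{{XLIFF_NS}}}source")
        if source is None:
            continue
        target = unit.find(f"{{{XLIFF_NS}}}target")
        if target is None:
            # Simplification: place <target> directly after <source>
            # (a <seg-source> element, if present, is not considered).
            target = ET.Element(f"{{{XLIFF_NS}}}target")
            unit.insert(list(unit).index(source) + 1, target)
        target.text = translate(source.text or "")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(translate_xliff(SAMPLE, lambda s: TOY_TRANSLATIONS.get(s, s)))
```

Only the plain text of each segment is passed to the callback, which matches the expectation that the resource grammars receive raw text; segments containing inline tags would need additional handling that this sketch leaves out.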

XLIFF is one of the OASIS LISA OAXAL standards. As of February 28, 2011, the Localization Industry Standards Association (LISA) is insolvent, but the LISA standards continue to be used by the industry. The OASIS Open Architecture for XML Authoring and Localization (OAXAL) reference model comprises the following open standards:

  • Unicode
  • XLIFF – OASIS XML Localization Interchange File Format
  • SRX – LISA Segmentation Rules Exchange
  • TMX – LISA Translation Memory Exchange
  • GMX/V – LISA Word and Character Count Standard
  • W3C ITS – Internationalization Tag Set
  • xml:tm – LISA XML-based text memory

Translation tools survey

For the extended scenario, we may add other industry-standard CAT tools for MOLTO translators to use besides the core list above. There is a plethora of packages for CAT and translation project management/automation, both commercial and open source. It seems best to borrow from existing open source packages that comply with translation industry standards instead of reinventing the wheel. Examples of CAT packages are

SwordFish and HeartSome are commercial. Examples of translation project management and workflow automation packages are

Of the systems listed above, ProjectOpen and GlobalSight are open source, the rest are commercial.

From existing open source projects we can pick the properties generally expected from TM (http://en.wikipedia.org/wiki/Translation_memory), CAT (http://en.wikipedia.org/wiki/Computer-assisted_translation), and translation project management software. Some commercial systems also have open interfaces, e.g. Across (http://en.wikipedia.org/wiki/Across_Systems). Here are some translation tool listings from the Web.

For comparisons, see e.g. Wikipedia.