MOLTO's goal is to develop a set of tools for translating texts between multiple languages in real time and with high quality. Languages are separate modules in the tool, so the set of languages can be varied; prototypes covering a majority of the EU's 23 official languages will be built.
As its main technique, MOLTO uses domain-specific semantic grammars and ontology-based interlinguas. These components are implemented in GF (Grammatical Framework), a grammar formalism in which multiple languages are related by a common abstract syntax. GF has been applied in several small-to-medium-sized domains, typically targeting up to ten languages, but MOLTO will scale this up in terms of productivity and applicability.
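To make the idea of a shared abstract syntax concrete, here is a minimal Python sketch. It is not GF code: the tree type `Is`, the constants `Pizza`, `Wine`, `Delicious` and `Expensive`, and the English and German strings are hypothetical, and parsing is done by brute-force enumeration of a finite set of trees rather than by GF's actual parser. Translation is modeled as parsing a sentence into the language-independent tree and then linearizing that tree in another language.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Is:
    """Language-independent abstract syntax tree: Is(item, quality)."""
    item: str
    quality: str


# Hypothetical abstract constants of a toy domain.
ITEMS = ["Pizza", "Wine"]
QUALITIES = ["Delicious", "Expensive"]

# One concrete syntax per language: a word-order template plus a lexicon.
CONCRETE = {
    "Eng": {"template": "{item} is {quality}",
            "Pizza": "this pizza", "Wine": "this wine",
            "Delicious": "delicious", "Expensive": "expensive"},
    "Ger": {"template": "{item} ist {quality}",
            "Pizza": "diese Pizza", "Wine": "dieser Wein",
            "Delicious": "lecker", "Expensive": "teuer"},
}


def linearize(tree: Is, lang: str) -> str:
    """Render an abstract tree as a string in the given language."""
    c = CONCRETE[lang]
    return c["template"].format(item=c[tree.item], quality=c[tree.quality])


def parse(sentence: str, lang: str) -> list:
    """Return every abstract tree whose linearization equals the sentence.

    Brute force over the finite toy domain; GF uses a real parser instead.
    """
    trees = (Is(i, q) for i, q in product(ITEMS, QUALITIES))
    return [t for t in trees if linearize(t, lang) == sentence]


def translate(sentence: str, source: str, target: str) -> list:
    """Translate via the interlingua: parse in source, linearize in target."""
    return [linearize(t, target) for t in parse(sentence, source)]


print(translate("this wine is expensive", "Eng", "Ger"))  # ['dieser Wein ist teuer']
```

Adding a language in this sketch means adding one more entry to `CONCRETE`; the abstract syntax and the other languages stay untouched, which mirrors how GF keeps each language in a separate module.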
Part of the scale-up is to increase the size of domains and the number of languages. A more substantial part is to make the technology accessible to domain experts without GF expertise and to minimize the effort needed to build a translator. Ideally, this can be done just by extending a lexicon and writing a set of example sentences.
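The lexicon-extension workflow can be sketched as follows, assuming a hypothetical lexicon that maps each abstract constant to one surface form per language (real GF lexicons also carry inflection tables, which are omitted here). The helper `add_entry` and the language list are illustrative names, not part of GF or MOLTO; the point is that the domain expert supplies only words, while the tool checks that every target language is covered.

```python
# Hypothetical set of target languages for a museum-style domain.
LANGUAGES = ("Eng", "Fin", "Swe")

# Hypothetical lexicon: abstract constant -> surface form in each language.
lexicon = {
    "Painting": {"Eng": "painting", "Fin": "maalaus", "Swe": "målning"},
}


def add_entry(lex: dict, constant: str, forms: dict) -> None:
    """Add one abstract constant, refusing entries that leave a language uncovered."""
    missing = [lang for lang in LANGUAGES if lang not in forms]
    if missing:
        raise ValueError(f"{constant}: no form given for {', '.join(missing)}")
    lex[constant] = {lang: forms[lang] for lang in LANGUAGES}


# A domain expert extends the lexicon with a single new word.
add_entry(lexicon, "Sculpture", {"Eng": "sculpture", "Fin": "veistos", "Swe": "skulptur"})

try:
    add_entry(lexicon, "Vase", {"Eng": "vase"})  # Finnish and Swedish forms missing
except ValueError as err:
    print(err)  # Vase: no form given for Fin, Swe
```

Rejecting incomplete entries up front keeps every language module in sync, so a new word never silently breaks translation into one of the target languages.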
The most research-intensive parts of MOLTO are the two-way interoperability between ontology standards (OWL) and GF grammars, and the extension of rule-based translation with statistical methods. The OWL-GF interoperability will enable multilingual natural-language-based interaction with machine-readable knowledge. The statistical methods will add robustness to the system when desired. New methods will be developed for combining GF grammars with statistical translation, to the benefit of both.
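The two-way link between ontology statements and text can be illustrated with a deliberately small sketch in which an ontology statement is a plain (subject, property, object) tuple rather than a real OWL/RDF structure. The labels, the property `createdBy`, and the per-language templates are hypothetical. `verbalize` goes from knowledge to text, and `to_triple` goes back from text to knowledge by matching the same templates, so both directions share a single specification.

```python
import re

# Hypothetical ontology fragment: human-readable labels for individuals.
LABELS = {"MonaLisa": "Mona Lisa", "Leonardo": "Leonardo da Vinci"}

# Hypothetical verbalization template for each (language, property) pair.
TEMPLATES = {
    "Eng": {"createdBy": "{s} was created by {o}"},
    "Ger": {"createdBy": "{s} wurde von {o} geschaffen"},
}


def verbalize(triple: tuple, lang: str) -> str:
    """Knowledge -> text: render a (subject, property, object) triple."""
    s, p, o = triple
    return TEMPLATES[lang][p].format(s=LABELS[s], o=LABELS[o])


def template_regex(template: str) -> str:
    """Turn '{s} ... {o}' into a regex with named groups s and o."""
    parts = re.split(r"\{(s|o)\}", template)
    # Even indices are literal text, odd indices are the slot names s / o.
    return "".join(re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
                   for i, part in enumerate(parts))


def to_triple(sentence: str, lang: str):
    """Text -> knowledge: recover the triple, or None if nothing matches."""
    by_label = {label: ind for ind, label in LABELS.items()}
    for prop, template in TEMPLATES[lang].items():
        m = re.fullmatch(template_regex(template), sentence)
        if m and m["s"] in by_label and m["o"] in by_label:
            return (by_label[m["s"]], prop, by_label[m["o"]])
    return None


fact = ("MonaLisa", "createdBy", "Leonardo")
text = verbalize(fact, "Ger")
print(text)                    # Mona Lisa wurde von Leonardo da Vinci geschaffen
print(to_triple(text, "Ger"))  # ('MonaLisa', 'createdBy', 'Leonardo')
```

Deriving the parser from the same templates used for generation is what keeps the round trip consistent: any statement the sketch can verbalize, it can also read back.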
MOLTO technology will be released as open-source libraries that can be plugged into standard translation tools and web pages and thereby fit into standard workflows. It will be showcased in web-based demos and applied in three case studies: mathematical exercises in 15 languages, patent data in at least 3 languages, and museum object descriptions in 15 languages.
MOLTO translation is based on GF and its resource grammar library, which currently covers the languages shown in the diagram below. MOLTO tools will enable the specialization of both the abstract syntax and the language fragments to idiomatic usage in new application domains.
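Specializing language fragments to idiomatic usage can be pictured as overriding a generic construction for selected languages. In this hypothetical sketch, a generic copula construction (standing in for what a resource grammar might provide) works for English and German age statements, but French expresses age with "avoir" ("Marie a 30 ans", literally "Marie has 30 years"), so a domain-level override replaces it. The function names and tables are illustrative only; in GF such language-specific choices are made in the concrete syntax modules, not in Python.

```python
# Generic construction, standing in for a resource-grammar rule:
# "<person> <copula> <n> <age unit>".
COPULA = {"Eng": "is", "Ger": "ist", "Fre": "est"}
AGE_UNIT = {"Eng": "years old", "Ger": "Jahre alt", "Fre": "ans"}


def generic_age(person: str, n: int, lang: str) -> str:
    return f"{person} {COPULA[lang]} {n} {AGE_UNIT[lang]}"


# Domain-level overrides where the generic rule is unidiomatic.
# French states age with "avoir": "Marie a 30 ans", not "Marie est 30 ans".
OVERRIDES = {"Fre": lambda person, n: f"{person} a {n} ans"}


def linearize_age(person: str, n: int, lang: str) -> str:
    """Use the language's override if one exists, else the generic rule."""
    override = OVERRIDES.get(lang)
    return override(person, n) if override else generic_age(person, n, lang)


for lang in ("Eng", "Ger", "Fre"):
    print(linearize_age("Marie", 30, lang))
# Marie is 30 years old
# Marie ist 30 Jahre alt
# Marie a 30 ans
```

Keeping the generic rule as the default means an override is needed only where a language deviates, so specializing a domain touches a few idioms rather than the whole grammar.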
MOLTO partners:
- UGOT: University of Gothenburg, Sweden (coordinator)
- UHEL: University of Helsinki, Finland
- UPC: Universitat Politècnica de Catalunya, Barcelona, Spain
- Ontotext: Ontotext AD, Sofia, Bulgaria
- BI: Be Informed, Apeldoorn, The Netherlands
- UZH: University of Zurich, Zürich, Switzerland