Frequently Asked Questions - Technology
- What are the main ideas behind MOLTO?
- What is GF?
- Why do you believe these ideas will work?
- Isn't interlingua an unrealistic dream?
- Are there any scientific challenges?
- How will you combine rules and statistics?
- What tools will you build?
- What platforms will the tools run on?
- How is a new translation domain defined?
- How do you relate to Google's tools?
- How will you evaluate your results?
The main idea is to use interlinguas that are based on domain semantics and equipped with reversible generation functions. Translation is thus a composition of parsing the source language and generating the target language. An implementation of this technology is provided by GF (Grammatical Framework, grammaticalframework.org). In MOLTO, GF is complemented by ontologies such as those used in the Semantic Web. We will also use methods from statistical machine translation (SMT) to improve robustness and to extract grammars from data.
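The composition of parsing and generation can be illustrated with a minimal Python sketch. This is not GF code and not how GF parses: the toy domain, the word tables and the names linearize, parse and translate are all assumptions made for illustration, and parsing is done by naive generate-and-test rather than by a real parsing algorithm.

```python
from itertools import product

# Hypothetical toy domain: the interlingua is a tree ("Pred", kind, quality).
KINDS = ["Cheese", "Wine"]
QUALITIES = ["Delicious", "Italian"]

# Per-language words and sentence pattern (illustrative, not from MOLTO).
WORDS = {
    "Eng": {"Cheese": "cheese", "Wine": "wine",
            "Delicious": "delicious", "Italian": "Italian"},
    "Ger": {"Cheese": "Käse", "Wine": "Wein",
            "Delicious": "köstlich", "Italian": "italienisch"},
}
PATTERN = {
    "Eng": "this {kind} is {quality}",
    "Ger": "dieser {kind} ist {quality}",
}

def linearize(tree, lang):
    """Generate a sentence in the given language from an abstract tree."""
    _, kind, quality = tree
    words = WORDS[lang]
    return PATTERN[lang].format(kind=words[kind], quality=words[quality])

def parse(text, lang):
    """Invert linearize by trying every tree of the toy domain."""
    for kind, quality in product(KINDS, QUALITIES):
        tree = ("Pred", kind, quality)
        if linearize(tree, lang) == text:
            return tree
    raise ValueError(f"not covered by the {lang} grammar: {text!r}")

def translate(text, source, target):
    """Translation = parse the source, then generate the target."""
    return linearize(parse(text, source), target)

print(translate("this wine is Italian", "Eng", "Ger"))
# prints: dieser Wein ist italienisch
```

Because both directions go through the same abstract tree, adding a third language to this sketch would only require one more word table and pattern, not a new translation pair for each existing language.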
GF is a framework for defining multilingual grammars, each based on a
common abstract syntax. The abstract syntax is defined by using
type theory, in the same way as in
logical frameworks.
The natural language generation part is called concrete syntax.
It is written in a feature-based grammar formalism that is equivalent to
PMCFG (Parallel Multiple Context-Free Grammars) and
has polynomial parsing behaviour.
GF uses PMCFG as its "machine language", which is compiled from
high-level grammar code that features a rich type system
and a module system inherited from functional programming languages.
GF has been developed for 12 years now, and multilingual GF-based translation has been tested in numerous applications, ranging from mathematics via software specifications to spoken dialogue systems (see the GF homepage). We also believe that there are many interesting domain-specific translation tasks, even if we cannot provide a competitor to open-domain systems like Google Translate.
Yes, it is, if we want to have a universal interlingua working for everything. This is why we don't believe we can ever translate newspapers with MOLTO techniques. However, domain-specific interlinguas have proved quite feasible. Notice that this move is similar to what has happened in ontologies: they have moved from universal ontologies to domain ontologies.
The first challenge is to scale up the size of applications. Not so much the number of languages, which we already know how to manage, but the lexicon size: from hundreds to thousands of words. We need techniques both for building such translation lexica manually and for extracting them automatically. This leads to the second challenge, which is to minimize the development effort in terms of skills and time: to make GF available to people with no special training, as a part of their normal workflows. This needs both new algorithms and interaction design. The third challenge is to exploit the ontologies of the Semantic Web to bootstrap GF abstract syntax; the goal is to enable an automatic conversion from OWL to GF. We also want to enable multilingual queries of ontological databases by using MOLTO translation. The fourth challenge is to combine GF with statistical machine translation (SMT), both to improve the robustness of translation and to extract grammars from text.
This is perhaps the most speculative research topic in MOLTO. We will, first of all, join the growing effort on hybrid systems, where statistics is used as a fall-back for rule-based translation; there are many yet-to-be-explored technical ideas in this area. We will also use statistics to extract translation rules automatically and to resolve ambiguities. But we want to keep control of the quality of the translation; thus we won't blindly return uncertain fall-back translations without warning the user about the uncertainty.
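One way to picture the fall-back idea is the following Python sketch. It is only an assumption about how such a combination might be wired up, not MOLTO's design: the Translation record, hybrid_translate and both toy translators are hypothetical names, and the "statistical" component is a stand-in word glossary rather than a real SMT system.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Translation:
    text: str
    reliable: bool   # True only when the grammar produced the result
    note: str = ""   # warning shown to the user for fall-back output

def hybrid_translate(
    sentence: str,
    grammar_translate: Callable[[str], Optional[str]],
    statistical_translate: Callable[[str], str],
) -> Translation:
    """Try the rule-based translator first; fall back with a warning."""
    result = grammar_translate(sentence)
    if result is not None:
        return Translation(result, reliable=True)
    return Translation(
        statistical_translate(sentence),
        reliable=False,
        note="statistical fall-back; quality not guaranteed",
    )

# Hypothetical stand-ins for the two components.
def toy_grammar(sentence: str) -> Optional[str]:
    # Covers a single sentence; returns None outside its coverage.
    return {"2 is even": "2 ist gerade"}.get(sentence)

def toy_statistical(sentence: str) -> str:
    # Word-by-word glossary lookup; unknown words are copied unchanged.
    glossary = {"is": "ist", "even": "gerade", "odd": "ungerade"}
    return " ".join(glossary.get(word, word) for word in sentence.split())

covered = hybrid_translate("2 is even", toy_grammar, toy_statistical)
fallback = hybrid_translate("3 is odd", toy_grammar, toy_statistical)
print(covered.text, covered.reliable)    # prints: 2 ist gerade True
print(fallback.text, fallback.reliable)  # prints: 3 ist ungerade False
```

Carrying the reliable flag with every result is what lets a user interface mark fall-back output as uncertain instead of presenting it with the same authority as a grammar-based translation.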
The main generic tools are extensions of GF with new user interfaces: a grammar engineer's tool for building systems for new domains, and a translator's tool for using a given translation system. On top of these generic tools, we will build tools tailored to the domains of our case studies. Thus, while the generic translator's tool will be usable in the mathematics domain as well, the users will appreciate its integration with computer algebra systems; the museum object tools will be integrated with existing tools for browsing the databases, and so on. The main idea for these domain-specific interfaces will be that the generic tools are also available as libraries, with APIs enabling their adaptation to new environments.
Our code will run on all major operating systems: Linux, Mac OS X, and Windows. Users can thus download and install MOLTO tools on their own computers. But we will also make the tools available as web services. The translator's tool, in particular, should be usable within a web browser without downloading any software. Some kinds of translators, e.g. tourist phrasebooks, will also be natural to run on mobile phones, e.g. on the iPhone and Android platforms. We will provide user interfaces adapted to these platforms, for both on-line and off-line use.
Here is a concrete example of how this can work. Let's say you want to build a translator for arithmetic propositions. You first build an abstract syntax, which defines basic concepts such as the set of natural numbers, the properties "even" and "odd", and the relation "greater than"; properties and relations are functions from expressions to propositions. This is what the abstract syntax looks like in GF:
Nat : Set
Even : Exp -> Prop
Odd : Exp -> Prop
Gt : Exp -> Exp -> Prop
Sum : Exp -> Exp
The first concrete syntax might be English. You define it by giving an example for each of the new concepts, providing information similar to the following (in a text file, via a GUI, or in a web form with a slot for each concept):
Nat = "number"
Even x = "x is even"
Odd x = "x is odd"
Gt x y = "x is greater than y"
Sum x = "the sum of x"
A German concrete syntax is given by a second set of examples:
Nat = "Zahl"
Even x = "x ist gerade"
Odd x = "x ist ungerade"
Gt x y = "x ist größer als y"
Sum x = "die Summe von x"
By using the English and German resource grammars, each example is generalized to the set of rules that is needed for using these concepts in all contexts and combinations. For instance, the resulting system can translate "every even number that is greater than 0 is the sum of two odd numbers" as "jede gerade Zahl, die größer als 0 ist, ist die Summe von zwei ungeraden Zahlen".
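The example rules above can be read as templates over abstract syntax trees, as in the following Python sketch. This is a simplification, not GF: trees are nested tuples, leaves such as "x" or "0" are plain strings (an assumption, since the listing above has no constructor for numerals or variables), and the function name linearize is chosen here for illustration.

```python
# Templates copied from the English and German examples above.
TEMPLATES = {
    "Eng": {
        "Even": "{0} is even",
        "Odd": "{0} is odd",
        "Gt": "{0} is greater than {1}",
        "Sum": "the sum of {0}",
    },
    "Ger": {
        "Even": "{0} ist gerade",
        "Odd": "{0} ist ungerade",
        "Gt": "{0} ist größer als {1}",
        "Sum": "die Summe von {0}",
    },
}

def linearize(tree, lang):
    """Turn an abstract tree into a string by filling in templates."""
    if isinstance(tree, str):  # leaf: a variable or numeral (assumption)
        return tree
    fun, *args = tree
    parts = [linearize(arg, lang) for arg in args]
    return TEMPLATES[lang][fun].format(*parts)

# One abstract tree, two languages.
tree = ("Gt", ("Sum", "x"), "0")
print(linearize(tree, "Eng"))  # prints: the sum of x is greater than 0
print(linearize(tree, "Ger"))  # prints: die Summe von x ist größer als 0
```

Plain string templates like these cannot produce the verb-final relative clause "die größer als 0 ist" or the inflected form "ungeraden" in the German sentence above; handling word order and agreement in all contexts is the part that the resource grammars contribute.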
You cannot ignore Google when working on machine translation: for most people, it is the state of the art for translation on the web. We see MOLTO translation as an approach diametrically opposed to Google's (precision rather than coverage), and with a different application (a producer's rather than a consumer's tool). The underlying technology is also different: Google translation is based on statistics, MOLTO on grammars. Despite all these differences, hybrid systems might well combine MOLTO with Google Translate. In hybrid systems, it is an advantage to combine systems that are based on different principles.
We will collect feedback from our web-based demos. We will also use standard machine translation evaluation tools, BLEU and TAUS, and make comparisons with other translation tools. In addition to translation quality, we will measure the productivity and usability of our tools in user studies. And like many other European projects, we will have a scientific board with independent experts to monitor our progress.
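To give an idea of what a BLEU score measures, here is a simplified, single-reference, sentence-level sketch in Python. It is an illustration under stated assumptions, not the evaluation setup of the project: real BLEU is computed over a whole test corpus, usually with several references and proper tokenization, and the function names ngrams and bleu are chosen here. The sketch returns 0.0 whenever some n-gram order has no match, since it applies no smoothing.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified BLEU: clipped n-gram precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    # Penalize candidates that are shorter than the reference.
    if len(cand) > len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the precisions, scaled by the brevity penalty.
    return brevity * math.exp(sum(log_precisions) / max_n)

reference = "x ist größer als y"
print(bleu("x ist größer als y", reference))        # identical sentence
print(bleu("x ist größer", reference, max_n=2))     # correct but too short
```

The second call shows why the brevity penalty exists: every word and bigram of the short candidate occurs in the reference, so precision alone would give a perfect score.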

MOLTO is funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement FP7-ICT-247914.
