These are the questions we have been asked about MOLTO, with our answers. If you do not find what you are looking for, you are welcome to contact us.

• ## What is MOLTO's goal, in one sentence?

We want to develop tools for automatically translating documents on the web with high quality and between many languages (up to 15 simultaneously).

• ## How does MOLTO differ from existing translation tools on the web?

Tools like Systran (Babelfish) and Google Translate are designed for consumers of information, but we will mainly serve the producers of information. We want the quality to be good enough that, for instance, an e-commerce site can translate its web pages automatically without fear that the message will change. With the other tools, by contrast, a potential customer might read an e-commerce page written in French and translate it into Swedish just to find out whether the shop has something of interest to her.

• ## Isn't this too good to be true?

There is a price we have to pay, of course: we will not be able to translate just anything. We can only translate things that we have customized the system to translate. This follows from a well-known trade-off in machine translation: one cannot reach full coverage and full precision at the same time. In this trade-off, Systran and Google have opted for coverage, whereas MOLTO opts for precision.

• ## What kind of things will you be able to translate?

MOLTO translators are specialized to different domains, which use language in uniform and well-understood ways. In MOLTO itself, we will build systems for three such domains: mathematical exercises, biomedical patents, and museum object descriptions. But these domains are just examples, which help us to develop and evaluate the tools; we expect the tools to be applicable to new domains by other people. Examples of such domains could be e-commerce sites, Wikipedia articles, contracts, business letters, user manuals, and software localization.

• ## Will you be able to translate newspaper texts?

No. "Newspaper text" is not a well-defined domain in MOLTO's sense, at least not in the light of the knowledge we have today. So we leave it to other tools to translate newspapers, novels, and random web pages.

• ## Is it a huge effort to build quality translation systems for new domains?

This is exactly what we want to make easier. Traditionally, it has been an effort of years to build a translation system of any reasonable size. We want to bring this down to months, in some cases even to days. And we want it to be doable for persons without special training in MOLTO, in linguistics, or in programming. Read the "Technology" section to find out how we believe we can do this.

• ## Will you make human translators unemployed?

No. Firstly because we cannot translate outside well-defined domains. Secondly, and more interestingly, we will provide new working modes for human translators: instead of translating similar documents in the same domain over and over again, they will be able to work on customizing the translation systems. The systems will learn from a few well-chosen examples, translated by humans, how to translate other texts within the same domain. This will raise the translator's work to a higher level.

• ## Will the quality match human translators?

Human translators will always be better than MOLTO at making intelligent decisions about style, and hence at producing more elegant text. On the other hand, MOLTO will be good at the terminology and idiomatic usage of specialized domains, in which human translators might lack training.

• ## What languages are there in MOLTO?

MOLTO is committed to dealing with 15 languages, which includes 12 official languages of the European Union - Bulgarian, Danish, Dutch, English, Finnish, French, German, Italian, Polish, Romanian, Spanish, and Swedish - and 3 other languages - Catalan, Norwegian, and Russian. But during the project, other languages are likely to be added, since they are provided by other on-going projects.

• ## How can I add my language?

The main thing we use for each language in MOLTO is a resource grammar, which is actually a software library that defines the grammatical rules of the language: its word inflection and syntactic structures. Writing a resource grammar for a new language requires an effort of 3--6 months from a reasonably skilled programmer with good theoretical and practical knowledge of the language.
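To give a feel for the "word inflection" half of a resource grammar, here is a minimal sketch in Python (not GF) of a single paradigm function that guesses an English noun's plural from its singular, roughly the kind of job a paradigm such as `mkN` does in GF's English resource grammar. The function name `mk_noun_en` and its two-form table are illustrative assumptions; a real resource grammar also handles irregular words, more inflectional features, and all the syntactic rules.

```python
def mk_noun_en(singular: str) -> dict:
    """Guess a regular English noun's inflection table (hypothetical helper).

    Only number is covered; irregular plurals such as "mouse"/"mice"
    would have to be listed explicitly in a real resource grammar.
    """
    if singular.endswith(("s", "x", "z", "ch", "sh")):
        plural = singular + "es"  # box -> boxes
    elif singular.endswith("y") and len(singular) > 1 and singular[-2] not in "aeiou":
        plural = singular[:-1] + "ies"  # city -> cities
    else:
        plural = singular + "s"  # cat -> cats, day -> days
    return {"Sg": singular, "Pl": plural}


for word in ["cat", "box", "city", "day"]:
    print(word, "->", mk_noun_en(word)["Pl"])
```

Guessing from one form keeps lexicon entries short; the cost is that every exception must be spotted and entered by hand, which is part of the months of work mentioned above.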

• ## Which are the most likely next languages?

There is ongoing work on at least Arabic, Farsi, Hebrew, Hindi/Urdu, Icelandic, Japanese, Latvian, Maltese, Portuguese, Swahili, Tswana, and Turkish. The EU languages that still lack developers are Czech, Estonian, Greek, Hungarian, Irish, Lithuanian, Slovak, and Slovene. You are most welcome to contribute to any of these languages!

• ## When will MOLTO be available for use?

We will release the first prototype of MOLTO web service in June 2010. This prototype will be constantly updated, and more mature tools will be released during 2011. The case studies will be finished in late 2012. But you can already now get an idea of the underlying technology by trying out a fridge magnet demo or a text input demo.

• ## Your translator has errors - where's the quality?

We will receive feedback from users continuously and fix all errors as soon as possible. One advantage of MOLTO technology is that it is highly programmable: we can locate errors in translations with high precision, and quickly produce a fixed version of the system without breaking anything else.
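One common way to back up a "without breaking anything else" claim is a regression suite: store abstract trees together with translations that have already been approved, and re-check all of them after every fix. The Python sketch below shows the idea with hypothetical names (`regression_failures`, `linearize_eng`) and a toy English grammar; it is an illustration, not a description of MOLTO's actual test setup.

```python
def regression_failures(linearize, examples):
    """Return (tree, expected, actual) for each stored example whose output changed."""
    failures = []
    for tree, expected in examples:
        actual = linearize(tree)
        if actual != expected:
            failures.append((tree, expected, actual))
    return failures


def linearize_eng(tree):
    """Toy English linearization, used only for this illustration."""
    fun, *args = tree
    templates = {"Even": "{} is even", "Gt": "{} is greater than {}"}
    return templates[fun].format(*args)


# Approved (tree, translation) pairs collected from earlier reviews.
APPROVED = [
    (("Even", 4), "4 is even"),
    (("Gt", 3, 2), "3 is greater than 2"),
]

print(regression_failures(linearize_eng, APPROVED))
```

An empty list means the fix left every approved translation intact; any entry pinpoints exactly which tree changed and how.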

• ## Which people are there in MOLTO?

We are three universities and two private companies, from five EU countries. About 25 persons will be actively involved in MOLTO.

• ## What are your backgrounds and competences?

MOLTO has people with backgrounds in computer science, linguistics, and mathematics. There are university professors, PhD students, engineers, and translators.

• ## How will the EU money be used?

The total budget is just below 3,000,000 EUR, of which the EC contribution is 2,375,000 EUR. This will pay for 390 person-months of work, divided among engineers, PhD students, translators, a project manager, and partial salaries of faculty members. More than 90% of the budget is for salaries and salary-related costs; the rest is mainly for travel, both for internal meetings between the sites and for participation in conferences to disseminate the results. Viewed another way, 86% is for research and development, 10% for dissemination and exploitation, and 5% for management.

• ## Who may exploit the results?

Almost all software will be publicly available as free, open-source software released under the GNU LGPL license. The LGPL implies that anyone may use MOLTO tools for any purpose, research or commercial. Unlike with the GPL license, third-party applications built on these tools need not themselves be released as open source. But of course we expect much of the derived work to be released as open source back to the community as well.

• ## Will there be commercial applications?

Yes, our company partners will evaluate the commercial use during MOLTO.

• ## What are the main ideas behind MOLTO?

The main idea is to use interlinguas based on domain semantics and equipped with reversible generation functions. Translation is thus a composition of parsing the source language and generating the target language. An implementation of this technology is provided by GF, Grammatical Framework (grammaticalframework.org). In MOLTO, GF is complemented by the use of ontologies, such as those used in the semantic web. We will also use methods of statistical machine translation (SMT) for improving robustness and extracting grammars from data.
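The idea that translation is parsing followed by generation can be sketched in a few lines. The Python toy below is not GF: abstract trees are tuples such as `("Gt", 3, 2)`, each language supplies one template per function, and parsing is done by brute force, enumerating all trees over small numerals and keeping those whose linearization matches the input. The grammar, the language codes `Eng`/`Fre`, and the numeral range are assumptions for illustration only.

```python
from itertools import product

# Hypothetical toy interlingua: one template per abstract function and language.
CONCRETE = {
    "Eng": {
        "Even": lambda x: f"{x} is even",
        "Odd": lambda x: f"{x} is odd",
        "Gt": lambda x, y: f"{x} is greater than {y}",
    },
    "Fre": {
        "Even": lambda x: f"{x} est pair",
        "Odd": lambda x: f"{x} est impair",
        "Gt": lambda x, y: f"{x} est plus grand que {y}",
    },
}
ARITY = {"Even": 1, "Odd": 1, "Gt": 2}
NUMBERS = range(0, 21)  # assumption: only numerals 0..20 are covered


def linearize(lang, tree):
    """Generate a sentence in `lang` from an abstract tree."""
    fun, *args = tree
    return CONCRETE[lang][fun](*args)


def parse(lang, sentence):
    """Return every tree whose linearization in `lang` equals `sentence`.

    Brute force, only to show that parsing inverts linearization;
    real GF parsers work very differently.
    """
    return [
        (fun, *args)
        for fun, n in ARITY.items()
        for args in product(NUMBERS, repeat=n)
        if linearize(lang, (fun, *args)) == sentence
    ]


def translate(src, tgt, sentence):
    """Translation = parse in the source language, then linearize in the target."""
    return [linearize(tgt, tree) for tree in parse(src, sentence)]


print(translate("Eng", "Fre", "3 is greater than 2"))
print(translate("Fre", "Eng", "4 est pair"))
```

An input outside the grammar, such as "3 is big", yields no tree and hence no translation: the precision-over-coverage trade-off in miniature.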

• ## How do I acknowledge MOLTO funding?

All publications shall include the following statement to indicate that foreground was generated with the assistance of financial support from the European Union:

The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° FP7-ICT-247914.

• ## What is GF?

GF is a framework for defining multilingual grammars, each based on a common abstract syntax. The abstract syntax is defined by using type theory, in the same way as in logical frameworks. The natural language generation part is called concrete syntax, which is a feature-based grammar formalism equivalent to PMCFG (Parallel Multiple Context-Free Grammars) and has polynomial parsing behaviour. GF uses PMCFG as its "machine language", which is compiled from
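"Feature-based" means that linearizations are not just strings but records and inflection tables with parameters such as gender or number. The minimal Python sketch below (not GF syntax; the names `Noun`, `Adj`, and `mod` are hypothetical) mimics a French concrete syntax in which a noun carries an inherent gender and a modifying adjective is selected from its table so that it agrees with the noun.

```python
from dataclasses import dataclass


@dataclass
class Noun:
    form: str
    gender: str  # inherent parameter: "Masc" or "Fem"


@dataclass
class Adj:
    forms: dict  # inflection table: gender -> word form


WINE = Noun("vin", "Masc")
PIZZA = Noun("pizza", "Fem")
ITALIAN = Adj({"Masc": "italien", "Fem": "italienne"})


def mod(adj: Adj, noun: Noun) -> Noun:
    """Place the adjective after the noun, choosing the form that agrees in gender."""
    return Noun(f"{noun.form} {adj.forms[noun.gender]}", noun.gender)


print(mod(ITALIAN, WINE).form)
print(mod(ITALIAN, PIZZA).form)
```

The abstract syntax only needs to say "modify a kind with a quality"; agreement lives entirely in the French concrete layer, so an English concrete syntax could ignore gender altogether.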

• ## Why do you believe these ideas will work?

GF has been developed for 12 years now, and multilingual GF-based translation has been tested in numerous applications, ranging from mathematics via software specifications to spoken dialogue systems (see the GF homepage). We also believe there are lots of interesting domain translation tasks out there, even if we cannot provide a competitor to open-domain systems like Google Translate.

• ## Isn't interlingua an unrealistic dream?

Yes, it is, if we want to have a universal interlingua working for everything. This is why we don't believe we can ever translate newspapers with MOLTO techniques. However, domain-specific interlinguas have proved quite feasible. Notice that this move is similar to what has happened in ontologies: they have moved from universal ontologies to domain ontologies.

• ## Are there any scientific challenges?

The first challenge is to scale up the size of applications: not so much the number of languages, which we already know how to manage, but the lexicon size, from hundreds to thousands of words. We need techniques both to build such translation lexica manually and to extract them automatically. This leads to the second challenge, which is to minimize the development effort in terms of skills and time: to make GF available to people with no special training, as part of their normal workflows.
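When a lexicon grows to thousands of words across many languages, simple bookkeeping already helps the manual part of the work. A minimal sketch, assuming a hypothetical lexicon format that maps abstract word identifiers (such as `house_N`) to per-language entries, lists the words each language still lacks:

```python
# Hypothetical multilingual lexicon: abstract word id -> {language: word}.
LEXICON = {
    "house_N": {"Eng": "house", "Swe": "hus", "Fin": "talo"},
    "water_N": {"Eng": "water", "Swe": "vatten"},
    "red_A": {"Eng": "red", "Fin": "punainen"},
}


def missing_entries(lexicon, languages):
    """Map each language to the sorted list of abstract ids it has no entry for."""
    return {
        lang: sorted(word_id for word_id, entries in lexicon.items() if lang not in entries)
        for lang in languages
    }


print(missing_entries(LEXICON, ["Eng", "Swe", "Fin"]))
```

Because every language shares the same abstract identifiers, a gap in one language is visible immediately, whether the missing entries are then filled by hand or by automatic extraction.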

• ## How will you combine rules and statistics?

This is perhaps the most speculative research topic in MOLTO. First of all, we will join the growing efforts on hybrid systems, where statistics is used as a fall-back for rule-based translation; there are many yet-to-be-explored technical ideas around this. We will also use statistics to extract translation rules automatically and to resolve ambiguities. But we want to maintain control over the quality of the translation; thus we won't blindly return uncertain fall-back translations without warning the user about the uncertainty.
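The policy of never returning an uncertain fall-back silently can be expressed as a small wrapper. In this Python sketch, `grammar_translate` and `statistical_translate` are hypothetical stand-ins for a grammar-based and a statistical translator: the grammar result is used when it exists, and otherwise the fall-back result is returned with a flag that a user interface could display as a warning.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Translation:
    text: str
    certain: bool  # False means: produced by the fall-back, so warn the user


def hybrid_translate(
    sentence: str,
    grammar_translate: Callable[[str], Optional[str]],
    statistical_translate: Callable[[str], str],
) -> Translation:
    """Use the grammar-based result if there is one; otherwise fall back and flag it."""
    result = grammar_translate(sentence)
    if result is not None:
        return Translation(result, certain=True)
    return Translation(statistical_translate(sentence), certain=False)


# Toy stand-ins (hypothetical): a one-entry "grammar" and a dummy statistical system.
GRAMMAR = {"4 is even": "4 est pair"}


def dummy_statistical(sentence: str) -> str:
    return "[SMT] " + sentence


print(hybrid_translate("4 is even", GRAMMAR.get, dummy_statistical))
print(hybrid_translate("4 is nice", GRAMMAR.get, dummy_statistical))
```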

• ## What tools will you build?

The main generic tools are extensions of GF with new user interfaces: a grammar engineer's tool for building systems for new domains, and a translator's tool for using a given translation system. On top of these generic tools, we will build tools tailored to the domains of our case studies. Thus, while the generic translator's tool will be usable in the mathematics domain as well, the users will appreciate its integration with computer algebra systems; the museum object tools will be integrated with existing tools for browsing the

• ## What platforms will the tools run on?

Our code will run on all major operating systems: Linux, Mac OS X, and Windows. So users can download and install MOLTO tools on their own computers. But we will also make them available as web services. The translator's tool, in particular, should be usable within a web browser without any software downloading required. Some kinds of translators, e.g. tourist phrasebooks, will also be natural to run on mobile phones, e.g. on the iPhone and Android platforms. We will provide user interfaces adapted to these platforms, for both on-line and off-line use.

• ## How is a new translation domain defined?

Here is a concrete example of how it can be done. Let's say you want to build a translator for arithmetic propositions. Then you first build an abstract syntax, which defines basic concepts such as the set of natural numbers, the properties "even" and "odd", and the relation "greater than"; properties and relations are functions from expressions to propositions. This is what the abstract syntax looks like in GF:

    Nat  : Set
    Even : Exp -> Prop
    Odd  : Exp -> Prop
    Gt   : Exp -> Exp -> Prop
    Sum  : Exp -> Exp
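The point of the abstract syntax is that it is typed: `Even` can only be applied to an expression, and the result is a proposition. The following Python sketch (an illustration, not GF) encodes the `Even`, `Odd`, and `Gt` signatures from the snippet above and rejects ill-typed trees. Representing numerals as plain Python integers is an assumption, since the snippet does not show how expressions are built.

```python
# Argument categories and result category for each abstract function.
SIGNATURE = {
    "Even": (("Exp",), "Prop"),
    "Odd": (("Exp",), "Prop"),
    "Gt": (("Exp", "Exp"), "Prop"),
}


def category(tree):
    """Return the category of an abstract tree, raising TypeError if it is ill-typed."""
    if isinstance(tree, int):
        return "Exp"  # assumption: integer literals are expressions
    fun, *args = tree
    arg_cats, result = SIGNATURE[fun]
    if len(args) != len(arg_cats):
        raise TypeError(f"{fun} expects {len(arg_cats)} argument(s), got {len(args)}")
    for arg, expected in zip(args, arg_cats):
        actual = category(arg)
        if actual != expected:
            raise TypeError(f"{fun} expects {expected}, got {actual}")
    return result


print(category(("Gt", 3, 2)))
try:
    category(("Even", ("Odd", 3)))  # "odd is even": a proposition where an Exp is needed
except TypeError as err:
    print("rejected:", err)
```

Every tree that passes this check is meaningful in the domain, which is what lets each concrete syntax translate it without guessing.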

• ## How do you relate to Google's tools?

You cannot ignore Google when working on machine translation: for most people, it is the state of the art for translation on the web. We see MOLTO translation as an approach diametrically opposed to Google's (precision rather than coverage) and also with a different application (a producer's rather than a consumer's tool). The underlying technology is different as well: Google translation is based on statistics, MOLTO on grammars. Despite all these differences, hybrid systems might well combine MOLTO with Google Translate. In hybrid systems, it is

• ## How will you evaluate your results?

We will collect feedback from our web-based demos. We will also use standard machine translation evaluation tools, BLEU and TAUS, and make comparisons with other translation tools. In addition to translation quality, we will measure the productivity and usability of our tools in user studies. And like many other European projects, we will have a scientific board with independent experts to monitor our progress.
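For readers unfamiliar with BLEU: it scores a system translation by its clipped n-gram overlap with a human reference, combining the precisions for n = 1..4 by a geometric mean and multiplying by a brevity penalty for overly short output. The Python sketch below computes an unsmoothed, single-reference, sentence-level version for illustration only; published evaluations normally use corpus-level BLEU from an established implementation such as sacreBLEU.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU for one sentence (illustrative only)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        # Clip each hypothesis n-gram count by how often it occurs in the reference.
        clipped = sum(min(count, ref_counts[g]) for g, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision makes BLEU zero
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty: punish hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


print(sentence_bleu("the cat sat on the mat", "the cat sat on a mat"))
```

Because BLEU only counts surface overlap, it is complemented by the productivity and usability studies mentioned above.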