Contract No.: | FP7-ICT-247914 |
---|---|
Project full title: | MOLTO - Multilingual Online Translation |
Deliverable: | D1.1. Work plan for MOLTO |
Security (distribution level): | Confidential |
Contractual date of delivery: | M1 |
Actual date of delivery: | 1 April 2010 |
Type: | Report |
Status & version: | Final (evolving document) |
Author(s): | A. Ranta et al. |
Task responsible: | UGOT |
Other contributors: | |
Detailed work plan for internal use of the consortium.
This is an evolving description of the work plan of MOLTO, divided into work packages and tasks. The document is meant to track what the MOLTO Consortium is planning to do, what it has completed so far, and the status of ongoing research. It is the responsibility of each work package leader to enter tasks and keep them up to date so that they reflect the work done by the group.
Detailed work plan for WP1
A number of management tasks fall to the coordinator, e.g.:
- collecting information from partners;
- reviewing and submitting information on the progress of the project, as well as reports and other deliverables, to the EC;
- preparing meetings;
- proposing decisions and preparing the agenda of the SG;
- chairing the SG meetings and monitoring the implementation of decisions taken at the meetings;
- presenting the results of the consortium and serving as secretary in the meetings;
- administering the EC financial contribution and fulfilling other financial tasks;
- maintaining the project's website, etc.
According to the Grant Agreement, Annex II, management of the consortium activities includes:
In order to get an overview of the work package:
- add a view of the associated tasks
- add a view of the deliverables
Create an admin type "deliverable" to collect the information on due deliverables, so that they can be tracked on the calendar and in the work package's description.
The commission requested the following:
Session | Submitted on | Verified on |
---|---|---|
5.2 | Apr 26, 2011 2:40:36 PM | May 3, 2011 2:33:02 PM |
You are kindly requested to clarify the issues raised in this letter and submit a revised periodic report and Forms C through NEF by 16 May at the latest. Should you require more time, please contact us. However, should we not have heard from you by the deadline, we will proceed with the information at hand. Please note that in such a case this may lead to all or part of the costs being rejected.
Please note that according to Article II.5 of the grant agreement, the period for the execution of the payment has been suspended pending receipt of the additional information and the revised Periodic Report through NEF.
Please clarify the following points in the revised periodic report, and revise the Forms C if necessary:
Attachment | Size |
---|---|
MOLTO (247914) _ Periodic Report and Cost Claim submission in NEF.pdf | 77.56 KB |
Cost Claim overview MOLTO (247914) 2011-06-22.pdf | 74.59 KB |
D1 3R (amended).docx | 232.74 KB |
Timetable for negotiation:
The negotiating Project Officer (PO) is Mr. BROCHARD Michel. The full contact references are detailed in the "Negotiation Mandate".
Please note that the negotiation must be successfully concluded by 31/08/2011.
In case this deadline is not met, the Commission reserves the right to cancel negotiations and any subsequent offer for a project grant agreement. We also would like to draw your attention to the fact that negotiations may be terminated, or the negotiation mandate modified, if so required following the results of the consultation with other departments within the Commission.
Please note that in accordance with the legislation in force, the coordinator is obliged to deposit any pre-financing received from the Commission on an interest-bearing bank account. If you do not comply with this obligation, your participation as coordinator may not be accepted.
The negotiation process is supported by an on-line tool called NEF which you will need to use to submit data that is necessary for the grant agreement.
NEF will also provide access to the Legal & Financial Validation form (LFV lite). The LFV lite provides an overview of the status of the legal and financial data of the partners in your project, and indicates those partners for whom legal and/or financial data is missing. If the legal and/or financial data of one or more partners is flagged as needed in the LFV lite, or is incorrect, new legal and/or financial documents must be submitted for the partner(s) concerned. Additionally, the Commission can request documents and information regarding the operational capacity of the consortium and beneficiaries to achieve the objectives and expected results of the project.
The detailed explanations for accessing NEF will be sent shortly in a separate e-mail. Further guidance is available on-line at the following address: http://ec.europa.eu/research/negotiation/
You should have already received the Evaluation Summary Report (ESR) in the info letter email. If not, please contact the negotiating Project Officer.
The negotiation guidance notes and the most recent templates for the Description of Work (Annex I to the Grant Agreement) are available at: Nef Annex 1 - Concept. Other useful information on Framework Programme 7 is available at http://cordis.europa.eu/fp7/find-doc_en.html and includes:
This letter should not be regarded under any circumstances as a formal commitment by the Commission to give financial support as this depends, in particular, on the satisfactory conclusion of negotiations and the completion of the formal selection process. Should you have any queries about the above, please do not hesitate to contact the negotiating Project Officer.
The main issue to resolve is the budget cut, which is of course the usual thing to happen. We will get 600k instead of the 712k we applied for. My suggestion is that we cut all WPs and sites in proportion, so that we don't need to change the work description too much.
The realistic goal is that the work will begin on 1 September. Even this needs some effort from us:
Attachment | Size |
---|---|
comments_MOLTO_Ext.pdf | 90.52 KB |
Please address the reviewers' remarks by the end of September 2011!
Soon it is time for the reporting of period 2 (01/03/2011 – 29/02/2012) of the project MOLTO.
You have to send me:
This year you can complete the Use of Resources directly in NEF when completing the Form C. You will have to write short explanations of the costs: the number of person months, travel costs (who travelled where and for which purpose/meeting), consumables, etc. All costs must be related to a work package.
The deadline for submitting your financial statement in the Participant Portal, as well as for sending me the Use of Resources by e-mail, is 1 April 2012.
The signed Financial Statement and the CFS (if applicable) have to be submitted to me in paper copies. Please send the originals by courier to address below.
To access the project via the Participant Portal, click on the following link: http://ec.europa.eu/research/participants/portal/
To log into the Participant Portal you need to have an account. If you don't have an account yet follow the 'register' link and instructions on the Participant Portal main page.
Once logged in with the account associated with your email address, the list of the projects you are involved in will appear under the 'My Projects' tab. The project MOLTO (247914) will appear under tab “Active”. By selecting “FR” on that line you will gain access to the Form C.
Do not hesitate to contact me if you have any questions. Kristina
Kristina Orbán Meunier
UNIVERSITY OF GOTHENBURG Research and Innovation Services
Erik Dahlbergsgatan 11B Box 100, 405 30 Göteborg, Sweden Tel +46 31 786 6466
mobile +46 766 229466
The grammar developer's tools are divided into two kinds of tasks:
- the GF grammar compiler API
- actual tools implemented using the API
The work plan for the first six months concerns mostly the API, with the main actual tool being the GF Shell, a line-based grammar development tool. It is a powerful tool, since it enables scripting, but it is not integrated with other working environments. The most important other environment will be web-based access to the grammar compiler.
Note that most discussions on GF are public at http://code.google.com/p/grammatical-framework/.
Here follows the work plan, with tasks assigned to sites and approximate months.
Documentation of GF is hosted on Google Code at http://code.google.com/p/grammatical-framework/
There is a wiki cover page for the Resource Grammar Library API and an online version at http://www.grammaticalframework.org/compiler-api/.
The GF API design will take into account the following requirements:
The documentation is being hosted at the GF website.
What we mean by example-based grammar writing.
The current status is proof of concept: it is possible to load an example-based grammar and to compile it.
Need to do:
- ....
The runtime is the part of the GF system that implements parsing and linearization of texts based on a PGF grammar that has been produced by the GF compiler.
The standard GF runtime is written in Haskell, like the rest of the system. Unfortunately, this results in a large memory footprint and possibly also portability problems, which preclude its use in certain applications.
The goal of the current task is to reimplement the GF runtime as a pure C library. This C library can then hopefully be used in situations where the Haskell-based runtime would be unwieldy.
Preview versions of the implementation, libpgf, are available from the project home page. This is also where up-to-date documentation can be found.
The compiler API must be used by the morphology server.
To develop a Python plugin for GF (based on the planned C plugin) and connect it to relevant parts of the Natural Language Toolkit (http://www.nltk.org/).
2.8.1 Develop Python bindings to GF.
2.8.2 NLTK integration (a sketch follows the tutorial below).
This is how to use some of the functionality of the GF shell from inside Python.
Due to a GHC glitch, it currently only builds on Linux.
You'll need the source distribution of GF, GHC, and the Python development files¹. Then go to the Python bindings folder and build:
cd GF/contrib/py-bindings
make
This will build a shared library (gf.so) that you can import and use in Python as shown below.
To test if it works correctly, type:
python -m doctest example.rst
First you must import the library:
>>> import gf
then load a PGF file, like this tiny example:
>>> pgf = gf.read_pgf("Query.pgf")
We could ask for the supported languages:
>>> pgf.languages()
[QueryEng, QuerySpa]
The start category of the PGF module is:
>>> pgf.startcat()
Question
Let us save the languages for later:
>>> eng,spa = pgf.languages()
These are opaque objects, not strings:
>>> type(eng)
<type 'gf.lang'>
and must be used when parsing:
>>> pgf.parse(eng, "is 42 prime")
[Prime (Number 42)]
Yes, I know it should have a '?' at the end, but there is no support for other lexers at this time.
Notice that parsing returns a list of gf trees. Let's save it and linearize it in Spanish:
>>> t = pgf.parse(eng, "is 42 prime")
>>> pgf.linearize(spa, t[0])
'42 es primo'
(which it is not, but there is a '?' lacking at the end, remember?)
One of the good things about the GF shell is that it suggests which tokens can continue the line you are composing.
We have this in the bindings too. Suppose we have no idea how to start:
>>> pgf.complete(eng, "")
['is']
so there is only one sensible thing to put in. Let's continue:
>>> pgf.complete(eng, "is ")
[]
It is important to note the blank space at the end; otherwise we get it again:
>>> pgf.complete(eng, "is")
['is']
But how come nothing is suggested after "is "? At this point a literal integer is expected, so GF would have to present an infinite list of alternatives. I cannot blame it for refusing to do so.
>>> pgf.complete(eng, "is 42 ")
['even', 'odd', 'prime']
Good. I will go for 'even', just to be on the safe side:
>>> pgf.complete(eng, "is 42 even ")
[]
Nothing again, but this time the phrase is complete. Let us check it by parsing:
>>> pgf.parse(eng, "is 42 even")
[Even (Number 42)]
We store the last result and ask for its type:
>>> t = pgf.parse(eng, "is 42 even")[0]
>>> type(t)
<type 'gf.tree'>
What's inside this tree? We use unapply for that:
>>> t.unapply()
[Even, Number 42]
This method returns a list with the head of the fun judgement and its arguments:
>>> map(type, _)
[<type 'gf.cid'>, <type 'gf.expr'>]
Notice the argument is again a tree (gf.tree or gf.expr; it is all the same here).
>>> t.unapply()[1]
Number 42
We will repeat the trick with it now:
>>> t.unapply()[1].unapply()
[Number, 42]
and again, the same structure shows up:
>>> map(type, _)
[<type 'gf.cid'>, <type 'gf.expr'>]
One more time, just to get to the bottom of it:
>>> t.unapply()[1].unapply()[1].unapply()
42
but now it is an actual number:
>>> type(_)
<type 'int'>
We ended up with a fully decomposed fun judgement.
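As a recap (not part of the bindings themselves, and assuming unapply behaves exactly as demonstrated above: a [head, arguments...] list for applications, a plain value for literals, and no unapply on gf.cid heads), a tiny recursive helper can decompose a whole tree in one go:

>>> def decompose(node):
...     # heads (gf.cid) have no unapply; expressions and literals do or are plain
...     parts = node.unapply() if hasattr(node, "unapply") else node
...     if isinstance(parts, list):
...         return [decompose(p) for p in parts]
...     return parts  # a gf.cid head or a literal such as 42
>>> decompose(t)
[Even, [Number, 42]]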
¹ In Ubuntu I got it by installing the package python-all-dev. ↩
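Looking ahead to task 2.8.2, here is a minimal sketch of how these bindings could feed NLTK. It assumes the string rendering of abstract trees shown above ("Prime (Number 42)"); wrapping it in parentheses yields the bracketed format that nltk.Tree.fromstring expects:

import gf
import nltk

pgf = gf.read_pgf("Query.pgf")
eng, spa = pgf.languages()

for tree in pgf.parse(eng, "is 42 prime"):
    # "Prime (Number 42)" -> "(Prime (Number 42))" for NLTK's bracket reader
    nltk_tree = nltk.Tree.fromstring("(%s)" % str(tree))
    print(nltk_tree)  # from here on, any NLTK machinery applies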
Here is a slightly better description, with possibly relevant links to software, documentation, etc.
Major features:
New languages:
Web-based tools for grammarians: http://www.grammaticalframework.org/demos/gfse/
Ongoing work at http://cloud.grammaticalframework.org.
Look into online IDE platforms, like Kodingen and CodeRun.
There is work on Ajax-based code editors, e.g. Ymacs, which could be useful since there is already a GF mode for Emacs (where?).
The Emacs mode can now be found at http://www.grammaticalframework.org/src/tools/gf.el (note by Aarne).
There is also a Mozilla project, Bespin, to build a web-based editor extensible with JavaScript.
Also check Orc, yet another online IDE for a new language, using CodeMirror as its editor.
Design and integrate probabilistic features into GF and PGF.
Extend planning here.
Final phase of the work planned in this work package. Exact scheduling to be defined.
Add the possibility to dynamically add new words to lexicons "linked" into compiled grammars.
To be entered for M7 - M30.
Add child pages to the living deliverable following instructions given in the abstract.
http://www.molto-project.eu/wiki/living-deliverables/d43a-appendix-gramm...
See deliverable
According to the plan (http://www.molto-project.eu/node/858), the Knowledge Engineering Infrastructure has been released. It is accessible here. We have imported an exemplary initial data set containing information about persons, organizations, and locations.
To execute a SPARQL query against the data set, click "SPARQL Query" and try, for example, the following query:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix prt: <http://proton.semanticweb.org/2006/05/protont#>
select distinct ?l where { ?s rdf:type prt:Organization ; rdfs:label ?l . }
It should return the names of all organizations stored in the data set.
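The same query can also be issued programmatically. Below is a hedged sketch using the SPARQLWrapper Python library; ENDPOINT is a placeholder to be replaced by the actual SPARQL endpoint URL linked above:

from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://example.org/sparql"  # placeholder, not the real endpoint

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery("""
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix prt: <http://proton.semanticweb.org/2006/05/protont#>
select distinct ?l where { ?s rdf:type prt:Organization ; rdfs:label ?l . }
""")
sparql.setReturnFormat(JSON)
# Print one organization name per line
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["l"]["value"])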
The Knowledge Engineering Infrastructure can be extended with new data sets as they become available; see http://www.molto-project.eu/node/858, http://www.molto-project.eu/node/896 and http://www.molto-project.eu/node/948.
A better task description is to be added here.
Mathematical grammars developed in GF for the WebALT project (eContent 22253) allow us to generate simple multilingual drills for high-school students and university freshmen. These grammars will be the starting point for extending coverage to word problems, i.e. those that require the student first to model a situation and then to manipulate the mathematical model to obtain a solution.
The UPC team, having been a main actor in the past development of GF mathematical grammars and having ample experience in mathematics teaching, will be in charge of the tasks in this work package, with help from UGOT on technical aspects of GF and possibly from Ontotext on ontology representation and handling.
It will be necessary to reason about equations and statements proposed by the student, so we will need to review to what extent an automatic reasoner can deal with student input of this sort, and how the system behaviour could be designed to degrade gracefully in order to keep the student interaction going.
In the framework of the WebALT project, a GF grammar library was developed for generating simple mathematical drills in a variety of languages. The legal status of this library has recently changed to LGPL, making it suitable as the starting point for the language services demanded by this work package. To achieve a better degree of interchangeability, it is necessary to organize the existing code into modules, remove redundancies, and lay the modules out in a way that allows easy lexicon enhancement by means of the grammar developer's tools of work package 2 (WP2).
Write a GF grammar for commanding a generic computer algebra system (CAS) through natural-language imperative sentences, with concrete grammars adapted to the CAS at hand. Depends on work package 2 (WP2).
Integrate the commanding library into a component that transforms the issued commands into CAS input, as sketched below.
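A possible shape for that component, sketched with the Python bindings from WP2; the grammar name Commands.pgf, the example sentence, and the head-to-Maxima name mapping are all hypothetical illustrations:

import gf

pgf = gf.read_pgf("Commands.pgf")   # hypothetical compiled commanding grammar
eng = pgf.languages()[0]

CAS_NAMES = {"Factor": "factor", "Expand": "expand"}  # invented mapping

def to_cas(tree):
    # unapply() yields a [head, arguments...] list for applications
    # and a plain value (e.g. an int) for literals.
    parts = tree.unapply()
    if not isinstance(parts, list):
        return str(parts)
    head, args = parts[0], parts[1:]
    name = CAS_NAMES.get(str(head), str(head).lower())
    return "%s(%s)" % (name, ", ".join(to_cas(a) for a in args))

for t in pgf.parse(eng, "factor the polynomial"):
    print(to_cas(t))   # ship this string to the CAS over its usual interface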
A GF grammar library able to generate natural-language sentences corresponding to the objects and relations of the word problem. It must be able to parse simple questions related to the word-problem domain into predicates. Depends on work package 2 and probably work package 4.
Automated reasoning is needed to assess the soundness of the model proposed by the student and to answer his/her questions. This requires adding small ontologies describing the word problem, including:
Add State of the Art study here.
Some time ago I managed to build a theory supporting the Farm problem in Isabelle/HOL (attached below).
I wasn't expecting such toil, but the lack of detailed documentation and a wicked simplifier made my life miserable for a whole week.
It is based on 3 sets:
and a function: is_leg_of : leg → animal.
As axioms, we have:
That is, facts that are implicitly known but which you need to state for Isabelle with the Main theory to work:
Let R be the number of rabbits in the farm and D the number of ducks. With the preceding axioms, we were able to produce Isabelle-certified proofs that R + D = 100 and 2*D + 4*R = 260, and then deduce that R = 30 and D = 70.
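For reference, the arithmetic behind those two certified facts is plain elimination: substituting D = 100 − R into 2*D + 4*R = 260 gives 2*(100 − R) + 4*R = 260, i.e. 200 + 2*R = 260, hence R = 30 and D = 70.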
Attachment | Size |
---|---|
Farm.thy | 5.67 KB |
In particular, objects will be annotated with natural-language noun phrases and equations with sentences. These annotations will be parsed into the GF interlingua and used whenever language generation related to the problem is needed.
The work will start with the provision of user requirements (WP9) and the preparation of a parallel patent corpus (EPO) to fuel the training of statistical MT (UPC). In parallel UGOT will work on grammars covering the domain and subsequently, together with UPC, apply the hybrid (WP2, WP5) MT on abstracts and claims. Ontotext will provide semantic infrastructure with loaded existing structured data sets (WP4) from the patent domain (IPC, patent ontology, bio-medical and pharmaceutical knowledge bases, e.g. LLD). Based on the use case requirements, Ontotext will build a prototype (D7.1, D7.2) exposing multiple cross-lingual retrieval paradigms and MT of patent sections. The accuracy will be regularly evaluated through both automatic (e.g. BLEU scoring) and human based (e.g. TAUS) means (WP9).
The work package is split into 9 major tasks as follows:
The patents case study comprises two basic scenarios: the online patent retrieval and the patent translation. In this prototype we tackle these two scenarios separately, as shown in Figure 1, even though they can be viewed as a unique multilingual patent retrieval paradigm. In the future, we plan to study how to automate the reciprocal inputs between the two processes, i.e. the annotation of translations and the translation of semantically annotated documents.
From a general perspective, two user roles may be defined in this case study: end-users looking for information related to the patents and editors adding new patent documents to a hypothetical repository.
Details are given in D7.1.
Determining and gathering bilingual and monolingual corpora for the patent case study.
There are two subtasks here:
Developing an ontology capturing the structure of patent documents, and indexing the patent documents according to the semantic knowledge.
Contact @UPC: Lluis and Cristina
DEPENDENCIES:
Participants:
Contact point @Ontotext: Borislav Popov
DEADLINES: Beta = M21; Final = M27
Contact @UPC: Lluis and Cristina
DEPENDENCIES:
Patent abstracts and claims are translated using the baseline of the hybrid system.
DEPENDENCIES:
Participants:
Contact point @Ontotext: Borislav Popov
DEADLINES: Beta = M21; Final = M27
DEPENDENCIES:
Note: Deadlines have been delayed 3 months due to the WP delay.
DEADLINE: M31 (to allow for final report)
The work is started by a study of the existing categorizations and metadata schemas adopted by the museum, as well as a corpus of texts in the current documentation which describe these objects (D8.1, UGOT and Ontotext). We will transform the CIDOC-CRM model into an ontology aligning it with the upper-level one in the base knowledge set (WP4) and modeling the museum object metadata as a domain specific knowledge base. Through the interoperability engine from WP4 and the IDE from WP2, we will semi-automatically create the translation grammar and further extend it (D8.2, UGOT, UHEL, UPC, Ontotext). The final result will be an online system enabling museum (virtual) visitors to use their language of preference to search for artefacts through semantic (structured) and natural language queries and examine information about them. We will also automatically generate a set of articles in the Wikipedia format describing museum artefacts in the 5 languages with extensive grammar coverage (D8.3, UGOT, Ontotext).
Links to Swedish museum databases that use the Carlotta system, which is built upon the CIDOC-CRM model:
The work will start with collecting user requirements for the grammar development IDE (WP2), translation tools (WP3), and the use cases (WP6-8).
We will define the evaluation criteria and schedule in synchrony with the WP plans (D9.1). We will define and collect corpora, including diagnostic and evaluation sets: the former to improve translation quality along the way, the latter to evaluate final results.
The translator's new role (parallel to WP3: Translator's tools) will be designed and described in the D9.1 deliverable. Most current translator's workbench software treats the original text as a read-only source. The tools to be developed within WP3 (and WP2) will move towards a more mutable role for the source text. The translation process will come to resemble structured document editing or multilingual authoring rather than transformation from a fixed source into a number of target languages.
We will provide only a basic infrastructure API for external translation workbenches and keep an eye on the "new multilingual translator's workflow".
For each work package, the liaison contact information and work progress will be kept up-to-date on the MOLTO web site. Our liaison person Mirka Hyvärinen will be in contact with other project members.
The possibility to access UHEL's internal working wiki, "MOLTO kitwiki", will also be granted to other project members upon request.
Evaluation aims at both quality and usability aspects. UHEL will develop usability tests for the end-user human translator. The MOLTO-based translation workflow may differ from the traditional translator's workflow. This will be discussed in the D9.1 evaluation plan.
To measure the quality of MOLTO translations, we compare them to (i) statistical and symbolic machine translation (Google, SYSTRAN) and (ii) human professional translation. We will use both automatic metrics (IQmt and BLEU; see section 1.2.8 for details) and the TAUS quality criteria (Translation Automation User Society). As MOLTO is focused on information-faithful, grammatically correct translation in special domains, the TAUS results will probably be more important.
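To make the automatic side concrete, here is a hedged illustration of sentence-level BLEU using NLTK; the sentence pair is invented, and corpus-level scoring for the real evaluation would follow the same pattern:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["42 es primo".split()]    # tokenized human reference translation(s)
candidate = "42 es un primo".split()   # tokenized MT output to be scored

# Smoothing avoids zero scores when higher-order n-grams have no matches.
smooth = SmoothingFunction().method1
print(sentence_bleu(reference, candidate, smoothing_function=smooth))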
Given MOLTO's symbolic, grammar-based interlingual approach, scalability, portability and usability are important quality criteria for the translation results. For the translator's tools, user-friendliness will be a major aspect of the evaluation. These criteria are quantified in (D9.1) and reported in the final evaluation (D9.2).
In addition to the WP deliverables, there will be continuous evaluation and monitoring with internal status reports according to the schedule defined in D9.1.
Define workplan here
Factorize the grammar currently used for the demo fridge into modules that isolate the different kinds of phrases, e.g. Comments, Greetings, Questions, etc. Check whether there are ontologies that describe these.
The factorization can be seen in the phrasebook example under /example/phrasebook.
The MOLTO Phrasebook is a web application for the traveler; eventually it will be a phone application (for Android). It consists of frequently used phrases that a foreigner might want to use when abroad.
demo preview: http://tournesol.cs.chalmers.se/~aarne/phrasebook/phrasebook.html
The current GF Grammar Compiler API provides translation services that can be called on the fly. The goal of this task is to find out how to integrate them into an existing API where there is a need for internationalization, for example Facebook: https://developers.facebook.com/docs/internationalization.
The image shows how translations are entered manually in the current version. My guess is that we could improve on that.
Another example is the situation of commonly used sentences such as "Happy birthday": we have it in our Travel Phrasebook, but we do not have Portuguese. We could friend-source it :) but how? Give them a FB app?
Love to see some comments on this.
BTW, I am not partial to FB; you can check any social network of your liking that provides an internationalization API. This is also a proof of concept, looking for CNLs in the wild :)
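As a concrete starting point for any such integration, the translation call itself might look as follows; the endpoint and parameter names are assumptions modelled on the GF cloud service, not a confirmed API:

import json
import urllib.parse
import urllib.request

# Hypothetical: a Phrasebook grammar served by the GF cloud.
BASE = "http://cloud.grammaticalframework.org/grammars/Phrasebook.pgf"

params = urllib.parse.urlencode({
    "command": "translate",      # assumed command name
    "from": "PhrasebookEng",     # assumed concrete-syntax names
    "to": "PhrasebookPor",
    "input": "Happy birthday",
})
with urllib.request.urlopen(BASE + "?" + params) as response:
    print(json.load(response))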
The core of WP11 is an existing wiki system, AceWiki, which is going to be developed into a multilingual controlled-natural-language wiki system within the MOLTO project.
The AceWiki homepage (http://attempto.ifi.uzh.ch/acewiki/) contains:
AceWiki development is hosted on GitHub (https://github.com/AceWiki/AceWiki)
AceWiki side:
GF side:
Release notes: https://raw.github.com/AceWiki/AceWiki/master/CHANGES.txt
See also https://github.com/yuchangyuan/AceWiki
See also the thread starting with: https://lists.ifi.uzh.ch/pipermail/attempto/2011-December/000818.html
General refactoring and clean-up of the AceWiki code.
Make the AceWiki design multilingual and implement a small AceWiki engine for multilingual GF grammars.