D1.1 Work plan for MOLTO


Contract No.: FP7-ICT-247914
Project full title: MOLTO - Multilingual Online Translation
Deliverable: D1.1. Work plan for MOLTO
Security (distribution level): Confidential
Contractual date of delivery: M1
Actual date of delivery: 1 April 2010
Type: Report
Status & version: Final (evolving document)
Author(s): A. Ranta et al.
Task responsible: UGOT
Other contributors:



Abstract

Detailed work plan for internal use of the consortium.

This is an evolving description of the work plan of MOLTO, divided into work packages and tasks. The document is meant to track what the MOLTO Consortium is planning to do, what it has completed so far and the status of the ongoing research. It is the responsibility of each work package leader to enter tasks and to keep them up to date, so that they reflect the work done by the group.


WP1: Management

Detailed workplan for WP1

A number of management tasks are assigned to the coordinator, for example:

  • collecting information from partners;
  • reviewing and submitting information on the progress of the project, as well as reports and other deliverables, to the EC;
  • preparing meetings;
  • proposing decisions and preparing the agenda of the SG;
  • chairing the SG meetings and monitoring the implementation of decisions taken at the meetings;
  • presenting the results of the consortium and serving as the secretary in the meetings;
  • administering the EC financial contribution and fulfilling other financial tasks;
  • maintaining the project's website.

According to the Grant Agreement, Annex II, management of the consortium activities includes:

  • maintenance of the consortium agreement, if it is obligatory,
  • the overall legal, ethical, financial and administrative management including, for each of the beneficiaries, the obtaining of the certificates on the financial statements and on the methodology and costs relating to financial audits and technical reviews,
  • implementation of competitive calls by the consortium for the participation of new beneficiaries, where required by Annex I of this grant agreement,
  • any other management activities foreseen by the annexes, except coordination of research and technological development activities.

Associate tasks to workpackages

15 Mar 2010
Europe/Stockholm
ID: 
1.1
Workpackage: 
Management
Task leader: 
aarne.ranta
Assignees: 
olga.caprotti
Status: 
Completed
Timeframe: 
Mar 2010 - Apr 2010
Completed on: 
25 March, 2010 - 23:00

In order to get an overview of the workpackage:

  • add a view of the associated tasks
  • add a view of the deliverables

Create content type "deliverable"

15 Mar 2010
Europe/Stockholm
ID: 
1.2
Workpackage: 
Management
Task leader: 
aarne.ranta
Assignees: 
olga.caprotti
Status: 
Completed

Create an admin type "deliverable" to collect the information on due deliverables, so that they can be tracked on the calendar and in the workpackage's description.

See list of all deliverables

Revision of the management report

ID: 
1.3
Workpackage: 
Management
Task leader: 
aarne.ranta
Assignees: 
aarne.ranta
Assignees: 
emilia.rung
Assignees: 
olga.caprotti
Relevant Deliverables: 
Periodic management report 1
Status: 
Completed
Timeframe: 
May 2011 - Jul 2011
Completed on: 
21 June, 2011 - 18:00

The commission requested the following:

Session   Submitted on             Verified on
5.2       Apr 26, 2011 2:40:36 PM  May 3, 2011 2:33:02 PM

You are kindly requested to clarify the issues raised in this letter and submit a revised periodic report and Forms C through NEF at the latest on 16th of May. Should you require more time, please contact us. However, should we not have heard from you by the deadline we will proceed with the information at hand. Please note that in such case, this may lead to all or part of the costs being rejected.

Please note that according to Article II.5 of the grant agreement, the period for the execution of the payment has been suspended pending receipt of the additional information and the revised Periodic Report through NEF.

Please clarify the following points in the revised periodic report, and revise the Forms C if necessary:

  • Reports – general: Please add a list of conferences and meetings you attended, including who participated, venue and for what purpose.
  • Please provide in more detail the activities carried out by each partner according to the template which can be found on EC/Cordis home page (http://cordis.europa.eu/fp7/find-doc_en.html).
  • Beneficiary 1 – There is a discrepancy between the average personnel cost and the budget. Could you please clarify why average personnel costs are more than 15% higher than budgeted?
  • Beneficiary 2 – There is a marked difference between the use of MM (compared to budget) and funding for personnel in the 1st period. The average personnel cost is lower than budgeted for. Could you please explain these discrepancies?
  • Beneficiary 3 – Please justify coffee breaks under subcontracting considering that you anticipated as subcontracting only the auditing costs (p. 45 of the Annex I of the Grant Agreement).
  • Beneficiary 4 – Please specify more accurately the use of MM per WP. From your explanation it is not clear enough how many MM you used. Could you please also correct the table in the Management of the use of resources part of the report, where the total costs seem to appear under indirect costs, while the indirect costs item differs from that reported in the Form C?

Attachments:

  • MOLTO (247914) _ Periodic Report and Cost Claim submission in NEF.pdf (77.56 KB)
  • Cost Claim overview MOLTO (247914) 2011-06-22.pdf (74.59 KB)
  • D1 3R (amended).docx (232.74 KB)

MOLTO-enlarged negotiation

ID: 
1.4
Workpackage: 
Management
Task leader: 
aarne.ranta
Assignees: 
olga.caprotti
Status: 
Ongoing
Timeframe: 
Jul 2011 - Aug 2011

Negotiation Mandate

  1. Proposal number: 288317. Acronym: MOLTO-Enlarged EU
  2. Strategic objective/theme:
  3. Project Officer (to whom all documents must be returned): Mr. BROCHARD Michel, Tel: 33912, e-mail: Michel.Brochard@ec.europa.eu, European Commission, DG INFSO - E 01, Office: EUFO - 02/270, L - 2920 Luxembourg
  4. Date and time of first negotiation meeting: 13/07/2011 at 10:00 AM. Address for the first negotiation meeting: 10 rue de R. Stumper, L - 2920 Luxembourg
  5. EC financial contribution: maximum EC financial contribution 600.000,00 € (euro)
  6. Duration of the project: 18 months
  7. Change of technical content: Forthcoming
  8. Timetable for negotiation:

    • 13/07/2011 Negotiation meeting in Luxembourg
    • 05/08/2011 Deadline for the first version of the description of work (Annex I) and GPF
    • 31/08/2011 End of negotiation

Accompanying letter

The negotiating Project Officer (PO) is Mr. BROCHARD Michel. The full contact references are detailed in the "Negotiation Mandate".

Please note that the negotiation must be successfully concluded by 31/08/2011.

In case this deadline is not met, the Commission reserves the right to cancel negotiations and any subsequent offer for a project grant agreement. We also would like to draw your attention to the fact that negotiations may be terminated, or the negotiation mandate modified, if so required following the results of the consultation with other departments within the Commission.

Please note that in accordance with the legislation in force, the coordinator is obliged to deposit any pre-financing received from the Commission on an interest-bearing bank account. If you do not comply with this obligation, your participation as coordinator may not be accepted.

The negotiation process is supported by an on-line tool called NEF which you will need to use to submit data that is necessary for the grant agreement.

NEF will also provide access to the Legal & Financial Validation form (LFV lite). The LFV lite provides an overview of the status concerning the legal and financial data of the partners of your project, and indicates those partners for whom legal and/or financial data is missing. If the legal and/or financial data of one or more partners is flagged as needed in the LFV lite or would be incorrect, new legal and/or financial documents must be submitted for the partner(s) concerned. Additionally the Commission can also request documents and information in regard to the operational capacity of the consortium and beneficiaries to achieve the objectives and expected results of the project.

The detailed explanations for accessing NEF will be sent shortly in a separate e-mail. Further guidance is available on-line at the following address: http://ec.europa.eu/research/negotiation/

You should have already received the Evaluation Summary Report (ESR) in the info letter email. If not, please contact the negotiating Project Officer.

The negotiation guidance notes and the most recent templates for the Description of Work (Annex I to the Grant Agreement) are available at: Nef Annex 1 - Concept. Other useful information on Framework Programme 7 is available at http://cordis.europa.eu/fp7/find-doc_en.html and includes:

  • documents referring to the negotiation guidance notes,
  • the model Grant Agreement and special conditions,
  • the guide to financial issues,
  • the checklist for a consortium agreement for FP7 projects, and
  • the guide to intellectual property rules for FP7.

This letter should not be regarded under any circumstances as a formal commitment by the Commission to give financial support as this depends, in particular, on the satisfactory conclusion of negotiations and the completion of the formal selection process. Should you have any queries about the above, please do not hesitate to contact the negotiating Project Officer.


Aarne's task list

The main issue to solve is the budget cut, which of course is the usual thing to happen. We will get 600k instead of the 712k we applied for. My suggestion is that we cut all WPs and sites proportionally, so that we don't need to change the work description too much.

The realistic goal is that the work will begin on 1 September. Even this needs some effort from us:

  • nominate one contact person (for UZH and BI) who is available at least periodically through the negotiations and authorized (by the rest of the site) to make decisions
  • finalize the work description
  • adjust the budget
  • provide the list of persons (mainly an issue for UZH and BI) as well as relations to other ongoing EC projects
  • extend the MOLTO consortium agreement and get it signed
  • we should work as if the deadline were 26 August; if we seem to be failing, we can probably ask the EC for an extension

Attachments:

  • comments_MOLTO_Ext.pdf (90.52 KB)

Followup on review requests

ID: 
1.5
Workpackage: 
Management
Task leader: 
olga.caprotti
Assignees: 
aarne.ranta
Assignees: 
borislav.popov
Assignees: 
jordi.saludes
Assignees: 
lauri.carlson
Status: 
Assigned
Timeframe: 
Sep 2011

Please address the reviewers' remarks by the end of September 2011.

MOLTO - financial reporting period 2

ID: 
1.6
Workpackage: 
Management
Task leader: 
aarne.ranta
Assignees: 
kristina.orban....
Status: 
Ongoing
Timeframe: 
Mar 2012 - Apr 2012

It will soon be time to report on period 2 (01/03/2011 – 29/02/2012) of the project MOLTO.

You have to send me:

  • two copies of the Form C (Financial Statement). The Form C is to be submitted both electronically (via the Participant Portal) and as a paper copy. After I have submitted the Form C to the European Commission, you can print it out and sign it. I’ll inform you when the Forms C are ready for signatures.

This year you can complete the Use of resources directly in the NEF when completing the Form C. You will have to write short explanations of the costs: the number of person months, travel costs (who travelled where and for which purpose/meeting), consumables etc. All the costs must be related to a Work package.

  • A Certificate on Financial Statement (CFS) is necessary if your claimed EC contribution is equal to or more than 375 000€. Observe that although the threshold is established on the basis of the EC contribution, the CFS must certify all eligible costs.

The deadline for submitting your financial statement in the Participant Portal as well as sending me the Use of Resources by e-mail is 1st of April 2011.

The signed Financial Statement and the CFS (if applicable) have to be submitted to me as paper copies. Please send the originals by courier to the address below.

To access the project via the Participant Portal, click on the following link: http://ec.europa.eu/research/participants/portal/

To log into the Participant Portal you need to have an account. If you don't have an account yet follow the 'register' link and instructions on the Participant Portal main page.

Once logged in with the account associated with your email address, the list of the projects you are involved in will appear under the 'My Projects' tab. The project MOLTO (247914) will appear under tab “Active”. By selecting “FR” on that line you will gain access to the Form C.

Do not hesitate to contact me if you have any questions. Kristina

Kristina Orbán Meunier

UNIVERSITY OF GOTHENBURG Research and Innovation Services

Erik Dahlbergsgatan 11B Box 100, 405 30 Göteborg, Sweden Tel +46 31 786 6466

mobile +46 766 229466

kristina.orban.meunier@gu.se

www.gu.se/researchinnovation

WP2: Grammar Developer’s Tools

The grammar developer's tools are divided into two kinds of tasks:

  • GF grammar compiler API

  • actual tools implemented by using the API

The workplan for the first six months concerns mostly the API, with the main actual tool being the GF Shell, which is a line-based grammar development tool. It is a powerful tool since it enables scripting, but it is not integrated with other working environments. The most important other environment will be web-based access to the grammar compiler.

Note that most discussions on GF are public at http://code.google.com/p/grammatical-framework/.

Here follows the work plan, with tasks assigned to sites and approximate months.

Improving the Resource Grammar Library API and its documentation

ID: 
2.10
Task leader: 
aarne.ranta
Assignees: 
ramona.enache
Relevant Deliverables: 
GF Grammar Compiler API
Status: 
Ongoing
Timeframe: 
Mar 2010

Documentation of GF is hosted on Google Code at http://code.google.com/p/grammatical-framework/

There is a wiki cover page for the Resource Grammar Library API and an online version at http://www.grammaticalframework.org/compiler-api/.

Designing the API and writing its documentation

ID: 
2.5
Task leader: 
aarne.ranta
Assignees: 
krasimir.angelov
Status: 
Completed
Timeframe: 
Mar 2010 - Aug 2010
Completed on: 
1 October, 2011 (All day)

The GF API design will take into account the following requirements:

  • programming environments, e.g. Eclipse, Xcode, Notepad++, Web, etc.
  • standard formats for I/O
  • ....

The documentation is being hosted at the GF website.

Example-based grammar writing

ID: 
2.2
Task leader: 
ramona.enache
Assignees: 
aarne.ranta
Relevant Deliverables: 
Grammar IDE
Status: 
Ongoing
Timeframe: 
Jul 2010 - Sep 2010

What we mean by example based grammar writing.

Current status is proof of concept: it is possible to load an example-based grammar and to compile it.

Need to do: - ....

GF runtime in C

ID: 
2.11
Task leader: 
lauri.alanko
Assignees: 
jordi.saludes
Assignees: 
krasimir.angelov
Assignees: 
lauri.alanko
Status: 
Ongoing
Timeframe: 
Apr 2010

Overview

The runtime is the part of the GF system that implements parsing and linearization of texts based on a PGF grammar that has been produced by the GF compiler.

The standard GF runtime is written in Haskell like the rest of the system. Unfortunately this results in a large memory footprint and possibly also portability problems, which preclude its use in certain applications.

The goal of the current task is to reimplement the GF runtime as a pure C library. This C library can then hopefully be used in some situations where the Haskell-based runtime would be unwieldy.

Status

Preview versions of the implementation, libpgf, are available from the project home page. This is also where up-to-date documentation can be found.

Morphology server and its API

ID: 
2.3
Task leader: 
aarne.ranta
Assignees: 
aarne.ranta
Assignees: 
krasimir.angelov
Relevant Deliverables: 
Grammar IDE
Status: 
Planned
Timeframe: 
Aug 2010 - Oct 2010

The compiler API must be used by the morphology server.

Plugin to Python NLTK

ID: 
2.8
Task leader: 
jordi.saludes
Assignees: 
jordi.saludes
Status: 
Completed

Develop a Python plugin for GF (based on the planned C plugin) and connect it to relevant parts of the Natural Language Toolkit (http://www.nltk.org/).

Subtasks

2.8.1 Develop Python bindings to GF.

2.8.2 NLTK integration.

GF python bindings

Using the GF python bindings

This is how to use some of the functionalities of the GF shell inside Python.

Installation

Due to some ghc glitch, it only builds on Linux.

You'll need the source distribution of GF, ghc and the Python development files [1]. Then go to the Python bindings folder and build the library:

 cd GF/contrib/py-bindings
 make

It will build a shared library (gf.so) that you can import and use in Python as shown below.

Testing installation

To test if it works correctly, type:

 python -m doctest example.rst

Examples

Loading a pgf file

First you must import the library:

% import gf

then load a PGF file, like this tiny example:

% pgf = gf.read_pgf("Query.pgf")

We could ask for the supported languages:

% pgf.languages()
[QueryEng, QuerySpa]

The start category of the PGF module is:

% pgf.startcat()
Question

Parsing and linearizing

Let us save the languages for later:

% eng,spa = pgf.languages()

These are opaque objects, not strings:

% type(eng) 
<type 'gf.lang'>

and must be used when parsing:

% pgf.parse(eng, "is 42 prime") 
[Prime (Number 42)]

Yes, I know it should have a '?' at the end, but there is no support for other lexers at this time.

Notice that parsing returns a list of gf trees. Let's save it and linearize it in Spanish:

% t = pgf.parse(eng, "is 42 prime")
% pgf.linearize(spa, t[0])
'42 es primo'

(42 is not prime, of course; keep in mind that the sentence is a question whose '?' is missing at the end.)
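
The parse-then-linearize steps can be wrapped in a small helper. This is only a sketch: `translate` is a hypothetical name that is not part of the bindings, and it assumes nothing beyond the `parse` and `linearize` calls used in this section, with `parse` returning a list of trees.

```python
def translate(pgf, source, target, sentence):
    """Translate `sentence` by parsing it in `source` and linearizing in `target`.

    Hypothetical helper: it relies only on pgf.parse and pgf.linearize as
    used in this section. It returns one string per tree returned by
    pgf.parse, so an ambiguous sentence yields several translations.
    """
    trees = pgf.parse(source, sentence)
    return [pgf.linearize(target, tree) for tree in trees]


# Usage with the real bindings (needs gf.so and Query.pgf, hence commented out):
# import gf
# pgf = gf.read_pgf("Query.pgf")
# eng, spa = pgf.languages()
# translate(pgf, eng, spa, "is 42 prime")
```

Returning a list rather than a single string keeps every reading of an ambiguous sentence visible to the caller.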

Getting parsing completions

One of the good things about the GF shell is that it suggests which tokens can continue the line you are composing.

The bindings offer this too. Suppose we have no idea how to start:

% pgf.complete(eng, "")
['is']

so there is only one sensible thing to put in. Let's continue:

% pgf.complete(eng, "is ")
[]

It is important to note the blank space at the end; without it, we get 'is' again:

% pgf.complete(eng, "is")
['is']

But why is nothing suggested after "is "? At this point an integer literal is expected, so GF would have to present an infinite list of alternatives. I cannot blame it for refusing to do so.

% pgf.complete(eng, "is 42 ")
['even', 'odd', 'prime']

Good. I will go for 'even', just to be on the safe side:

% pgf.complete(eng, "is 42 even ")
[]

Nothing again, but this time the phrase is complete. Let us check it by parsing:

% pgf.parse(eng, "is 42 even")
[Even (Number 42)]
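
Since the trailing blank is easy to forget, the prefix can be built from a list of already accepted words. The sketch below is an illustration only: `next_tokens` is a hypothetical helper, and it assumes `pgf.complete` behaves as in the session above.

```python
def next_tokens(pgf, lang, words):
    """Return the tokens that may follow the already accepted `words`.

    Hypothetical helper around pgf.complete as used above. Every word is
    followed by a blank, so the last word always counts as finished; with
    no words at all, the empty string is passed.
    """
    prefix = "".join(word + " " for word in words)
    return pgf.complete(lang, prefix)
```

Note that an empty result remains ambiguous: as seen above, it can mean either that the phrase is complete or that a literal is expected next.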

Deconstructing gf trees

We store the last result and ask for its type:

% t = pgf.parse(eng, "is 42 even")[0]
% type(t)
<type 'gf.tree'>

What's inside this tree? We use unapply for that:

% t.unapply()
[Even, Number 42]

This method returns a list with the head of the fun judgement and its arguments:

% map(type, _)
[<type 'gf.cid'>, <type 'gf.expr'>]

Notice the argument is again a tree (gf.tree or gf.expr, it is all the same here.)

% t.unapply()[1]
Number 42

We will repeat the trick with it now:

% t.unapply()[1].unapply()
[Number, 42]

and again, the same structure shows up:

% map(type, _)
[<type 'gf.cid'>, <type 'gf.expr'>]

One more time, just to get to the bottom of it:

% t.unapply()[1].unapply()[1].unapply()
42

but now it is an actual number:

% type(_)
<type 'int'>

We end up with a fully decomposed fun judgement.
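
The repeated calls to unapply can be turned into a recursive traversal. The following is a sketch under assumptions: `tree_to_tuple` is a hypothetical helper, unapply() is taken to return a list for an application and a plain Python value for a literal (as in the session above), and str() of a gf.cid is assumed to give the function name.

```python
def tree_to_tuple(node):
    """Recursively unfold a gf tree into nested tuples.

    Hypothetical helper built on unapply() as shown above: for an
    application, unapply() returns a list [head, arg1, ...]; for a literal
    it returns a plain Python value such as an int. Heads are converted
    with str(), assuming a gf.cid prints as its function name.
    """
    parts = node.unapply()
    if not isinstance(parts, list):
        return parts  # a literal, e.g. the integer 42
    head, args = parts[0], parts[1:]
    return (str(head),) + tuple(tree_to_tuple(arg) for arg in args)
```

Under these assumptions, the tree for "is 42 even" would unfold to a tuple whose first element is the head name and whose remaining elements are the unfolded arguments.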


  1. On Ubuntu, I got them by installing the package python-all-dev.

Refactoring the grammar compiler code base (to improve reusability)

ID: 
2.1
Task leader: 
aarne.ranta
Assignees: 
krasimir.angelov
Status: 
Assigned
Timeframe: 
Mar 2010 - Jul 2010

To be added here: a slightly better description, with links to the relevant software, documentation, etc.

Release of GF 3.2

ID: 
2.4
Task leader: 
aarne.ranta
Assignees: 
aarne.ranta
Assignees: 
krasimir.angelov
Status: 
Completed
Timeframe: 
Mar 2010 - Dec 2010
Completed on: 
15 December, 2012 (All day)

Major features:

  • pgf format is updated and documented in the wiki
  • completed runtime type checker for dependent types
  • parsing with dependent types
  • on-line parsing
  • type-error reporting
  • exhaustive generation of ASTs (also via lambda prolog)
  • probabilities in the abstract syntax
  • random generation guided by probability
  • parse results ranked by probability
  • example based grammar generation (extra script)

New languages:

  • Urdu, complete resource grammar
  • Turkish, complete morphology
  • Amharic, complete resource grammar
  • Punjabi, complete morphology

Web-based grammar development environment (version 1)

ID: 
2.6
Task leader: 
aarne.ranta
Assignees: 
krasimir.angelov
Assignees: 
thomas.hallgren
Relevant Deliverables: 
Grammar IDE
Dependencies: 
Release of GF 3.2
Status: 
Ongoing
Timeframe: 
Aug 2010 - Mar 2011

Prototype

Web-based tools for grammarians: http://www.grammaticalframework.org/demos/gfse/

Ongoing work at http://cloud.grammaticalframework.org.

Similar work

Look into online IDE platforms, like Kodingen and CodeRun.

There is work on Ajax-based code editors, e.g. Ymacs, which could be useful since there is already a GF mode for Emacs (where?).

The emacs mode can now be found in http://www.grammaticalframework.org/src/tools/gf.el (note by Aarne)

There is also a Mozilla project, Bespin, to build a web-based editor extensible with JavaScript.

Also - check Orc, yet another online IDE for a new language, using CodeMirror as editor.

Integrating probabilities in GF and PGF

ID: 
2.8
Task leader: 
aarne.ranta
Assignees: 
aarne.ranta
Status: 
Planned
Timeframe: 
Oct 2010 - Dec 2010

Design and integrate probabilistic features into GF and PGF.

Extend planning here.

Integration with ontology tools

ID: 
2.9
Task leader: 
aarne.ranta
Assignees: 
aarne.ranta
Assignees: 
borislav.popov
Assignees: 
lauri.carlson
Status: 
Planned

Final phase of the work planned in this workpackage. Exact scheduling to be defined.

On-line extension of PGF with new words

ID: 
2.7
Task leader: 
aarne.ranta
Assignees: 
krasimir.angelov
Status: 
Planned
Timeframe: 
Aug 2010 - Jan 2011

Adding the possibility to dynamically add new words to lexicons "linked" in compiled grammars.

WP3: Translator's Tools

To be entered for M7 - M30.

WP4: Knowledge Engineering

  • knowledge representation infrastructure (D4.1, by Ontotext);
  • aligned semantic models and instance bases (D4.2, by Ontotext and UHEL);
  • two-way grammar-ontology and NL (Natural Language) to ontology interoperability (D4.3, by Ontotext and UGOT).

    M1-M6 plan

    • all partners send informal answers to the questions above, plus anything additional that the questions have missed (questionnaire: /node/896)
    • decide on the best server location (on site / rented)
    • Ontotext to provide a description of what is the planned initial infrastructure for KE
    • Ontotext to describe more about the LOD data sets and give the LDSR link
    • UHEL to indicate their ideas on the KE WP, data sets and participation
    • provide examples of entity descriptions in WKB
    • first attempts on NL queries based on the WKB; put examples of the entity description constructs and then decide next steps with UGOT/UHEL;
    • discuss feasibility of creating a NL interface for SPARQL (or a subset)
    • KE infrastructure in place
    • decide on initial data sets with the partners; consider one or many instances
    • KE infrastructure loaded with initial data sets
    • navigation and search web UI over KEI
    • results of the first experiences on ontology to grammar interoperability
    • M4.1 - Knowledge representation infrastructure - retrieval access provided to the consortium
    • report for M1-M6
    • plan for M6-M12

    M6-M12 plan

    • analyze the needs of autocompletion and implement a scalable solution for it
    • work on transformation from natural language to SPARQL
    • report for M6-M12
    • plan for M12-M18

    M12-M18 plan

    • experiment with different ontologies
    • write D4.2
    • work on verbalization of the results from the semantic repository search
    • finalize the D4.3 prototype
    • report for M12-M18
    • plan for M18-M24

    M18-M24 plan

    • support the work on other work packages in the sphere of grammar-ontology interoperability
    • report for M18-M24

    Process & General Considerations

    • we'd like all info to be available on the wiki, incl. notes, decisions, status of the components, docs
    • SVN
    • consider a backlog for the entire project/ each WP
    • consider periodic development/delivery cycles
    • consider regular telecons/skype conferences - once a month at least
    • keep a backlog of application ideas on the wiki

    Appendix to D4.3 Grammar ontology interoperability

    ID: 
    4.1A
    Workpackage: 
    Knowledge Engineering
    Task leader: 
    milen.chechev
    Assignees: 
    milen.chechev
    Relevant Deliverables: 
    Grammar-Ontology Interoperability
    Status: 
    Completed
    Timeframe: 
    Mar 2012 - Apr 2012
    Completed on: 
    31 May, 2012 (All day)

    Add child pages to the living deliverable following instructions given in the abstract.

    http://www.molto-project.eu/wiki/living-deliverables/d43a-appendix-gramm...

    See deliverable

    http://www.molto-project.eu/sites/default/files/D4.3.pdf

    Knowledge Engineering Infrastructure in place

    According to the plan (http://www.molto-project.eu/node/858), the Knowledge Engineering Infrastructure has been released. It is accessible here. We have imported an example initial data set containing information about different persons, organizations and locations.

    To execute a SPARQL query against the data set, click "SPARQL Query" and try, for example, the following query:

    prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    prefix prt: <http://proton.semanticweb.org/2006/05/protont#>
    select distinct ?l
    where { ?s rdf:type prt:Organization ; rdfs:label ?l . }

    It should return the names of all organizations stored in the data set.
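
The same query can also be prepared programmatically. The sketch below only builds the query text and a GET request URL in the style of the SPARQL protocol (a `query` parameter); the helper names are hypothetical and the endpoint address is a placeholder, not the address of the MOLTO infrastructure.

```python
from urllib.parse import urlencode

# Namespace prefixes taken from the example query in this section.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "prt": "http://proton.semanticweb.org/2006/05/protont#",
}


def labels_query(class_name):
    """Build a SPARQL query listing the labels of all instances of prt:<class_name>."""
    header = "\n".join(
        "prefix %s: <%s>" % (prefix, uri) for prefix, uri in sorted(PREFIXES.items())
    )
    body = (
        "select distinct ?l where { ?s rdf:type prt:%s ; rdfs:label ?l . }"
        % class_name
    )
    return header + "\n" + body


def request_url(endpoint, query):
    """Encode `query` as the `query` parameter of a GET request to `endpoint`."""
    return endpoint + "?" + urlencode({"query": query})


# Placeholder endpoint, used for illustration only.
url = request_url("http://example.org/sparql", labels_query("Organization"))
```

Only the class name Organization comes from the example above; passing any other class name assumes that it exists in the loaded ontology.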

    The Knowledge Engineering Infrastructure can be extended with new data sets as they become available; see http://www.molto-project.eu/node/858, http://www.molto-project.eu/node/896 and http://www.molto-project.eu/node/948.

    WP5: Statistical and Robust Translation

    A better task description is to be added here.

    WP6: Case Study Mathematics

    Mathematical grammars developed using GF for the WebALT project (eContent 22253) allow us to generate simple multilingual drills for high school students and university freshmen. These grammars will be the starting point for extending coverage to word problems, that is, problems that require the student first to model a situation and then to manipulate the mathematical model to obtain a solution.

    The UPC team, which played a major part in the earlier development of GF mathematical grammars and has ample experience in mathematics teaching, will be in charge of the tasks in this work package, with help from UGOT on technical aspects of GF and possibly from Ontotext on ontology representation and handling.

    State of the art on mathematical and ontological reasoning

    ID: 
    6.2
    Workpackage: 
    Case Study: Mathematics
    Task leader: 
    jordi.saludes
    Assignees: 
    jordi.saludes
    Status: 
    Planned

    The system will have to reason on equations and statements proposed by the student, so we need to review to what extent an automatic reasoner can deal with student input of this sort, and how the system behavior can be designed to degrade gracefully in order to keep the student interaction going.

    Make WebALT grammars modular

    ID: 
    6.3
    Workpackage: 
    Case Study: Mathematics
    Task leader: 
    jordi.saludes
    Assignees: 
    jordi.saludes
    Assignees: 
    Sebastian Xambo
    Relevant Deliverables: 
    Simple drill grammar library
    Status: 
    Ongoing
    Timeframe: 
    Jul 2010

    In the framework of the WebALT project, a GF grammar library was developed for generating simple mathematical drills in a variety of languages. The license of this library has recently changed to LGPL, making it suitable as the starting point for the language services demanded by this work package. To achieve a better degree of interchangeability, the existing code must be organized into modules, redundancies removed, and the modules laid out so that the lexicon can easily be enhanced with the grammar developer’s tools of work package 2 (WP2).

    Commanding GF library

    ID: 
    6.4
    Workpackage: 
    Case Study: Mathematics
    Task leader: 
    jordi.saludes
    Assignees: 
    jordi.saludes
    Status: 
    Planned

    Write a GF grammar for commanding a generic computer algebra system (CAS) by natural language imperative sentences, with concrete grammars adapted to the CAS at hand. Depends on work package 2 (WP2).

    Module for driving a CAS by natural language commands

    ID: 
    6.5
    Workpackage: 
    Case Study: Mathematics
    Task leader: 
    jordi.saludes
    Assignees: 
    jordi.saludes
    Assignees: 
    Sebastian Xambo
    Dependencies: 
    Commanding GF library
    Status: 
    Planned

    Integrate the commanding library into a component that transforms the issued commands into input for the CAS.

    Objects and properties GF library

    ID: 
    6.6
    Workpackage: 
    Case Study: Mathematics
    Task leader: 
    jordi.saludes
    Assignees: 
    jordi.saludes
    Status: 
    Planned

    A GF grammar library able to generate natural language sentences corresponding to the objects and relations of the word problem. It must also be able to parse simple questions related to the word problem domain into predicates. Depends on work package 2 and probably work package 4.

    Module for semi-automatic reasoning

    ID: 
    6.7
    Workpackage: 
    Case Study: Mathematics
    Task leader: 
    jordi.saludes
    Assignees: 
    jordi.saludes
    Status: 
    Planned

    Automated reasoning is needed to assess the soundness of the model proposed by the student and to answer his/her questions. This requires adding small ontologies describing the word problem, including:

    • Data present in the problem statement;
    • Additional world knowledge to make reasoning possible.

    Add State of the Art study here.

    Set theory reasoning of the ducks and rabbits problem

    Some time ago I managed to build a theory supporting the Farm problem in Isabelle/HOL (attached below).

    I wasn't expecting such a toil but lack of detailed documentation and a wicked simplifier made my life miserable for a whole week.

    Assumptions

    It is based on three sets:

    • Being in the farm (farm)
    • Being a duck
    • Being a rabbit

    and a function, is_leg_of : leg → animal.

    As axioms, we have:

    Ground knowledge axioms

    • Rabbits have 4 legs.
    • Ducks have 2 legs.

    Problem axioms

    • All animals in the farm are rabbits or ducks.
    • There are 100 animals in the farm.
    • There are 260 legs in the farm

    Extra axioms

    That is, facts that are implicitly known but that you need to state explicitly for Isabelle, with the Main theory, to work:

    • Rabbits and ducks are finite
    • Rabbits and ducks are disjoint

    Variables and equations

    Let R be the number of rabbits in the farm and D the number of ducks in the farm. With the preceding axioms, we were able to produce Isabelle-certified proofs that

    R + D = 100
    

    and

    2*D + 4*R = 260
    

    and then deduce that R=30 and D=70.
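
    The last deduction step can be checked outside Isabelle with plain arithmetic. The following is a minimal sketch, not the attached Isabelle theory: it substitutes D = 100 - R into the leg equation and solves for R. The function name solve_farm and its parameters are illustrative assumptions.

```python
from fractions import Fraction


def solve_farm(animals: int, legs: int, duck_legs: int = 2, rabbit_legs: int = 4):
    """Solve R + D = animals and duck_legs*D + rabbit_legs*R = legs for (R, D)."""
    # Substituting D = animals - R into the leg equation gives
    # (rabbit_legs - duck_legs) * R = legs - duck_legs * animals.
    rabbits = Fraction(legs - duck_legs * animals, rabbit_legs - duck_legs)
    ducks = animals - rabbits
    # Only whole, non-negative animal counts make sense for the word problem.
    if rabbits.denominator != 1 or rabbits < 0 or ducks < 0:
        raise ValueError("no solution in non-negative whole animals")
    return int(rabbits), int(ducks)


rabbits, ducks = solve_farm(animals=100, legs=260)
print(rabbits, ducks)  # prints "30 70"
```

    This only confirms the numbers; unlike the Isabelle proof, it does not derive the two equations from the set-theoretic axioms.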

    Attachment: Farm.thy (5.67 KB)

    Integration into dialog manager

    ID: 
    6.8
    Workpackage: 
    Case Study: Mathematics
    Task leader: 
    jordi.saludes
    Assignees: 
    jordi.saludes
    Status: 
    Planned

    In particular, objects will be annotated by natural language noun-phrases and equations by sentences. These annotations will be parsed into GF interlingua and will be used whenever language generation related to the problem is needed.

    WP7: Case Study Patents

    The work will start with the provision of user requirements (WP9) and the preparation of a parallel patent corpus (EPO) to fuel the training of statistical MT (UPC). In parallel, UGOT will work on grammars covering the domain and subsequently, together with UPC, apply the hybrid MT (WP2, WP5) to abstracts and claims. Ontotext will provide semantic infrastructure loaded with existing structured data sets (WP4) from the patent domain (IPC, patent ontology, bio-medical and pharmaceutical knowledge bases, e.g. LLD). Based on the use case requirements, Ontotext will build a prototype (D7.1, D7.2) exposing multiple cross-lingual retrieval paradigms and MT of patent sections. The accuracy will be regularly evaluated through both automatic (e.g. BLEU scoring) and human-based (e.g. TAUS) means (WP9).

    Task List

    The work package is split into 9 major tasks as follows:

    • Task 7.1 User Requirements and Scenarios (Task Lead: UPC)
    • Task 7.2 Patent corpora (Task Lead: UPC)
    • Task 7.3 Grammars for the patent domain (Task Lead: UGOT)
    • Task 7.4 Ontologies and document indexation (Task Lead: Ontotext)
    • Task 7.5 Prototype (Task Lead: Ontotext)
    • Task 7.6 SMT and Hybrid MT (Task Lead: UPC)
    • Task 7.7 Prototype (user interface) (Task Lead: Ontotext)
    • Task 7.8 Human evaluation (Task Lead: TBD)
    • Task 7.9 Patent Case Study: Final Report (Task Lead: UPC)


    Month 10-15 plan

    • Task 7.2 starts in M10 and is due to provide a first set of corpora at the end of M16. Final revision depends on the availability of the EPO data.
    • Task 7.3 starts in M10 and is due to provide a preliminary report at the end of M16.

    Month 16-21 plan

    • Task 7.1 starts at M15 and is due to provide a preliminary version at the beginning of M17.
    • Task 7.3 will produce a more complete report by the beginning of M19.
    • Task 7.4 starts at M16 and is due to provide a description of the type of queries at the end of M16.
    • Task 7.5 starts at M16 and is due to provide a description of the Prototype architecture at the end of M16.
    • Task 7.6 starts along with WP5 and will produce an SMT baseline for the Patents prototype.
    • D7.1 deadline is M21.

    User Requirements

    1 Jun 2010
    1 Jul 2010
    Europe/Stockholm
    ID: 
    7.1
    Workpackage: 
    Case Study: Patents
    Assignees: 
    aarne.ranta
    Assignees: 
    meritxell.gonzalez
    Status: 
    Completed
    Timeframe: 
    May 2011 - Oct 2011

    The patents case study comprises two basic scenarios: online patent retrieval and patent translation. In this prototype we tackle these two scenarios separately, as shown in Figure 1, even though they can be viewed as a single multilingual patent retrieval paradigm. In the future, we plan to study how to automate the reciprocal inputs between the two processes, i.e. the annotation of translations and the translation of semantically annotated documents.

    From a general perspective, two user roles may be defined in this case study: end-users looking for information related to the patents and editors adding new patent documents to a hypothetical repository.

    Details are given in D7.1.

    Patent Corpora

    1 Jun 2010 15:17
    Europe/Vienna
    ID: 
    7.2
    Workpackage: 
    Case Study: Patents
    Task leader: 
    meritxell.gonzalez
    Assignees: 
    cristina.españa
    Assignees: 
    lluis.marquez
    Status: 
    Completed
    Timeframe: 
    Jun 2011 - Oct 2012
    Completed on: 
    30 November, 2012 - 23:00

    Determining and gathering bilingual and monolingual corpora for the patent case study.

    • The SMT system is trained with the MAREC corpus (WP5).
    • The EPO dataset is used for testing purposes (WP5).
    • The www-EPO dataset will be used to fill the retrieval databases (WP7).

    Grammars for the patent domain

    1 Aug 2010
    Europe/Stockholm
    ID: 
    7.3
    Workpackage: 
    Case Study: Patents
    Task leader: 
    ramona.enache
    Assignees: 
    aarne.ranta
    Assignees: 
    ramona.enache
    Status: 
    Ongoing
    Timeframe: 
    Jan 2011 - Nov 2012

    There are two subtasks here:

    • Grammars for translation of the patent documents.
    • Grammars for online translation of CNL queries.

    Ontologies and Document Indexation

    ID: 
    7.4
    Workpackage: 
    Case Study: Patents
    Task leader: 
    meritxell.gonzalez
    Assignees: 
    borislav.popov
    Assignees: 
    mariana.damova
    Status: 
    Ongoing
    Timeframe: 
    Jun 2011 - Oct 2012

    Developing an ontology that captures the structure of patent documents, and indexing the patent documents according to this semantic knowledge.

    Patents Retrieval System

    1 Jul 2010
    Europe/Vienna
    ID: 
    7.5
    Workpackage: 
    Case Study: Patents
    Task leader: 
    lluis.marquez
    Assignees: 
    borislav.popov
    Assignees: 
    milen.chechev
    Assignees: 
    petar
    Relevant Deliverables: 
    Patent Case Study Final Report
    Relevant Deliverables: 
    Patent MT and Retrieval Prototype
    Relevant Deliverables: 
    Patent MT and Retrieval Prototype Beta
    Dependencies: 
    Patent Corpora
    Status: 
    Completed
    Timeframe: 
    Jun 2011 - Dec 2012

    Contact @UPC: Lluis and Cristina

    DEPENDENCIES:

    • TASK 1, 2, 3 and 4
    • WP4. Knowledge Engineering
    • TASK 8 (for final version of prototype)

    Participants:

    • Ontotext,
    • UGOT,
    • UPC

    Contact point @Ontotext: Borislav Popov

    DEADLINES: Beta = M21; Final = M27

    Machine Translation Systems

    22 Mar 2010
    Europe/Vienna
    ID: 
    7.6
    Workpackage: 
    Case Study: Patents
    Assignees: 
    aarne.ranta
    Assignees: 
    cristina.españa
    Assignees: 
    lluis.marquez
    Assignees: 
    meritxell.gonzalez
    Assignees: 
    ramona.enache
    Relevant Deliverables: 
    Patent Case Study Final Report
    Relevant Deliverables: 
    Patent MT and Retrieval Prototype
    Relevant Deliverables: 
    Patent MT and Retrieval Prototype Beta
    Status: 
    Completed
    Timeframe: 
    Jan 2012 - Dec 2012
    Completed on: 
    11 January, 2013 (All day)

    Contact @UPC: Lluis and Cristina

    DEPENDENCIES:

    • TASK 2, 3
    • WP5. A baseline of the WP5 system will be integrated in the prototype.

    Patent abstracts and claims are translated using the baseline of the hybrid system.

    Prototype (User Interface)

    1 Dec 2010
    31 Oct 2011
    Europe/Vienna
    ID: 
    7.7
    Workpackage: 
    Case Study: Patents
    Task leader: 
    borislav.popov
    Assignees: 
    borislav.popov
    Assignees: 
    cristina.españa
    Assignees: 
    lluis.marquez
    Assignees: 
    meritxell.gonzalez
    Assignees: 
    milen.chechev
    Relevant Deliverables: 
    Patent MT and Retrieval Prototype
    Relevant Deliverables: 
    Patent MT and Retrieval Prototype Beta
    Dependencies: 
    Machine Translation Systems
    Dependencies: 
    Patents Retrieval System
    Status: 
    Completed
    Timeframe: 
    Jun 2011 - Sep 2012

    DEPENDENCIES:

    • TASK 1
    • TASK 8 (for final version of prototype)

    Participants:

    • Ontotext,
    • UGOT,
    • UPC

    Contact point @Ontotext: Borislav Popov

    DEADLINES: Beta = M21; Final = M27

    Evaluations

    1 Jun 2011
    30 Jun 2012
    Europe/Vienna
    ID: 
    7.8
    Workpackage: 
    Case Study: Patents
    Assignees: 
    aarne.ranta
    Assignees: 
    maria.mateva
    Assignees: 
    meritxell.gonzalez
    Relevant Deliverables: 
    Patent Case Study Final Report
    Status: 
    Planned

    DEPENDENCIES:

    • TASK 5

    Note: Deadlines have been delayed 3 months due to the WP delay.

    DEADLINE: M31 (to allow for final report)

    Subtasks

    • Preparation starts M19 (at the very latest)
    • Hiring translators
    • Producing guidelines for translators
    • Full evaluation starts at latest M28
      • Evaluation will make use of the TAUS criteria






    TAUS Evaluation Criteria:

    • Excellent (4):
      • Accurately transfers all info; correct terminology, correct grammar. Understanding not improved by reading the source text.
    • Good (3):
      • Contains minor mistakes; would not need to refer to source text to correct the mistakes.
    • Medium (2):
      • Significant errors in output. Would need to read the source text to correct the errors.
    • Poor (1):
      • Serious errors in output. Would need to read the source text to understand the output. Would probably need to retranslate from scratch.
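
    When human judgements are aggregated, the four levels above can be treated as an ordinal score. A minimal sketch, assuming a hypothetical list of per-segment ratings; the names TausScore, needs_source_text and summarize are illustrative and not part of any MOLTO or TAUS tooling.

```python
from enum import IntEnum


class TausScore(IntEnum):
    """The four TAUS quality levels listed above."""
    POOR = 1
    MEDIUM = 2
    GOOD = 3
    EXCELLENT = 4


def needs_source_text(score: TausScore) -> bool:
    # Medium and Poor are the levels whose descriptions require reading the source text.
    return score <= TausScore.MEDIUM


def summarize(ratings: list[TausScore]) -> dict[str, float]:
    """Return the mean score and the share of segments that need the source text."""
    if not ratings:
        raise ValueError("no ratings to summarize")
    return {
        "mean": sum(int(r) for r in ratings) / len(ratings),
        "needs_source_share": sum(needs_source_text(r) for r in ratings) / len(ratings),
    }


# Hypothetical ratings for four translated segments (illustrative data only).
ratings = [TausScore.EXCELLENT, TausScore.GOOD, TausScore.GOOD, TausScore.MEDIUM]
print(summarize(ratings))
```

    Reporting the share of segments that need the source text alongside the mean keeps the distinction between the Good and Medium levels visible, which a mean alone would blur.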

    WP8: Case Study: Cultural Heritage

    The work starts with a study of the existing categorizations and metadata schemas adopted by the museum, as well as a corpus of texts in the current documentation that describe the museum objects (D8.1, UGOT and Ontotext). We will transform the CIDOC-CRM model into an ontology, aligning it with the upper-level one in the base knowledge set (WP4) and modeling the museum object metadata as a domain-specific knowledge base. Through the interoperability engine from WP4 and the IDE from WP2, we will semi-automatically create the translation grammar and further extend it (D8.2, UGOT, UHEL, UPC, Ontotext). The final result will be an online system enabling museum (virtual) visitors to use their language of preference to search for artefacts through semantic (structured) and natural language queries and examine information about them. We will also automatically generate a set of articles in the Wikipedia format describing museum artefacts in the 5 languages with extensive grammar coverage (D8.3, UGOT, Ontotext).

    Links to Swedish museum databases that use the Carlotta system, which is built upon the CIDOC-CRM model:

    • http://samsok.kmmuseum.se/
    • http://carlotta.gotlib.goteborg.se/pls/carlotta/welcome
    • http://www.tremil.se/pls/hborg/rigby.welcome
    • http://collections.smvk.se/pls/em/rigby.SokEnkel

    WP9: User Requirements and Evaluation

    Requirements for work

    The work will start with collecting user requirements for the grammar development IDE (WP2), translation tools (WP3), and the use cases (WP6-8).

    We will define the evaluation criteria and schedule in synchrony with the WP plans (D9.1). We will define and collect corpora including diagnostic and evaluation sets: the former to improve translation quality along the way, and the latter to evaluate the final results.

    Corpus definitions

    • Each corpus available for MOLTO will be described by the providing project members.
    • The corpora will be split for development, diagnostic and evaluation use.
    • Contact persons will be named for questions and requests for each corpus.
    • Storage locations and access protocols for each corpus will be defined.
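
    The three-way split mentioned in the list above could be made reproducible with a seeded shuffle. A minimal sketch, assuming each corpus is available as a list of segments; the 10% diagnostic and 10% evaluation shares, the seed and the function name split_corpus are illustrative assumptions, not figures from the work plan.

```python
import random


def split_corpus(segments, diagnostic=0.1, evaluation=0.1, seed=0):
    """Split segments into disjoint development, diagnostic and evaluation sets."""
    items = list(segments)
    # A fixed seed makes the split identical on every run and at every site.
    random.Random(seed).shuffle(items)
    n_diag = int(len(items) * diagnostic)
    n_eval = int(len(items) * evaluation)
    return {
        "diagnostic": items[:n_diag],
        "evaluation": items[n_diag:n_diag + n_eval],
        "development": items[n_diag + n_eval:],
    }


# Hypothetical corpus of 100 numbered segments (illustrative data only).
parts = split_corpus([f"segment-{i}" for i in range(100)])
print({name: len(part) for name, part in parts.items()})
```

    Keeping the evaluation set disjoint from the other two is what allows it to measure final results without having influenced development.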

    Description of end-user workflow

    The translator's new role (parallel to WP3: Translator's tools) will be designed and described in the D9.1 deliverable. Most current translator's workbench software treats the original text as a read-only source. The tools to be developed within WP3 (and WP2) will lead towards a more mutable role for the source text. The translation process will more closely resemble structured document editing or multilingual authoring than transformation from a fixed source to a number of target languages.

    We will only provide a basic infrastructure API for external translation workbenches and keep an eye on the "new multilingual translator's workflow".

    Introduction of WP liaison persons and other contacts

    For each work package, the liaison contact information and work progress will be kept up-to-date on the MOLTO web site. Our liaison person Mirka Hyvärinen will be in contact with other project members.

    Access to UHEL's internal working wiki, "MOLTO kitwiki", will also be granted to other project members upon request.

    Evaluation of results

    Evaluation aims at both quality and usability aspects. UHEL will develop usability tests for the end-user human translator. The MOLTO-based translation workflow may differ from the traditional translator's workflow. This will be discussed in the D9.1 evaluation plan.

    To measure the quality of MOLTO translations, we compare them to (i) statistical and symbolic machine translation (Google, SYSTRAN); and (ii) human professional translation. We will use both automatic metrics (IQmt and BLEU; see section 1.2.8 for details) and the quality criteria of TAUS (Translation Automation User Society). As MOLTO is focused on information-faithful, grammatically correct translation in special domains, TAUS results will probably be more important.
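
    To illustrate what an automatic metric such as BLEU measures, here is a minimal sketch of a simplified, unsmoothed, single-reference sentence-level variant: clipped n-gram precisions for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. It is not the IQmt or evaluation tooling used in the project; whitespace tokenization and the function names are assumptions made for the example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU against a single reference."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # unsmoothed: one empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # The brevity penalty punishes candidates shorter than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


# Hypothetical sentence pair (illustrative data only).
score = bleu("the claim describes a new device", "the claim describes a novel device")
print(round(score, 3))
```

    Because the score only counts surface n-gram overlap, a faithful translation that uses different wording is penalized, which is one reason human judgement is used alongside it.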

    Given MOLTO's symbolic, grammar-based interlingual approach, scalability, portability and usability are important quality criteria for the translation results. For the translator's tools, user-friendliness will be a major aspect of the evaluation. These criteria are quantified in (D9.1) and reported in the final evaluation (D9.2).

    In addition to the WP deliverables, there will be continuous evaluation and monitoring with internal status reports according to the schedule defined in D9.1.

    WP10: Dissemination and Exploitation

    Define workplan here

    Factorize the Food grammar

    15 Apr 2010
    Europe/Stockholm
    ID: 
    10.3
    Task leader: 
    aarne.ranta
    Assignees: 
    aarne.ranta
    Assignees: 
    olga.caprotti
    Status: 
    Completed
    Completed on: 
    26 March, 2010 - 01:00

    Factorize the grammar currently used for the demo fridge into modules that isolate the different kinds of phrases, e.g. Comments, Greetings, Questions, etc. Check whether there are ontologies that describe these.

    The factorization can be seen in the phrasebook example under /example/phrasebook.

    MOLTO Phrasebook May Release

    ID: 
    10.4
    Task leader: 
    aarne.ranta
    Assignees: 
    aarne.ranta
    Assignees: 
    krasimir.angelov
    Assignees: 
    olga.caprotti
    Assignees: 
    ramona.enache
    Assignees: 
    thomas.hallgren
    Dependencies: 
    Factorize the Food grammar
    Status: 
    Completed
    Timeframe: 
    Apr 2010 - May 2010
    Completed on: 
    1 June, 2010 (All day)

    The MOLTO Phrasebook is a web application for the traveler; eventually it will also be a phone application (for Android). It consists of frequently used phrases that a foreigner might want to use when abroad.

    Features of May Release

    • fixes and additions in RGL
    • data collection from
      • wikipedia phrasebook
      • wiki page for collection
    • web server
    • web GUI
    • Android GUI
    • structured and customizable release (e.g. choose the language pair)
    • agree on base abstract syntax
    • Android stand-alone
    • complete remaining concretes
      • examples + native informant
    • feedback button
      • current state info
      • spam issue
    • unlexing
    • lexing &+
    • disambiguation
    • deletion
    • history
    • gr by shaking

    demo preview: http://tournesol.cs.chalmers.se/~aarne/phrasebook/phrasebook.html

    Investigate the Internationalization API of Facebook

    ID: 
    10.6
    Task leader: 
    olga.caprotti
    Assignees: 
    johnj.camilleri
    Assignees: 
    Kaarel.Kaljurand
    Assignees: 
    krasimir.angelov
    Assignees: 
    thomas.hallgren
    Relevant Deliverables: 
    GF Grammar Compiler API
    Status: 
    Planned
    Timeframe: 
    Mar 2012 - Apr 2013

    The current GF Grammar Compiler API provides translation services that can be called on the fly. The goal of this task is to find out how to integrate them into an existing API where there is a need for internationalization, for example Facebook: https://developers.facebook.com/docs/internationalization.

    The image shows how translations are entered manually in the current version. My guess is that we could improve on that.

    Another example is commonly used sentences such as "Happy birthday": we have it in our Travel Phrasebook, but we do not have Portuguese. We could friends-source it :) but how? Give them a FB app?

    Love to see some comments on this.

    BTW, I am not partial to FB; you can check any social network of your liking that provides an Internationalization API. This is a proof of concept, also looking for CNLs in the wild :)

    WP11: Multilingual semantic wiki - AceWiki

    Introduction

    The core of WP11 is AceWiki, an existing wiki system that is going to be developed into a multilingual controlled natural language wiki system within the MOLTO project.

    Important links

    The AceWiki homepage (http://attempto.ifi.uzh.ch/acewiki/) contains:

    • demo wikis
    • list of related publications
    • further links

    AceWiki development is hosted on GitHub (https://github.com/AceWiki/AceWiki)

    Tasks

    • Task 11.1: Make the AceWiki design multilingual and implement a small AceWiki engine for multilingual GF grammars
    • Task 11.2: General refactoring of the AceWiki code

    Meetings


    To do

    Phase 1

    AceWiki side:

    • Refactor AceWiki to support different implementations of predictive parsers (at the moment, AceWiki's chartparser is hardwired) [done: Tobias]
    • Extract from the package "aceowl" everything that should be reused in GFAceWiki into a new package (mostly OWL-related stuff) [done: Tobias]
    • Connect AceWiki to GF (via JPGF) [done: Tobias]
    • Change the AceWiki architecture to support multilinguality [done: Tobias]
    • Implement a simple AceWiki language engine for multilingual GF grammars (as an alternative to the current ACE-OWL engine) [done: Tobias]

    GF side:

    Phase 2

    • Implement support for adding/changing/deleting words
    • Implement more languages
    • Choose application domain(s) and build exemplary knowledge base
    • Perform user studies

    Releases

    Release notes: https://raw.github.com/AceWiki/AceWiki/master/CHANGES.txt

    • 2012-01-05: v0.5.2

    Unsorted

    AceWiki as a webservice

    See also https://github.com/yuchangyuan/AceWiki

    See also the thread starting with: https://lists.ifi.uzh.ch/pipermail/attempto/2011-December/000818.html

    • There could potentially be multiple front-ends
      • the current front-end
      • an existing GF front-end
      • Emacs
      • Unix commandline
      • ACE View
      • native Android/iOS app
      • ...
    • REST API
      • makes it easy to push content (existing lexicon, existing ACE text) into the wiki
      • should support GET as much as possible
    • Make AceWiki easily deployable on hosting providers such as Google App Engine
      • reasons: speed, reliability, etc.
      • for that to work, AceWiki should be completely in Java (i.e. no use of APELocal), so ape.exe would still have to be on a different host (because it is in Prolog)
      • would reasoning performance profit in a major way?

    AceWiki more configurable

    AceWiki Refactoring

    ID: 
    11.2
    Assignees: 
    Kaarel.Kaljurand
    Assignees: 
    Tobias.Kuhn
    Relevant Deliverables: 
    Multilingual semantic wiki
    Status: 
    Completed
    Timeframe: 
    Dec 2011 - Jan 2012
    Completed on: 
    5 January, 2012 (All day)

    General refactoring and clean-up of the AceWiki code.

    Multilingual AceWiki

    ID: 
    11.1
    Assignees: 
    Tobias.Kuhn
    Relevant Deliverables: 
    Multilingual semantic wiki
    Status: 
    Completed
    Timeframe: 
    Dec 2011 - Jan 2012
    Completed on: 
    5 January, 2012 (All day)

    Make the AceWiki design multilingual and implement a small AceWiki engine for multilingual GF grammars.