Contract No.: | FP7-ICT-247914 |
---|---|
Project full title: | MOLTO - Multilingual Online Translation |
Deliverable: | D1.1. Work plan for MOLTO |
Security (distribution level): | Confidential |
Contractual date of delivery: | M1 |
Actual date of delivery: | 1 April 2010 |
Type: | Report |
Status & version: | Final (evolving document) |
Author(s): | A. Ranta et al. |
Task responsible: | UGOT |
Other contributors: | |
Detailed work plan for internal use of the consortium.
This is an evolving description of the work plan of MOLTO, divided into work packages and tasks. The document is meant to track what the MOLTO Consortium is planning to do, what it has completed so far, and the status of ongoing research. It is the responsibility of each work package leader to enter tasks and keep them up to date so that they reflect the work done by the group.
Detailed work plan for WP1
A number of management tasks fall to the coordinator, e.g.:
- collecting information from partners;
- reviewing and submitting information on the progress of the project, as well as reports and other deliverables, to the EC;
- preparing meetings;
- proposing decisions and preparing the agenda of the SG;
- chairing the SG meetings and monitoring the implementation of decisions taken at the meetings;
- presenting the results of the consortium and serving as secretary in the meetings;
- administering the EC financial contribution and fulfilling other financial tasks;
- maintaining the project's website, etc.
According to the Grant Agreement, Annex II, management of the consortium activities includes:
In order to get an overview of the work package:
- add a view of the associated tasks
- add a view of the deliverables
Create an admin type "deliverable" to collect the information on due deliverables, so that they can be tracked on the calendar and in the work package's description.
The commission requested the following:
Session | Submitted on | Verified on |
---|---|---|
5.2 | Apr 26, 2011 2:40:36 PM | May 3, 2011 2:33:02 PM |
You are kindly requested to clarify the issues raised in this letter and submit a revised periodic report and Forms C through NEF by 16 May at the latest. Should you require more time, please contact us. However, should we not have heard from you by the deadline, we will proceed with the information at hand. Please note that in such a case this may lead to all or part of the costs being rejected.
Please note that according to Article II.5 of the grant agreement, the period for the execution of the payment has been suspended pending receipt of the additional information and the revised Periodic Report through NEF.
Please clarify the following points in the revised periodic report, and revise the Forms C if necessary:
Attachment | Size |
---|---|
MOLTO (247914) _ Periodic Report and Cost Claim submission in NEF.pdf | 77.56 KB |
Cost Claim overview MOLTO (247914) 2011-06-22.pdf | 74.59 KB |
D1 3R (amended).docx | 232.74 KB |
Timetable for negotiation:
The negotiating Project Officer (PO) is Mr. BROCHARD Michel. The full contact references are detailed in the "Negotiation Mandate".
Please note that the negotiation must be successfully concluded by 31/08/2011.
In case this deadline is not met, the Commission reserves the right to cancel negotiations and any subsequent offer for a project grant agreement. We also would like to draw your attention to the fact that negotiations may be terminated, or the negotiation mandate modified, if so required following the results of the consultation with other departments within the Commission.
Please note that in accordance with the legislation in force, the coordinator is obliged to deposit any pre-financing received from the Commission on an interest-bearing bank account. If you do not comply with this obligation, your participation as coordinator may not be accepted.
The negotiation process is supported by an on-line tool called NEF which you will need to use to submit data that is necessary for the grant agreement.
NEF will also provide access to the Legal & Financial Validation form (LFV lite). The LFV lite provides an overview of the status of the legal and financial data of the partners in your project, and indicates those partners for whom legal and/or financial data is missing. If the legal and/or financial data of one or more partners is flagged as needed in the LFV lite, or is incorrect, new legal and/or financial documents must be submitted for the partner(s) concerned. Additionally, the Commission can request documents and information regarding the operational capacity of the consortium and beneficiaries to achieve the objectives and expected results of the project.
The detailed explanations for accessing NEF will be sent shortly in a separate e-mail. Further guidance is available on-line at the following address: http://ec.europa.eu/research/negotiation/
You should have already received the Evaluation Summary Report (ESR) in the info letter email. If not, please contact the negotiating Project Officer.
The negotiation guidance notes and the most recent templates for the Description of Work (Annex I to the Grant Agreement) are available at: Nef Annex 1 - Concept. Other useful information on Framework Programme 7 is available at http://cordis.europa.eu/fp7/find-doc_en.html and includes:
This letter should not be regarded under any circumstances as a formal commitment by the Commission to give financial support as this depends, in particular, on the satisfactory conclusion of negotiations and the completion of the formal selection process. Should you have any queries about the above, please do not hesitate to contact the negotiating Project Officer.
The main issue to resolve is the budget cut, which is of course the usual thing to happen. We will get 600k instead of the 712k we applied for. My suggestion is that we cut all WPs and sites in proportion, so that we don't need to change the work description too much.
The realistic goal is that the work will begin on 1 September. Even this needs some effort from us:
Attachment | Size |
---|---|
comments_MOLTO_Ext.pdf | 90.52 KB |
Please address the reviewers' remarks by the end of September 2011!
Soon it is time for the reporting of period 2 (01/03/2011 – 29/02/2012) of the project MOLTO.
You have to send me:
This year you can complete the Use of Resources directly in NEF when completing the Form C. You will have to write short explanations of the costs: the number of person months, travel costs (who travelled where and for which purpose/meeting), consumables, etc. All costs must be related to a work package.
The deadline for submitting your financial statement in the Participant Portal, as well as for sending me the Use of Resources by e-mail, is 1 April 2012.
The signed Financial Statement and the CFS (if applicable) have to be submitted to me in paper copies. Please send the originals by courier to address below.
To access the project via the Participant Portal, click on the following link: http://ec.europa.eu/research/participants/portal/
To log into the Participant Portal you need to have an account. If you don't have an account yet follow the 'register' link and instructions on the Participant Portal main page.
Once logged in with the account associated with your email address, the list of the projects you are involved in will appear under the 'My Projects' tab. The project MOLTO (247914) will appear under tab “Active”. By selecting “FR” on that line you will gain access to the Form C.
Do not hesitate to contact me if you have any questions. Kristina
Kristina Orbán Meunier
UNIVERSITY OF GOTHENBURG Research and Innovation Services
Erik Dahlbergsgatan 11B Box 100, 405 30 Göteborg, Sweden Tel +46 31 786 6466
mobile +46 766 229466
The grammar developer's tools are divided into two kinds of tasks:
- the GF grammar compiler API
- actual tools implemented using the API
The work plan for the first six months concerns mostly the API, with the main actual tool being the GF Shell, a line-based grammar development tool. It is a powerful tool, since it enables scripting, but it is not integrated with other working environments. The most important other environment will be web-based access to the grammar compiler.
Note that most discussions on GF are public at http://code.google.com/p/grammatical-framework/.
Here follows the work plan, with tasks assigned to sites and approximate months.
Documentation of GF is hosted on Google Code at http://code.google.com/p/grammatical-framework/
There is a wiki cover page for the Resource Grammar Library API and an online version at http://www.grammaticalframework.org/compiler-api/.
The GF API design will take into account the following requirements:
The documentation is being hosted at the GF website.
What we mean by example-based grammar writing.
The current status is proof of concept: it is possible to load an example-based grammar and to compile it.
Need to do:
- ....
The runtime is the part of the GF system that implements parsing and linearization of texts based on a PGF grammar that has been produced by the GF compiler.
The standard GF runtime is written in Haskell, like the rest of the system. Unfortunately, this results in a large memory footprint and possibly also portability problems, which preclude its use in certain applications.
The goal of the current task is to reimplement the GF runtime as a pure C library. This C library can then hopefully be used in situations where the Haskell-based runtime would be unwieldy.
Preview versions of the implementation, libpgf, are available from the project home page. This is also where up-to-date documentation can be found.
The compiler API must be used by the morphology server.
To develop a Python plugin for GF (based on the planned C plugin) and connect it to relevant parts of the Natural Language Toolkit (http://www.nltk.org/).
2.8.1 Develop Python bindings to GF.
2.8.2 NLTK integration (a sketch follows the tutorial below).
This is how to use some of the functionality of the GF shell from inside Python.
Due to a GHC glitch, it currently only builds on Linux.
You'll need the source distribution of GF, GHC, and the Python development files¹. Then go to the Python bindings folder and build:
cd GF/contrib/py-bindings
make
This will build a shared library (gf.so) that you can import and use in Python as shown below.
To test if it works correctly, type:
python -m doctest example.rst
First you must import the library:
>>> import gf
then load a PGF file, like this tiny example:
>>> pgf = gf.read_pgf("Query.pgf")
We could ask for the supported languages:
>>> pgf.languages()
[QueryEng, QuerySpa]
The start category of the PGF module is:
>>> pgf.startcat()
Question
Let us save the languages for later:
>>> eng,spa = pgf.languages()
These are opaque objects, not strings:
>>> type(eng)
<type 'gf.lang'>
and must be used when parsing:
>>> pgf.parse(eng, "is 42 prime")
[Prime (Number 42)]
Yes, I know it should have a '?' at the end, but there is no support for other lexers at this time.
Notice that parsing returns a list of gf trees. Let's save it and linearize it in Spanish:
>>> t = pgf.parse(eng, "is 42 prime")
>>> pgf.linearize(spa, t[0])
'42 es primo'
(which it is not, but there is a '?' lacking at the end, remember?)
One of the good things about the GF shell is that it suggests which tokens can continue the line you are composing.
We have this in the bindings too. Suppose we have no idea how to start:
>>> pgf.complete(eng, "")
['is']
so there is only one sensible thing to put in. Let's continue:
>>> pgf.complete(eng, "is ")
[]
It is important to note the blank space at the end; otherwise we get it again:
>>> pgf.complete(eng, "is")
['is']
But how come nothing is suggested after "is "? At this point a literal integer is expected, so GF would have to present an infinite list of alternatives. I cannot blame it for refusing to do so.
>>> pgf.complete(eng, "is 42 ")
['even', 'odd', 'prime']
Good. I will go for 'even', just to be on the safe side:
>>> pgf.complete(eng, "is 42 even ")
[]
Nothing again, but this time the phrase is complete. Let us check it by parsing:
>>> pgf.parse(eng, "is 42 even")
[Even (Number 42)]
We store the last result and ask for its type:
>>> t = pgf.parse(eng, "is 42 even")[0]
>>> type(t)
<type 'gf.tree'>
What's inside this tree? We use unapply for that:
>>> t.unapply()
[Even, Number 42]
This method returns a list with the head of the fun judgement and its arguments:
>>> map(type, _)
[<type 'gf.cid'>, <type 'gf.expr'>]
Notice the argument is again a tree (gf.tree or gf.expr; it is all the same here).
>>> t.unapply()[1]
Number 42
We will repeat the trick with it now:
>>> t.unapply()[1].unapply()
[Number, 42]
and again, the same structure shows up:
>>> map(type, _)
[<type 'gf.cid'>, <type 'gf.expr'>]
One more time, just to get to the bottom of it:
>>> t.unapply()[1].unapply()[1].unapply()
42
but now it is an actual number:
>>> type(_)
<type 'int'>
We ended up with a fully decomposed fun judgement.
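As a recap (not part of the bindings themselves, and assuming unapply behaves exactly as demonstrated above: a [head, arguments...] list for applications, a plain value for literals, and no unapply on gf.cid heads), a tiny recursive helper can decompose a whole tree in one go:

>>> def decompose(node):
...     # heads (gf.cid) have no unapply; expressions and literals do or are plain
...     parts = node.unapply() if hasattr(node, "unapply") else node
...     if isinstance(parts, list):
...         return [decompose(p) for p in parts]
...     return parts  # a gf.cid head or a literal such as 42
>>> decompose(t)
[Even, [Number, 42]]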
¹ In Ubuntu I got it by installing the package python-all-dev. ↩
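Looking ahead to task 2.8.2, here is a minimal sketch of how these bindings could feed NLTK. It assumes the string rendering of abstract trees shown above ("Prime (Number 42)"); wrapping it in parentheses yields the bracketed format that nltk.Tree.fromstring expects:

import gf
import nltk

pgf = gf.read_pgf("Query.pgf")
eng, spa = pgf.languages()

for tree in pgf.parse(eng, "is 42 prime"):
    # "Prime (Number 42)" -> "(Prime (Number 42))" for NLTK's bracket reader
    nltk_tree = nltk.Tree.fromstring("(%s)" % str(tree))
    print(nltk_tree)  # from here on, any NLTK machinery applies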
Here is a slightly better description, with possibly relevant links to software, documentation, etc.
Major features:
New languages:
Web-based tools for grammarians: http://www.grammaticalframework.org/demos/gfse/
Ongoing work at http://cloud.grammaticalframework.org.
Look into online IDE platforms, like Kodingen and CodeRun.
There is work on Ajax-based code editors, e.g. Ymacs, which could be useful since there is already a GF mode for Emacs (where?).
The Emacs mode can now be found at http://www.grammaticalframework.org/src/tools/gf.el (note by Aarne).
There is also a Mozilla project, Bespin, to build a web-based editor extensible with JavaScript.
Also check Orc, yet another online IDE for a new language, using CodeMirror as its editor.
Design and integrate probabilistic features into GF and PGF.
Extend planning here.
Final phase of the work planned in this work package. Exact scheduling to be defined.
Add the possibility to dynamically add new words to lexicons "linked" into compiled grammars.
To be entered for M7 - M30.
Add child pages to the living deliverable following instructions given in the abstract.
http://www.molto-project.eu/wiki/living-deliverables/d43a-appendix-gramm...
See deliverable
According to the plan (http://www.molto-project.eu/node/858), the Knowledge Engineering Infrastructure has been released. It is accessible here. We have imported an exemplary initial data set containing information about persons, organizations, and locations.
To execute a SPARQL query against the data set, click "SPARQL Query" and try, for example, the following query:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix prt: <http://proton.semanticweb.org/2006/05/protont#>
select distinct ?l where { ?s rdf:type prt:Organization ; rdfs:label ?l . }
It should return the names of all organizations stored in the data set.
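The same query can also be issued programmatically. Below is a hedged sketch using the SPARQLWrapper Python library; ENDPOINT is a placeholder to be replaced by the actual SPARQL endpoint URL linked above:

from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://example.org/sparql"  # placeholder, not the real endpoint

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery("""
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix prt: <http://proton.semanticweb.org/2006/05/protont#>
select distinct ?l where { ?s rdf:type prt:Organization ; rdfs:label ?l . }
""")
sparql.setReturnFormat(JSON)
# Print one organization name per line
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["l"]["value"])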
The Knowledge Engineering Infrastructure can be extended with new data sets as they become available; see http://www.molto-project.eu/node/858, http://www.molto-project.eu/node/896 and http://www.molto-project.eu/node/948.
A better task description is to be added here.
Mathematical grammars developed in GF for the WebALT project (eContent 22253) allow us to generate simple multilingual drills for high-school students and university freshmen. These grammars will be the starting point for extending coverage to word problems, i.e. those that require the student first to model a situation and then to manipulate the mathematical model to obtain a solution.
The UPC team, having been a main actor in the past development of GF mathematical grammars and having ample experience in mathematics teaching, will be in charge of the tasks in this work package, with help from UGOT on technical aspects of GF and possibly from Ontotext on ontology representation and handling.
It will be necessary to reason about equations and statements proposed by the student, so we will need to review to what extent an automatic reasoner can deal with student input of this sort, and how the system behaviour could be designed to degrade gracefully in order to keep the student interaction going.
In the framework of the WebALT project, a GF grammar library was developed for generating simple mathematical drills in a variety of languages. The legal status of this library has recently changed to LGPL, making it suitable as the starting point for the language services demanded by this work package. To achieve a better degree of interchangeability, it is necessary to organize the existing code into modules, remove redundancies, and lay the modules out in a way that allows easy lexicon enhancement by means of the grammar developer's tools of work package 2 (WP2).
Write a GF grammar for commanding a generic computer algebra system (CAS) through natural-language imperative sentences, with concrete grammars adapted to the CAS at hand. Depends on work package 2 (WP2).
Integrate the commanding library into a component that transforms the issued commands into CAS input, as sketched below.
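A possible shape for that component, sketched with the Python bindings from WP2; the grammar name Commands.pgf, the example sentence, and the head-to-Maxima name mapping are all hypothetical illustrations:

import gf

pgf = gf.read_pgf("Commands.pgf")   # hypothetical compiled commanding grammar
eng = pgf.languages()[0]

CAS_NAMES = {"Factor": "factor", "Expand": "expand"}  # invented mapping

def to_cas(tree):
    # unapply() yields a [head, arguments...] list for applications
    # and a plain value (e.g. an int) for literals.
    parts = tree.unapply()
    if not isinstance(parts, list):
        return str(parts)
    head, args = parts[0], parts[1:]
    name = CAS_NAMES.get(str(head), str(head).lower())
    return "%s(%s)" % (name, ", ".join(to_cas(a) for a in args))

for t in pgf.parse(eng, "factor the polynomial"):
    print(to_cas(t))   # ship this string to the CAS over its usual interface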
A GF grammar library able to generate natural-language sentences corresponding to the objects and relations of the word problem. It must be able to parse simple questions related to the word-problem domain into predicates. Depends on work package 2 and probably work package 4.
Automated reasoning is needed to assess the soundness of the model proposed by the student and to answer his/her questions. This requires adding small ontologies describing the word problem, including:
Add State of the Art study here.
Some time ago I managed to build a theory supporting the Farm problem in Isabelle/HOL (attached below).
I wasn't expecting such toil, but the lack of detailed documentation and a wicked simplifier made my life miserable for a whole week.
It is based on 3 sets:
and a function: is_leg_of : leg → animal.
As axioms, we have:
That is, facts that are implicitly known but which you need to state for Isabelle with the Main theory to work:
Let R be the number of rabbits in the farm and D the number of ducks. With the preceding axioms, we were able to produce Isabelle-certified proofs that R + D = 100 and 2*D + 4*R = 260, and then deduce that R = 30 and D = 70.
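For reference, the arithmetic behind those two certified facts is plain elimination: substituting D = 100 − R into 2*D + 4*R = 260 gives 2*(100 − R) + 4*R = 260, i.e. 200 + 2*R = 260, hence R = 30 and D = 70.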
Attachment | Size |
---|---|
Farm.thy | 5.67 KB |
In particular, objects will be annotated with natural-language noun phrases and equations with sentences. These annotations will be parsed into the GF interlingua and used whenever language generation related to the problem is needed.
The work will start with the provision of user requirements (WP9) and the preparation of a parallel patent corpus (EPO) to fuel the training of statistical MT (UPC). In parallel UGOT will work on grammars covering the domain and subsequently, together with UPC, apply the hybrid (WP2, WP5) MT on abstracts and claims. Ontotext will provide semantic infrastructure with loaded existing structured data sets (WP4) from the patent domain (IPC, patent ontology, bio-medical and pharmaceutical knowledge bases, e.g. LLD). Based on the use case requirements, Ontotext will build a prototype (D7.1, D7.2) exposing multiple cross-lingual retrieval paradigms and MT of patent sections. The accuracy will be regularly evaluated through both automatic (e.g. BLEU scoring) and human based (e.g. TAUS) means (WP9).
The work package is split into 9 major tasks as follows:
The patents case study comprises two basic scenarios: the online patent retrieval and the patent translation. In this prototype we tackle these two scenarios separately, as shown in Figure 1, even though they can be viewed as a unique multilingual patent retrieval paradigm. In the future, we plan to study how to automate the reciprocal inputs between the two processes, i.e. the annotation of translations and the translation of semantically annotated documents.
From a general perspective, two user roles may be defined in this case study: end-users looking for information related to the patents and editors adding new patent documents to a hypothetical repository.
Details are given in D7.1.
Determining and gathering bilingual and monolingual corpora for the patent case study.
There are two subtasks here:
Developing an ontology capturing the structure of patent documents, and indexing the patent documents according to the semantic knowledge.
Contact @UPC: Lluis and Cristina
DEPENDENCIES:
Participants:
Contact point @Ontotext: Borislav Popov
DEADLINES: Beta = M21; Final = M27
Contact @UPC: Lluis and Cristina
DEPENDENCIES:
Patent abstracts and claims are translated using the baseline of the hybrid system.
DEPENDENCIES:
Participants:
Contact point @Ontotext: Borislav Popov
DEADLINES: Beta = M21; Final = M27
DEPENDENCIES:
Note: Deadlines have been delayed 3 months due to the WP delay.
DEADLINE: M31 (to allow for final report)
The work is started by a study of the existing categorizations and metadata schemas adopted by the museum, as well as a corpus of texts in the current documentation which describe these objects (D8.1, UGOT and Ontotext). We will transform the CIDOC-CRM model into an ontology aligning it with the upper-level one in the base knowledge set (WP4) and modeling the museum object metadata as a domain specific knowledge base. Through the interoperability engine from WP4 and the IDE from WP2, we will semi-automatically create the translation grammar and further extend it (D8.2, UGOT, UHEL, UPC, Ontotext). The final result will be an online system enabling museum (virtual) visitors to use their language of preference to search for artefacts through semantic (structured) and natural language queries and examine information about them. We will also automatically generate a set of articles in the Wikipedia format describing museum artefacts in the 5 languages with extensive grammar coverage (D8.3, UGOT, Ontotext).
Links to Swedish museum databases that use the Carlotta system, which is built upon the CIDOC-CRM model:
The work will start with collecting user requirements for the grammar development IDE (WP2), translation tools (WP3), and the use cases (WP6-8).
We will define the evaluation criteria and schedule in synchrony with the WP plans (D9.1). We will define and collect corpora, including diagnostic and evaluation sets: the former to improve translation quality along the way, the latter to evaluate final results.
The translator's new role (parallel to WP3: Translator's tools) will be designed and described in the D9.1 deliverable. Most current translator's workbench software treats the original text as a read-only source. The tools to be developed within WP3 (and WP2) will move towards a more mutable role for the source text. The translation process will come to resemble structured document editing or multilingual authoring rather than transformation from a fixed source into a number of target languages.
We will provide only a basic infrastructure API for external translation workbenches and keep an eye on the "new multilingual translator's workflow".
For each work package, the liaison contact information and work progress will be kept up-to-date on the MOLTO web site. Our liaison person Mirka Hyvärinen will be in contact with other project members.
The possibility to access UHEL's internal working wiki, "MOLTO kitwiki", will also be granted to other project members upon request.
Evaluation aims at both quality and usability aspects. UHEL will develop usability tests for the end-user human translator. The MOLTO-based translation workflow may differ from the traditional translator's workflow. This will be discussed in the D9.1 evaluation plan.
To measure the quality of MOLTO translations, we compare them to (i) statistical and symbolic machine translation (Google, SYSTRAN) and (ii) human professional translation. We will use both automatic metrics (IQmt and BLEU; see section 1.2.8 for details) and the TAUS quality criteria (Translation Automation User Society). As MOLTO is focused on information-faithful, grammatically correct translation in special domains, the TAUS results will probably be more important.
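To make the automatic side concrete, here is a hedged illustration of sentence-level BLEU using NLTK; the sentence pair is invented, and corpus-level scoring for the real evaluation would follow the same pattern:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["42 es primo".split()]    # tokenized human reference translation(s)
candidate = "42 es un primo".split()   # tokenized MT output to be scored

# Smoothing avoids zero scores when higher-order n-grams have no matches.
smooth = SmoothingFunction().method1
print(sentence_bleu(reference, candidate, smoothing_function=smooth))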
Given MOLTO's symbolic, grammar-based interlingual approach, scalability, portability and usability are important quality criteria for the translation results. For the translator's tools, user-friendliness will be a major aspect of the evaluation. These criteria are quantified in (D9.1) and reported in the final evaluation (D9.2).
In addition to the WP deliverables, there will be continuous evaluation and monitoring with internal status reports according to the schedule defined in D9.1.
Define workplan here
Factorize the grammar currently used for the demo fridge into modules that isolate the different kinds of phrases, e.g. Comments, Greetings, Questions, etc. Check whether there are ontologies that describe these.
The factorization can be seen in the phrasebook example under /example/phrasebook.
The MOLTO Phrasebook is a web application for the traveler; eventually it will be a phone application (for Android). It consists of frequently used phrases that a foreigner might want to use when abroad.
demo preview: http://tournesol.cs.chalmers.se/~aarne/phrasebook/phrasebook.html
The current GF Grammar Compiler API provides translation services that can be called on the fly. The goal of this task is to find out how to integrate them into an existing API where there is a need for internationalization, for example Facebook: https://developers.facebook.com/docs/internationalization.
The image shows how translations are entered manually in the current version. My guess is that we could improve on that.
Another example is the situation of commonly used sentences such as "Happy birthday": we have it in our Travel Phrasebook, but we do not have Portuguese. We could friend-source it :) but how? Give them a FB app?
Love to see some comments on this.
BTW, I am not partial to FB; you can check any social network of your liking that provides an internationalization API. This is also a proof of concept, looking for CNLs in the wild :)
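As a concrete starting point for any such integration, the translation call itself might look as follows; the endpoint and parameter names are assumptions modelled on the GF cloud service, not a confirmed API:

import json
import urllib.parse
import urllib.request

# Hypothetical: a Phrasebook grammar served by the GF cloud.
BASE = "http://cloud.grammaticalframework.org/grammars/Phrasebook.pgf"

params = urllib.parse.urlencode({
    "command": "translate",      # assumed command name
    "from": "PhrasebookEng",     # assumed concrete-syntax names
    "to": "PhrasebookPor",
    "input": "Happy birthday",
})
with urllib.request.urlopen(BASE + "?" + params) as response:
    print(json.load(response))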
The core of WP11 is an existing wiki system, AceWiki, which is going to be developed into a multilingual controlled-natural-language wiki system within the MOLTO project.
The AceWiki homepage (http://attempto.ifi.uzh.ch/acewiki/) contains:
AceWiki development is hosted on GitHub (https://github.com/AceWiki/AceWiki)
AceWiki side:
GF side:
Release notes: https://raw.github.com/AceWiki/AceWiki/master/CHANGES.txt
See also https://github.com/yuchangyuan/AceWiki
See also the thread starting with: https://lists.ifi.uzh.ch/pipermail/attempto/2011-December/000818.html
General refactoring and clean-up of the AceWiki code.
Make the AceWiki design multilingual and implement a small AceWiki engine for multilingual GF grammars.