WP2: Grammar Developer’s Tools

The grammar developer's tools are divided into two kinds of tasks:

GF grammar compiler API
actual tools implemented by using the API

The work plan for the first six months mostly concerns the API. The main actual tool is the GF Shell, a line-based grammar development tool. The shell is powerful because it supports scripting, but it is not integrated with other working environments. The most important of these other environments will be web-based access to the grammar compiler.

Note that most discussions on GF are public at http://code.google.com/p/grammatical-framework/.

Here follows the work plan, with tasks assigned to sites and approximate months.

Improving the Resource Grammar Library API and its documentation

Start: 0

ID:

2.10

Workpackage:

Grammar Developer’s Tools

Task leader:

aarne.ranta

Assignees:

ramona.enache

Relevant Deliverables:

GF Grammar Compiler API

Dependencies:

Designing the API and writing its documentation

Status:

Ongoing

Timeframe:

Mar 2010

Documentation of GF is hosted on Google Code at http://code.google.com/p/grammatical-framework/

There is a wiki cover page for the Resource Grammar Library API and an online version at http://www.grammaticalframework.org/compiler-api/.

Designing the API and writing its documentation

Start: 0

ID:

2.5

Workpackage:

Grammar Developer’s Tools

Task leader:

aarne.ranta

Assignees:

krasimir.angelov

Status:

Completed

Timeframe:

Mar 2010 - Aug 2010

Completed on:

1 October, 2011

The GF API design will take into account the following requirements:

programming environments such as Eclipse, Xcode, Notepad++ and the Web
standard formats for I/O
....

The documentation is being hosted at the GF website.

Example-based grammar writing

Start: 0

ID:

2.2

Workpackage:

Grammar Developer’s Tools

Task leader:

ramona.enache

Assignees:

aarne.ranta

Relevant Deliverables:

Grammar IDE

Status:

Ongoing

Timeframe:

Jul 2010 - Sep 2010

What we mean by example-based grammar writing.

The current status is a proof of concept: it is possible to load an example-based grammar and compile it.

Need to do: - ....

GF runtime in C

Start: 0

ID:

2.11

Workpackage:

Grammar Developer’s Tools

Task leader:

lauri.alanko

Assignees:

jordi.saludes, krasimir.angelov, lauri.alanko

Status:

Ongoing

Timeframe:

Apr 2010

Overview

The runtime is the part of the GF system that implements parsing and linearization of texts based on a PGF grammar that has been produced by the GF compiler.

The standard GF runtime is written in Haskell, like the rest of the system. Unfortunately, this results in a large memory footprint and possibly also portability problems, which preclude its use in certain applications.

The goal of the current task is to reimplement the GF runtime as a pure C library. The hope is that this C library can then be used in situations where the Haskell-based runtime would be unwieldy.

Status

Preview versions of the implementation, libpgf, are available from the project home page. This is also where up-to-date documentation can be found.

Morphology server and its API

Start: 0

ID:

2.3

Workpackage:

Grammar Developer’s Tools

Task leader:

aarne.ranta

Assignees:

aarne.ranta, krasimir.angelov

Relevant Deliverables:

Grammar IDE

Status:

Planned

Timeframe:

Aug 2010 - Oct 2010

The compiler API must be used by the morphology server.

Plugin to Python NLTK

Start: 0

ID:

2.8

Workpackage:

Grammar Developer’s Tools

Task leader:

jordi.saludes

Assignees:

jordi.saludes

Status:

Completed

Develop a Python plugin for GF (based on the planned C plugin) and connect it to relevant parts of the Natural Language Toolkit (http://www.nltk.org/).

Subtasks

2.8.1 Develop Python bindings for GF.

2.8.2 NLTK integration.

GF Python bindings

Using the GF Python bindings

This section shows how to use some of the functionality of the GF shell from within Python.

Installation

Due to a GHC glitch, it builds only on Linux.

You will need the GF source distribution, GHC and the Python development files¹. Then go to the Python bindings folder and build them:

 cd GF/contrib/py-bindings
 make

This builds a shared library (gf.so) that you can import and use from Python, as shown below.

Testing installation

To test if it works correctly, type:

 python -m doctest example.rst

Examples

Loading a pgf file

First you must import the library:

% import gf

then load a PGF file, like this tiny example:

% pgf = gf.read_pgf("Query.pgf")

We can ask for the supported languages:

% pgf.languages()
[QueryEng, QuerySpa]

The start category of the PGF module is:

% pgf.startcat()
Question

Parsing and linearizing

Let us save the languages for later:

% eng,spa = pgf.languages()

These are opaque objects, not strings:

% type(eng) 
<type 'gf.lang'>

and must be used when parsing:

% pgf.parse(eng, "is 42 prime") 
[Prime (Number 42)]

Yes, I know it should have a '?' at the end, but there is no support for other lexers at this time.

Notice that parsing returns a list of GF trees. Let's save it and linearize the first tree in Spanish:

% t = pgf.parse(eng, "is 42 prime")
% pgf.linearize(spa, t[0])
'42 es primo'

(which it is not; but remember that the '?' is missing at the end.)
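The parse-then-linearize round trip above can be wrapped in a small function. This is only a sketch: translate is a hypothetical helper name, and it relies on nothing beyond the parse and linearize methods used in this walk-through. It linearizes every tree that parse returns, since an ambiguous input can yield more than one.

```python
def translate(pgf, src, dst, text):
    """Parse text in language src and linearize each resulting tree in dst.

    Hypothetical helper built only on pgf.parse and pgf.linearize as used
    in the examples above; returns one string per parse tree.
    """
    trees = pgf.parse(src, text)  # empty list if the input does not parse
    return [pgf.linearize(dst, tree) for tree in trees]
```

With the objects loaded earlier, translate(pgf, eng, spa, "is 42 prime") should give ['42 es primo'], the same result as the step-by-step version above.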

Getting parsing completions

One of the nice things about the GF shell is that it suggests which tokens can continue the line you are composing.

The bindings offer this too. Suppose we have no idea how to start:

% pgf.complete(eng, "")
['is']

so there is only one sensible thing to type. Let's continue:

% pgf.complete(eng, "is ")
[]

It is important to note the blank space at the end; otherwise we get the same suggestion again:

% pgf.complete(eng, "is")
['is']

But why is nothing suggested after "is "? At this point a literal integer is expected, so GF would have to present an infinite list of alternatives. I cannot blame it for refusing to do so.

% pgf.complete(eng, "is 42 ")
['even', 'odd', 'prime']

Good. I will go for 'even', just to be on the safe side:

% pgf.complete(eng, "is 42 even ")
[]

Nothing again, but this time the phrase is complete. Let us check it by parsing:

% pgf.parse(eng, "is 42 even")
[Even (Number 42)]
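Because the trailing blank matters, it can help to build the prefix from whole words. The sketch below is a hypothetical helper, next_tokens, written only on top of pgf.complete as used above; the helper and its name are assumptions, not part of the bindings.

```python
def next_tokens(pgf, lang, words):
    """Return the tokens GF suggests after the finished words in `words`.

    Hypothetical helper over pgf.complete. It always ends a non-empty
    prefix with a space, because complete(lang, "is") re-suggests the
    unfinished word "is" instead of what may follow it.
    """
    prefix = " ".join(words)
    if prefix:
        prefix += " "  # mark the last word as finished
    return pgf.complete(lang, prefix)
```

For instance, next_tokens(pgf, eng, ["is", "42"]) asks for complete(eng, "is 42 ") and so should return ['even', 'odd', 'prime']. An empty list is ambiguous, as seen above: either a literal is expected or the phrase is already complete, and parsing tells the two apart.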

Deconstructing gf trees

We store the last result and ask for its type:

% t = pgf.parse(eng, "is 42 even")[0]
% type(t)
<type 'gf.tree'>

What's inside this tree? We use unapply for that:

% t.unapply()
[Even, Number 42]

This method returns a list with the head of the fun judgement and its arguments:

% map(type, _)
[<type 'gf.cid'>, <type 'gf.expr'>]

Notice that the argument is again a tree (gf.tree or gf.expr; they are the same thing here):

% t.unapply()[1]
Number 42

We will repeat the trick with it now:

% t.unapply()[1].unapply()
[Number, 42]

and again, the same structure shows up:

% map(type, _)
[<type 'gf.cid'>, <type 'gf.expr'>]

One more time, just to get to the bottom of it:

% t.unapply()[1].unapply()[1].unapply()
42

but now it is an actual number:

% type(_)
<type 'int'>

We have now fully decomposed the fun judgement.
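The manual unapply steps above can be automated with a short recursive function. This sketch assumes exactly the behaviour shown in the walk-through (a list [head, arg1, ...] for an application, a plain Python value for a literal) and also assumes that str() on a gf.cid gives the function name; decompose is a hypothetical helper, not part of the bindings.

```python
def decompose(tree):
    """Recursively unfold a GF tree into nested lists using unapply().

    Assumes unapply() returns a list [head, arg1, ...] for a function
    application and a plain Python value (e.g. an int) for a literal.
    """
    parts = tree.unapply()
    if not isinstance(parts, list):
        return parts  # a literal such as 42: nothing left to unfold
    head, args = parts[0], parts[1:]
    # str(head) is assumed to turn the gf.cid into its name, e.g. "Even"
    return [str(head)] + [decompose(arg) for arg in args]
```

Under those assumptions, applying it to the tree t from above should yield a nested list with Even at the top, Number below it, and the integer 42 as the leaf.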

¹ On Ubuntu, I got them by installing the python-all-dev package.

Refactoring the grammar compiler code base (to improve reusability)

Start: 0

ID:

2.1

Workpackage:

Grammar Developer’s Tools

Task leader:

aarne.ranta

Assignees:

krasimir.angelov

Status:

Assigned

Timeframe:

Mar 2010 - Jul 2010

Here, a slightly better description, possibly with relevant links to software, documentation, etc.

Release of GF 3.2

Start: 0

ID:

2.4

Workpackage:

Grammar Developer’s Tools

Task leader:

aarne.ranta

Assignees:

aarne.ranta, krasimir.angelov

Status:

Completed

Timeframe:

Mar 2010 - Dec 2010

Completed on:

15 December, 2012

Major features:

PGF format updated and documented in the wiki
completed runtime type checker for dependent types
parsing with dependent types
on-line parsing
type-error reporting
exhaustive generation of ASTs (also via λProlog)
probabilities in the abstract syntax
random generation guided by probability
parse results ranked by probability
example-based grammar generation (extra script)
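To make "random generation guided by probability" concrete, here is a toy sketch in Python; it is not how GF implements the feature. Each category lists its functions with weights and argument categories, and generation picks a function with probability proportional to its weight, then recurses. The grammar is hypothetical: it borrows the function names Prime, Even, Odd and Number from the Query example used with the Python bindings, but the category names, the weights and the treatment of Int literals are assumptions.

```python
import random

# Hypothetical abstract syntax: category -> [(function, weight, argument categories)].
# Function names follow the Query example; categories and weights are invented.
GRAMMAR = {
    "Question": [("Prime", 0.2, ["Object"]),
                 ("Even", 0.4, ["Object"]),
                 ("Odd", 0.4, ["Object"])],
    "Object": [("Number", 1.0, ["Int"])],
}


def weighted_choice(rules, rng):
    """Pick one rule with probability proportional to its weight."""
    total = sum(rule[1] for rule in rules)
    r = rng.random() * total
    for rule in rules:
        r -= rule[1]
        if r < 0:
            return rule
    return rules[-1]  # guard against floating-point rounding


def generate(cat, grammar, rng=None):
    """Generate one random tree of category cat as nested lists [fun, arg1, ...]."""
    if rng is None:
        rng = random.Random()
    if cat == "Int":
        return rng.randint(0, 100)  # literal category: a small integer, uniformly
    fun, _, arg_cats = weighted_choice(grammar[cat], rng)
    return [fun] + [generate(c, grammar, rng) for c in arg_cats]
```

Over many calls to generate("Question", GRAMMAR), Even and Odd should each appear roughly twice as often as Prime. Passing a seeded random.Random makes runs reproducible.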

New languages:

Urdu, complete resource grammar
Turkish, complete morphology
Amharic, complete resource grammar
Punjabi, complete morphology

Web-based grammar development environment (version 1)

Start: 0

ID:

2.6

Workpackage:

Grammar Developer’s Tools

Task leader:

aarne.ranta

Assignees:

krasimir.angelov, thomas.hallgren

Relevant Deliverables:

Grammar IDE

Dependencies:

Release of GF 3.2

Status:

Ongoing

Timeframe:

Aug 2010 - Mar 2011

Prototype

Web-based tools for grammarians: http://www.grammaticalframework.org/demos/gfse/

Ongoing work at http://cloud.grammaticalframework.org.

Similar work

Look into online IDE platforms, like Kodingen and CodeRun.

There is work on Ajax-based code editors, e.g. Ymacs, which could be useful since there is already a GF mode for Emacs (where?).

The Emacs mode can now be found at http://www.grammaticalframework.org/src/tools/gf.el (note by Aarne).

There is also a Mozilla project, Bespin, to build a web-based editor extensible in JavaScript.

Also check Orc, yet another online IDE for a new language, which uses CodeMirror as its editor.

Integrating probabilities in GF and PGF

Start: 0

ID:

2.8

Workpackage:

Grammar Developer’s Tools

Task leader:

aarne.ranta

Assignees:

aarne.ranta

Status:

Planned

Timeframe:

Oct 2010 - Dec 2010

Design and integrate probabilistic features into GF and PGF.
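As a starting point for the design, one common model (assumed here, not taken from this plan) gives each abstract-syntax function a probability and scores a tree by the product of the probabilities of the functions it contains; ranking parse results then amounts to sorting by that score. The sketch represents trees as nested lists [function, arg1, ...]; FUN_PROBS, tree_prob and rank are hypothetical names, and the numbers are invented for illustration.

```python
# Hypothetical per-function probabilities (invented numbers); in a real
# grammar they would be declared or estimated, normalised per category.
FUN_PROBS = {"Prime": 0.2, "Even": 0.4, "Odd": 0.4, "Number": 1.0}


def tree_prob(tree, probs):
    """Score a tree as the product of the probabilities of its functions.

    Trees are nested lists [function, arg1, ...]; literals such as 42
    are not lists and contribute no factor in this sketch.
    """
    if not isinstance(tree, list):
        return 1.0
    p = probs[tree[0]]
    for arg in tree[1:]:
        p *= tree_prob(arg, probs)
    return p


def rank(trees, probs):
    """Sort candidate parse trees from most to least probable."""
    return sorted(trees, key=lambda t: tree_prob(t, probs), reverse=True)
```

With these numbers, ["Even", ["Number", 42]] scores 0.4 and ranks above ["Prime", ["Number", 42]], which scores 0.2. For large trees, summing log-probabilities instead of multiplying avoids floating-point underflow.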

Extend planning here.

Integration with ontology tools

Start: 0

ID:

2.9

Workpackage:

Grammar Developer’s Tools

Task leader:

aarne.ranta

Assignees:

aarne.ranta, borislav.popov, lauri.carlson

Status:

Planned

Final phase of the work planned in this workpackage. Exact scheduling to be defined.

On-line extension of PGF with new words

Start: 0

ID:

2.7

Workpackage:

Grammar Developer’s Tools

Task leader:

aarne.ranta

Assignees:

krasimir.angelov

Status:

Planned

Timeframe:

Aug 2010 - Jan 2011

Make it possible to dynamically add new words to lexicons "linked" in compiled grammars.