WP2: Grammar Developer’s Tools

The grammar developer's tools are divided into two kinds of tasks:

  • GF grammar compiler API

  • actual tools implemented using the API

The work plan for the first six months mostly concerns the API. The main actual tool is the GF Shell, a line-based grammar development tool. The shell is powerful because it supports scripting, but it is not integrated with other working environments. The most important of these other environments will be web-based access to the grammar compiler.

Note that most discussions on GF are public at http://code.google.com/p/grammatical-framework/.

The work plan follows, with tasks assigned to sites and approximate months.

Improving the Resource Grammar Library API and its documentation

ID: 2.10
Task leader: aarne.ranta
Assignees: ramona.enache
Relevant Deliverables: GF Grammar Compiler API
Status: Ongoing
Timeframe: Mar 2010

The GF documentation is hosted on Google Code at http://code.google.com/p/grammatical-framework/.

There is a wiki cover page for the Resource Grammar Library API and an online version at http://www.grammaticalframework.org/compiler-api/.

Designing the API and writing its documentation

ID: 2.5
Task leader: aarne.ranta
Assignees: krasimir.angelov
Status: Completed
Timeframe: Mar 2010 - Aug 2010
Completed on: 1 October, 2011

The GF API design will take into account the following requirements:

  • programming environments, e.g. Eclipse, Xcode, Notepad++, the web
  • standard formats for I/O
  • ....

The documentation is being hosted on the GF website.

Example-based grammar writing

ID: 2.2
Task leader: ramona.enache
Assignees: aarne.ranta
Relevant Deliverables: Grammar IDE
Status: Ongoing
Timeframe: Jul 2010 - Sep 2010

What we mean by example-based grammar writing.

The current status is a proof of concept: it is possible to load an example-based grammar and compile it.

Need to do: - ....

GF runtime in C

ID: 2.11
Task leader: lauri.alanko
Assignees: jordi.saludes, krasimir.angelov, lauri.alanko
Status: Ongoing
Timeframe: Apr 2010

Overview

The runtime is the part of the GF system that implements parsing and linearization of texts based on a PGF grammar that has been produced by the GF compiler.

Like the rest of the system, the standard GF runtime is written in Haskell. Unfortunately, this results in a large memory footprint and possibly also portability problems, which preclude its use in certain applications.

The goal of this task is to reimplement the GF runtime as a pure C library, which can then hopefully be used in situations where the Haskell-based runtime would be unwieldy.

Status

Preview versions of the implementation, libpgf, are available from the project home page. This is also where up-to-date documentation can be found.

Morphology server and its API

ID: 2.3
Task leader: aarne.ranta
Assignees: aarne.ranta, krasimir.angelov
Relevant Deliverables: Grammar IDE
Status: Planned
Timeframe: Aug 2010 - Oct 2010

The compiler API must be used by the morphology server.

Plugin to Python NLTK

ID: 2.8
Task leader: jordi.saludes
Assignees: jordi.saludes
Status: Completed

To develop a Python plugin for GF (based on the planned C plugin) and connect it to relevant parts of the Natural Language Toolkit (http://www.nltk.org/).

Subtasks

2.8.1 Develop Python bindings for GF.

2.8.2 NLTK integration.

GF Python bindings

Using the GF Python bindings

This page shows how to use some of the functionality of the GF shell from inside Python.

Installation

Because of a GHC glitch, it only builds on Linux.

You'll need the GF source distribution, GHC and the Python development files [1]. Then go to the Python bindings folder and build it:

 cd GF/contrib/py-bindings
 make

This builds a shared library (gf.so) that you can import and use in Python, as shown below.

Testing installation

To test whether it works correctly, run:

 python -m doctest example.rst

Examples

Loading a pgf file

First you must import the library:

% import gf

Then load a PGF file, such as this tiny example:

% pgf = gf.read_pgf("Query.pgf")

We can ask for the supported languages:

% pgf.languages()
[QueryEng, QuerySpa]

The start category of the PGF module is:

% pgf.startcat()
Question

Parsing and linearizing

Let us save the languages for later:

% eng,spa = pgf.languages()

These are opaque objects, not strings:

% type(eng)
<type 'gf.lang'>

and must be used when parsing:

% pgf.parse(eng, "is 42 prime") 
[Prime (Number 42)]

Yes, I know it should have a '?' at the end, but there is no support for other lexers at this time.

Notice that parsing returns a list of GF trees. Let's save it and linearize the first tree in Spanish:

% t = pgf.parse(eng, "is 42 prime")
% pgf.linearize(spa, t[0])
'42 es primo'

(which it is not, but the '?' is missing at the end, remember?)
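The parse-then-linearize round trip above is essentially translation. Here is a minimal sketch of a helper that wraps it, assuming only the `parse` and `linearize` methods used in this session; the name `translate` is ours, not part of the gf module. It returns one linearization per parse tree, since an ambiguous sentence may yield several trees.

```python
def translate(pgf, source, target, text):
    """Translate `text` from language `source` into language `target`.

    `pgf` is an object returned by gf.read_pgf(); `source` and `target`
    are the opaque language objects from pgf.languages(), not strings.
    Returns one linearization per parse tree (empty if nothing parses).
    """
    trees = pgf.parse(source, text)
    return [pgf.linearize(target, tree) for tree in trees]
```

With the objects from the session above:

% translate(pgf, eng, spa, "is 42 prime")
['42 es primo']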

Getting parsing completions

One of the nice things about the GF shell is that it suggests which tokens can continue the line you are composing.

The bindings offer this too. Suppose we have no idea how to start:

% pgf.complete(eng, "")
['is']

So there is only one sensible thing to type. Let's continue:

% pgf.complete(eng, "is ")
[]

It is important to note the blank space at the end; without it, we get the same suggestion again:

% pgf.complete(eng, "is")
['is']

But why is nothing suggested after "is "? At this point a literal integer is expected, so GF would have to present an infinite list of alternatives. I cannot blame it for refusing to do so.

% pgf.complete(eng, "is 42 ")
['even', 'odd', 'prime']

Good. I will go for 'even', just to be on the safe side:

% pgf.complete(eng, "is 42 even ")
[]

Nothing again, but this time the phrase is complete. Let us check it by parsing:

% pgf.parse(eng, "is 42 even")
[Even (Number 42)]
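Since forgetting the trailing blank is an easy mistake, a small wrapper can add it. This is a sketch that relies only on `pgf.complete` as used above; `next_tokens` is a hypothetical helper name. An empty result remains ambiguous: as we just saw, it means either that the phrase is complete or that a literal is expected.

```python
def next_tokens(pgf, lang, text):
    """Suggest the tokens that can start the next word after `text`.

    Without a trailing blank, pgf.complete() proposes completions of
    the word being typed, so a blank is appended to non-empty input.
    """
    if text and not text.endswith(" "):
        text += " "
    return pgf.complete(lang, text)
```

For example:

% next_tokens(pgf, eng, "is 42")
['even', 'odd', 'prime']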

Deconstructing gf trees

We store the last result and ask for its type:

% t = pgf.parse(eng, "is 42 even")[0]
% type(t)
<type 'gf.tree'>

What's inside this tree? We use unapply for that:

% t.unapply()
[Even, Number 42]

This method returns a list with the head of the fun judgement and its arguments:

% map(type, _)
[<type 'gf.cid'>, <type 'gf.expr'>]

Notice that the argument is again a tree (gf.tree or gf.expr; they are the same thing here).

% t.unapply()[1]
Number 42

We will repeat the trick with it now:

% t.unapply()[1].unapply()
[Number, 42]

and again, the same structure shows up:

% map(type, _)
[<type 'gf.cid'>, <type 'gf.expr'>]

One more time, just to get to the bottom of it:

% t.unapply()[1].unapply()[1].unapply()
42

but now it is an actual number:

% type(_)
<type 'int'>

We have ended up with a fully decomposed fun judgement.
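Rather than calling unapply by hand at each level, the same walk can be done recursively. A minimal sketch, assuming unapply behaves as in the session above (a list [head, arg1, ...] for an application, a plain Python value such as an int for a literal); `decompose` is a hypothetical helper, not part of the gf module:

```python
def decompose(expr):
    """Recursively unfold a gf tree into nested Python lists.

    Each application becomes [head, decomposed_arg1, ...], keeping the
    head (a gf.cid) as returned by unapply(); literals are returned as
    the plain Python values that unapply() yields for them.
    """
    parts = expr.unapply()
    if not isinstance(parts, list):
        return parts  # a literal: nothing left to unfold
    head, args = parts[0], parts[1:]
    return [head] + [decompose(arg) for arg in args]
```

For the tree t above, decompose(t) should give a nested list whose first element is the cid Even and whose second element is the list for Number 42.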


  1. On Ubuntu, I got them by installing the package python-all-dev.

Refactoring the grammar compiler code base (to improve reusability)

ID: 2.1
Task leader: aarne.ranta
Assignees: krasimir.angelov
Status: Assigned
Timeframe: Mar 2010 - Jul 2010

Here, a slightly better description, with any relevant links to software, documentation, etc.

Release of GF 3.2

ID: 2.4
Task leader: aarne.ranta
Assignees: aarne.ranta, krasimir.angelov
Status: Completed
Timeframe: Mar 2010 - Dec 2010
Completed on: 15 December, 2012

Major features:

  • PGF format updated and documented in the wiki
  • completed runtime type checker for dependent types
  • parsing with dependent types
  • on-line parsing
  • type-error reporting
  • exhaustive generation of ASTs (also via Lambda Prolog)
  • probabilities in the abstract syntax
  • random generation guided by probability
  • parse results ranked by probability
  • example-based grammar generation (extra script)

New languages:

  • Urdu, complete resource grammar
  • Turkish, complete morphology
  • Amharic, complete resource grammar
  • Punjabi, complete morphology

Web-based grammar development environment (version 1)

ID: 2.6
Task leader: aarne.ranta
Assignees: krasimir.angelov, thomas.hallgren
Relevant Deliverables: Grammar IDE
Dependencies: Release of GF 3.2
Status: Ongoing
Timeframe: Aug 2010 - Mar 2011

Prototype

Web-based tools for grammarians: http://www.grammaticalframework.org/demos/gfse/

Ongoing work at http://cloud.grammaticalframework.org.

Similar work

Look into online IDE platforms, like Kodingen and CodeRun.

There is work on Ajax-based code editors, e.g. Ymacs, which could be useful since there is already a GF mode for Emacs (where?).

The Emacs mode can now be found at http://www.grammaticalframework.org/src/tools/gf.el (note by Aarne).

There is also a Mozilla project, Bespin, to build a web-based editor extensible in JavaScript.

Also check Orc, yet another online IDE for a new language, which uses CodeMirror as its editor.

Integrating probabilities in GF and PGF

ID: 2.8
Task leader: aarne.ranta
Assignees: aarne.ranta
Status: Planned
Timeframe: Oct 2010 - Dec 2010

Design and integrate probabilistic features into GF and PGF.

Extend planning here.
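The GF 3.2 notes above list random generation guided by probability and parse results ranked by probability. A minimal sketch of one standard scheme may help fix ideas: each abstract syntax function carries a probability, and a tree's probability is the product of the probabilities of the functions it contains. The tree encoding (nested Python lists mirroring what unapply() returns in the Python bindings), the function names and the numbers are hypothetical illustrations, not the PGF format.

```python
def tree_probability(tree, fun_probs):
    """Probability of a tree as the product of its functions' probabilities.

    `tree` is a nested list [head, arg1, ...]; non-list leaves such as
    literals are treated as contributing probability 1.
    `fun_probs` maps function names to probabilities (hypothetical values).
    """
    if not isinstance(tree, list):
        return 1.0
    head, args = tree[0], tree[1:]
    prob = fun_probs[head]
    for arg in args:
        prob *= tree_probability(arg, fun_probs)
    return prob


def rank_by_probability(trees, fun_probs):
    """Sort candidate parse trees from most to least probable."""
    return sorted(trees, key=lambda t: tree_probability(t, fun_probs),
                  reverse=True)
```

With hypothetical probabilities {"Even": 0.3, "Prime": 0.4, "Number": 1.0}, the tree ["Prime", ["Number", 42]] would be ranked above ["Even", ["Number", 42]].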

Integration with ontology tools

ID: 2.9
Task leader: aarne.ranta
Assignees: aarne.ranta, borislav.popov, lauri.carlson
Status: Planned

Final phase of the work planned in this work package. Exact scheduling to be defined.

On-line extension of PGF with new words

ID: 2.7
Task leader: aarne.ranta
Assignees: krasimir.angelov
Status: Planned
Timeframe: Aug 2010 - Jan 2011

Making it possible to dynamically add new words to lexicons "linked" into compiled grammars.