The grammar developer's tools are divided into two kinds of tasks:
GF grammar compiler API
actual tools implemented by using the API
The work plan for the first six months concerns mostly the API, with the main actual tool being the GF Shell, a line-based grammar development tool. It is a powerful tool since it enables scripting, but it is not integrated with other working environments. The most important other environment will be web-based access to the grammar compiler.
Note that most discussions on GF are public at http://code.google.com/p/grammatical-framework/.
Here follows the work plan, with tasks assigned to sites and approximate months.
Documentation of GF is hosted on Google Code at http://code.google.com/p/grammatical-framework/.
There is a wiki cover page for the Resource Grammar Library API and an online version at http://www.grammaticalframework.org/compiler-api/.
The GF API design will take into account the following requirements:
The documentation is being hosted at the GF website.
What we mean by example-based grammar writing.
Current status is a proof of concept: it is possible to load an example-based grammar and to compile it.
Need to do: - ....
The runtime is the part of the GF system that implements parsing and linearization of texts based on a PGF grammar that has been produced by the GF compiler.
The standard GF runtime is written in Haskell like the rest of the system. Unfortunately this results in a large memory footprint and possibly also portability problems, which preclude its use in certain applications.
The goal of the current task is to reimplement the GF runtime as a pure C library. This C library can then hopefully be used in some situations where the Haskell-based runtime would be unwieldy.
Preview versions of the implementation, libpgf, are available from the project home page. This is also where up-to-date documentation can be found.
The compiler API must be used by the morphology server.
Develop a Python plugin for GF (based on the planned C plugin) and connect it to relevant parts of the Natural Language Toolkit (http://www.nltk.org/).
2.8.1 Develop Python bindings to GF.
2.8.2 NLTK integration.
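As a purely hypothetical illustration of what NLTK integration might involve (this is not the planned API): GF abstract trees print in a bracketed form such as Even (Number 42), which is straightforward to parse into nested (head, arguments) structures in pure Python; such structures could then be wrapped, for instance, as nltk.Tree objects. A minimal sketch, with all names invented for this example:

```python
def parse_gf_tree(s):
    """Parse a GF abstract-syntax string like 'Even (Number 42)'
    into a nested (head, args) tuple."""
    # Tokenize: make parentheses standalone tokens, then split on whitespace.
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def atom(tok):
        # Integer literals become Python ints; everything else stays a string.
        return int(tok) if tok.lstrip("-").isdigit() else tok

    def parse_expr():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            head = tokens[pos]
            pos += 1
            args = []
            while tokens[pos] != ")":
                args.append(parse_expr())
            pos += 1  # skip the closing ')'
            return (head, args)
        return atom(tok)

    # Top level: a head followed by its (possibly parenthesized) arguments.
    head = tokens[pos]
    pos += 1
    args = []
    while pos < len(tokens):
        args.append(parse_expr())
    return (head, args)

print(parse_gf_tree("Even (Number 42)"))  # ('Even', [('Number', [42])])
```

The nested tuples mirror the head-plus-arguments shape that the bindings' unapply method (shown later in this document) exposes directly, so an actual integration would more likely build on unapply than on string parsing.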
This is how to use some of the functionalities of the GF shell inside Python.
Due to a GHC glitch, it currently builds only on Linux.
You'll need the source distribution of GF, GHC and the Python development files [1]. Then, go to the Python bindings folder and build it:
cd GF/contrib/py-bindings
make
It will build a shared library (gf.so) that you can import and use in Python as shown below.
To test if it works correctly, type:
python -m doctest example.rst
First you must import the library:
% import gf
then load a PGF file, like this tiny example:
% pgf = gf.read_pgf("Query.pgf")
We could ask for the supported languages:
% pgf.languages()
[QueryEng, QuerySpa]
The start category of the PGF module is:
% pgf.startcat()
Question
Let us save the languages for later:
% eng,spa = pgf.languages()
These are opaque objects, not strings:
% type(eng)
(type 'gf.lang')
and must be used when parsing:
% pgf.parse(eng, "is 42 prime")
[Prime (Number 42)]
Yes, I know it should have a '?' at the end, but there is no support for other lexers at this time.
Notice that parsing returns a list of gf trees. Let's save it and linearize it in Spanish:
% t = pgf.parse(eng, "is 42 prime")
% pgf.linearize(spa, t[0])
'42 es primo'
(which it is not, but the '?' is missing at the end, remember?)
One of the good features of the GF shell is that it suggests which tokens can continue the line you are composing.
This is also available in the bindings. Suppose we have no idea how to start:
% pgf.complete(eng, "")
['is']
so there is only one sensible thing to put in. Let's continue:
% pgf.complete(eng, "is ")
[]
It is important to note the blank space at the end; otherwise we get the same token again:
% pgf.complete(eng, "is")
['is']
But how come nothing is suggested after "is "? At this point a literal integer is expected, so GF would have to present an infinite list of alternatives. I cannot blame it for refusing to do so.
% pgf.complete(eng, "is 42 ")
['even', 'odd', 'prime']
Good. I will go for 'even', just to be on the safe side:
% pgf.complete(eng, "is 42 even ")
[]
Nothing again, but this time the phrase is complete. Let us check it by parsing:
% pgf.parse(eng, "is 42 even")
[Even (Number 42)]
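GF's completion is driven by incremental parsing of the grammar itself; as a toy sketch only (not the bindings' actual algorithm), the behaviour above can be mimicked over the finite sentence patterns of this Query grammar, with NUM standing in for the integer-literal slot:

```python
# Toy sketch: the real complete() is grammar-driven; here we mimic it over
# the three sentence patterns of the Query grammar. NUM marks the
# integer-literal slot, which is why the real bindings suggest nothing there.
PATTERNS = [["is", "NUM", "even"], ["is", "NUM", "odd"], ["is", "NUM", "prime"]]

def complete(prefix):
    """Return the sorted list of tokens that can follow the given token prefix."""
    # Normalize concrete numbers to the NUM slot before matching.
    prefix = ["NUM" if t.isdigit() else t for t in prefix]
    nxt = set()
    for pat in PATTERNS:
        n = len(prefix)
        if pat[:n] == prefix and n < len(pat):
            if pat[n] != "NUM":  # infinitely many literals: suggest none
                nxt.add(pat[n])
    return sorted(nxt)

print(complete([]))            # ['is']
print(complete(["is", "42"]))  # ['even', 'odd', 'prime']
```

A complete phrase, such as ["is", "42", "even"], yields the empty list here for the same reason as in the session above: no pattern has a further token to offer.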
We store the last result and ask for its type:
% t = pgf.parse(eng, "is 42 even")[0]
% type(t)
(type 'gf.tree')
What's inside this tree? We use unapply for that:
% t.unapply()
[Even, Number 42]
This method returns a list with the head of the fun judgement and its arguments:
% map(type, _)
[(type 'gf.cid'), (type 'gf.expr')]
Notice the argument is again a tree (gf.tree or gf.expr, it is all the same here):
% t.unapply()[1]
Number 42
We will repeat the trick with it now:
% t.unapply()[1].unapply()
[Number, 42]
and again, the same structure shows up:
% map(type, _)
[(type 'gf.cid'), (type 'gf.expr')]
One more time, just to get to the bottom of it:
% t.unapply()[1].unapply()[1].unapply()
42
but now it is an actual number:
% type(_)
(type 'int')
We ended up with a fully decomposed fun judgement.
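The repeated unapply calls above amount to a simple recursion. A minimal pure-Python sketch of the same decomposition, using a stand-in Tree class (since the real gf.tree type only exists inside the bindings):

```python
class Tree:
    """Stand-in for gf.tree: a function head applied to argument trees."""
    def __init__(self, head, *args):
        self.head, self.args = head, args

    def unapply(self):
        # Mirrors the bindings: the head followed by its arguments.
        return [self.head] + list(self.args)

def decompose(node):
    """Recursively unapply a tree down to its leaves."""
    if not isinstance(node, Tree):
        return node  # a literal leaf, e.g. an int
    head, *args = node.unapply()
    return [head] + [decompose(a) for a in args]

t = Tree("Even", Tree("Number", 42))
print(decompose(t))  # ['Even', ['Number', 42]]
```

The recursion bottoms out exactly where the session did: at the literal int 42.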
[1] On Ubuntu, these can be obtained by installing the package python-all-dev.
Here follows a slightly better description, with links to software, documentation, etc. where relevant.
Major features:
New languages:
Web-based tools for grammarians: http://www.grammaticalframework.org/demos/gfse/
Ongoing work at http://cloud.grammaticalframework.org.
Look into online IDE platforms, like Kodingen and CodeRun.
There is ongoing work on Ajax-based code editors, e.g. Ymacs, which could be useful since a GF mode for Emacs already exists (where?).
The Emacs mode can now be found at http://www.grammaticalframework.org/src/tools/gf.el (note by Aarne).
There is also a Mozilla project, Bespin, to build a web-based editor extensible via JavaScript.
Also check Orc, yet another online IDE for a new language, which uses CodeMirror as its editor.
Design and integrate probabilistic features into GF and PGF.
Extend planning here.
Final phase of the work planned in this work package. Exact scheduling to be defined.
Add the possibility to dynamically add new words to lexicons "linked" into compiled grammars.