Evaluation of WP6
This page describes a way to create and maintain a treebank of mgl productions in several languages. The treebank is meant to improve the quality of the library, make regression testing possible, and track the progress of development.
First, we describe the structure of the treebank. Next, we propose a protocol for creating and maintaining it. Finally, we describe the treebank manager tool.
Structure
A Treebank has:
- A path to a grammar pgf file.
- A collection of treebank entries.
A Treebank entry consists of:
- An abstract tree for the gf grammar corresponding to the pgf file above.
- For each language (encoded as a three-letter ISO language code), one or more Changesets.
A Changeset has:
- source: the person submitting it;
- revision: an integer equal to the svn revision in which this item is committed;
- concrete: the proposed linearization and optionally a comment.
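A defect is a difference, for a given language, between the linearization the grammar currently produces for an entry and the sample (the concrete linearization) recorded in the entry's last change-set for that language.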
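The structure above can be sketched as a few Python dataclasses. This is only an illustration under stated assumptions: the class and field names, the `find_defect` helper and the `linearize` callback (standing in for whatever actually linearizes a tree with the pgf grammar) are hypothetical and not part of any existing tool. Abstract trees are kept as plain strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Changeset:
    source: str                    # the person submitting it
    revision: int                  # svn revision in which it is committed
    concrete: str                  # the proposed linearization
    comment: Optional[str] = None  # optional comment


@dataclass
class Entry:
    tree: str  # abstract tree, kept as a string in this sketch
    # three-letter language code -> changesets, oldest first
    changesets: Dict[str, List[Changeset]] = field(default_factory=dict)


@dataclass
class Treebank:
    pgf_path: str  # path to the grammar pgf file
    entries: List[Entry] = field(default_factory=list)


def find_defect(
    entry: Entry,
    lang: str,
    linearize: Callable[[str, str], str],  # assumed: (tree, lang) -> text
) -> Optional[Tuple[str, str]]:
    """Return (expected, actual) if the entry is defective for lang, else None."""
    history = entry.changesets.get(lang)
    if not history:
        return None  # no sample recorded yet, so nothing to compare against
    expected = history[-1].concrete  # sample in the last changeset
    actual = linearize(entry.tree, lang)
    return None if actual == expected else (expected, actual)
```

Keeping the changesets as an ordered list per language means the full history of corrections stays available, while the defect check only ever looks at the last element.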
Protocol
1. Using GF's gr (generate_random) command, create a list of abstract trees.
2. Refine this list by removing or modifying unnatural productions (too deep, too long, or meaningless).
3. Add linearizations for all targeted languages; this forms the initial change-set.
4. Send the pairs (abstract tree, linearization in L) to a fluent speaker of language L and ask for corrections.
5. Add the corrections to the treebank as new change-sets.
6. Generate a list of defects and fix them.
7. Generate new linearizations and go back to step 4. Repeat until satisfied or out of resources.
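The review part of this protocol (sending pairs to a fluent speaker and adding the corrections as new change-sets) might look roughly like the following sketch. It uses plain dictionaries to stay self-contained. The function names, the dictionary keys and the `linearize` callback are assumptions made for illustration only.

```python
def review_pairs(entries, lang, linearize):
    """Build the (abstract tree, linearization) pairs to send to a reviewer.

    `linearize` is an assumed callback (tree, lang) -> text.
    """
    return [(e["tree"], linearize(e["tree"], lang)) for e in entries]


def add_corrections(entries, lang, corrections, source, revision):
    """Store each corrected linearization as a new changeset.

    `corrections` maps an abstract tree to the text proposed by the reviewer.
    Returns the number of changesets added.
    """
    by_tree = {e["tree"]: e for e in entries}
    added = 0
    for tree, concrete in corrections.items():
        entry = by_tree[tree]  # KeyError if the tree is not in the treebank
        history = entry.setdefault("changesets", {}).setdefault(lang, [])
        history.append(
            {"source": source, "revision": revision, "concrete": concrete}
        )
        added += 1
    return added
```

A reviewer only needs to return the pairs they changed, so `corrections` can be much smaller than the list produced by `review_pairs`.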
Manager tool
We need an interactive tool able to:
- Add new entries to the treebank by providing the abstract tree (the initial changeset is generated automatically from it).
- Interactively amend abstract trees (tree editor).
- Compare change-set samples with actual linearization.
- Generate (tree, concrete) pairs as in step 4 above for a given language.
- Add corrected concretes to the corresponding treebank entry as a new change-set.
- List all defects in the treebank for a given language.
- Summarize the status of these defects (see tickets below)
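As a rough idea of what the defect listing and status summary could compute, here is a minimal sketch. The three status labels, the dictionary layout and the `linearize` callback are assumptions for illustration. The actual tool may track status differently.

```python
def entry_status(entry, lang, linearize):
    """Classify one entry for one language as 'ok', 'defect' or 'no sample'."""
    history = entry.get("changesets", {}).get(lang, [])
    if not history:
        return "no sample"  # nothing recorded for this language yet
    expected = history[-1]["concrete"]  # sample in the last changeset
    actual = linearize(entry["tree"], lang)  # assumed callback (tree, lang) -> text
    return "ok" if actual == expected else "defect"


def list_defects(entries, lang, linearize):
    """Return the abstract trees whose current linearization is defective."""
    return [
        e["tree"] for e in entries
        if entry_status(e, lang, linearize) == "defect"
    ]


def status_summary(entries, lang, linearize):
    """Count entries per status for a given language."""
    counts = {"ok": 0, "defect": 0, "no sample": 0}
    for e in entries:
        counts[entry_status(e, lang, linearize)] += 1
    return counts
```

Running `status_summary` after each grammar change gives a simple regression signal: the defect count for a language should not go up.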