Submission #42

Submission information
Submitted by Kaarel.Kaljurand
Monday, 21 January, 2013 - 13:15
87.64.179.118
Basic information

Which grammar have you written? For which language(s)? Rate your GF and language skills on the following scale:

Language skills

0: no skills
1: passive knowledge
2: fluent non-native
3: native speaker

GF skills

0: no skills
1: basic skills (2-day GF tutorial)
2: medium skills (previous experience of similar task)
3: advanced skills (resource grammar writer/substantial contributor)

Language skills: Estonian (3), English (2), German (1)

GF skills: medium skills (2)

I've been working on the ACE-in-GF grammar, see: https://github.com/Attempto/ACE-in-GF
Development tools
Text editor + GF shell
Vim, Make, various Unix command line tools, Ubuntu

Grammar size: 15 languages, 50 cats, 100 funs (excluding the lexicon)
Diagnostic tools
  • verbose (import -v FILE)
  • print_grammar (pg)
  • pg -words
  • pg -missing
  • compute_concrete (cc)
Testing
  • ...with a shellscript?
  • ...manually?
  • Random generation
  • Other way of grammar testing
Used an existing AceWiki treebank that can be parsed with AceWiki's own Codeco parser. Tried to parse this with the ACE-in-GF grammar.
1. Exhaustive generation based on increasing sentence length obtained by completion. First removed less interesting parts of the grammar to make the output manageable. (This is still work in progress.)

2. Roundtripping to discover ambiguities: for each tree in a treebank, linearize it and parse the result in every language. Highlight the cases where linearize+parse produces new trees; to highlight, linearize the original tree and the new trees in a disambiguation language (e.g. ACE). This helps to discover errors like:

if X sees somebody who sees Y ... -> if X sees somebody who Y sees ... (the word order changes; seen in German and Dutch)
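
The roundtripping check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual ACE-in-GF test script: the `linearize` and `parse` callables are assumed to be supplied by the caller (for example, thin wrappers around a GF runtime), and the toy grammar, the tree names `rel_subj`/`rel_obj`, and the language name `Toy` are invented for the example.

```python
from typing import Callable, Dict, Iterable, List

Tree = str  # an abstract syntax tree, kept as a plain string in this sketch


def roundtrip(
    trees: Iterable[Tree],
    languages: Iterable[str],
    linearize: Callable[[Tree, str], str],
    parse: Callable[[str, str], List[Tree]],
    disamb_lang: str,
) -> List[Dict]:
    """Report every (tree, language) pair whose linearize+parse yields new trees."""
    languages = list(languages)
    reports = []
    for tree in trees:
        for lang in languages:
            sentence = linearize(tree, lang)
            # Trees that the parser returns in addition to the original one.
            new_trees = [t for t in parse(sentence, lang) if t != tree]
            if new_trees:
                reports.append({
                    "language": lang,
                    "sentence": sentence,
                    # Show all readings in the disambiguation language.
                    "original": linearize(tree, disamb_lang),
                    "new_readings": [linearize(t, disamb_lang) for t in new_trees],
                })
    return reports


# Invented toy data: two trees that differ in "Ace" but collapse in "Toy".
TOY = {
    ("rel_subj", "Ace"): "somebody who sees Y",
    ("rel_obj", "Ace"): "somebody who Y sees",
    ("rel_subj", "Toy"): "somebody REL Y see",
    ("rel_obj", "Toy"): "somebody REL Y see",
}


def toy_linearize(tree: Tree, lang: str) -> str:
    return TOY[(tree, lang)]


def toy_parse(sentence: str, lang: str) -> List[Tree]:
    return sorted(t for (t, l), s in TOY.items() if l == lang and s == sentence)


reports = roundtrip(["rel_subj"], ["Ace", "Toy"], toy_linearize, toy_parse, "Ace")
for r in reports:
    print(r["language"], "|", r["original"], "->", r["new_readings"])
```

Passing the linearizer and parser in as functions keeps the check independent of how the grammar is loaded, so the same loop could in principle be run against any set of concrete languages.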
RGL
  • RGL synopsis document
  • RGL source browser
  • Download the source files and read them
  • ...because other documentation was not helpful
Grammar writing
  • Test corpus of sentences you want to express in the grammar
  • ...and used the test corpus as a treebank later on
  • Abstract syntax based on an existing formal system
AceWiki subset of ACE
Paragraph or more
In ACE, the basic unit is a paragraph, which consists of anaphorically linked sentences. Even though the ACE-in-GF grammar does not model the anaphoric links, it still makes sense to have a Paragraph category to combine the sentences.
No semantic control in abstract syntax
  • Base module and domain-specific extensions
  • Common constructions with a functor
lincats only from recommended RGL API categories
Other comments

If you have any other comments, for which there wasn't space, please comment here.

The work on ACE-in-GF is documented in D11.1. The latest version of the grammar, the test scripts, and a more recent version of the deliverable can be found at https://github.com/Attempto/ACE-in-GF