X.1 Grammars
The impact of MOLTO is not limited to individual use cases. During the three years of the project, we have developed methods for efficient grammar writing that divide the task so that grammar experts and domain experts each do what they do best. These guidelines are documented in D2.3, Best practices.
The Best practices document was published in October 2012, but many of the grammars were written before that. This section first gives an overview of the best practices and then looks at whether the grammars follow them.
Best practices
(This summary is copied from the document.)
- To make your work reusable, and to enable a division of labour:
  - Divide the grammar into a base module (syntactic) and domain extension (lexical).
- To make it maximally simple to add languages:
  - Consider defining the base part by a functor.
- To avoid low-level hacking and guarantee grammatical correctness:
  - In the concrete syntax, use only function applications and string tokens, maybe records, but no tables and no concatenation.
- To guarantee that the grammar will continue to work in the future:
  - Only use the API level of the resource grammar library.
- For scalability:
  - Choose solutions that remain stable when new languages are added.
- A corollary:
  - Never use lexical categories as linearization types.
- A scalability tool:
  - Use type synonyms and constructors rather than raw types for linearization.
- To monitor your progress:
  - Create a treebank for unit and regression testing, and use it often with the diagnostic tools.
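Several of these practices can be illustrated together with a toy grammar. The following is a minimal sketch, not taken from any MOLTO grammar: the module names (Foods, LexFoods, FoodsI, FoodsEng, LexFoodsEng) and the functions in them are hypothetical, and the sketch assumes that the RGL modules Syntax, SyntaxEng and ParadigmsEng are on the GF path. It shows a functor over the resource grammar API, a lexical interface that isolates the language-dependent part, and linearization types that are phrasal (CN, AP) rather than lexical (N, A).

```gf
-- Each module goes into its own file named after the module,
-- for instance Foods.gf, LexFoods.gf, and so on.

-- Abstract syntax shared by all languages (hypothetical toy domain).
abstract Foods = {
  flags startcat = Comment ;
  cat Comment ; Item ; Kind ; Quality ;
  fun
    Pred : Item -> Quality -> Comment ;  -- this wine is good
    This : Kind -> Item ;
    Wine, Cheese : Kind ;
    Good : Quality ;
}

-- Lexical interface: the only part that must be rewritten per language.
interface LexFoods = open Syntax in {
  oper
    wine_N   : N ;
    cheese_N : N ;
    good_A   : A ;
}

-- Functor: uses only RGL API function applications,
-- with no tables and no string concatenation.
incomplete concrete FoodsI of Foods = open Syntax, LexFoods in {
  lincat
    Comment = Utt ;
    Item    = NP ;
    Kind    = CN ;  -- phrasal category, not N
    Quality = AP ;  -- phrasal category, not A
  lin
    Pred item quality = mkUtt (mkCl item quality) ;
    This kind = mkNP this_Quant kind ;
    Wine   = mkCN wine_N ;
    Cheese = mkCN cheese_N ;
    Good   = mkAP good_A ;
}

-- English instance of the lexicon, built with smart paradigms.
instance LexFoodsEng of LexFoods = open SyntaxEng, ParadigmsEng in {
  oper
    wine_N   = mkN "wine" ;
    cheese_N = mkN "cheese" ;
    good_A   = mkA "good" ;
}

-- Functor instantiation: adding a language needs only this module
-- and a new lexicon instance.
concrete FoodsEng of Foods = FoodsI with
  (Syntax = SyntaxEng),
  (LexFoods = LexFoodsEng) ;
```

Under these assumptions, adding another language means writing one more lexicon instance and one more instantiation module. Because Kind is linearized as CN rather than N, a later extension that adds modified kinds would not require changing the linearization types.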
The following tools are standard and well-tested in MOLTO’s and other applications:
- the GF compiler and shell
- the GF run-time for Haskell, Java, and C, as well as the web API
- the RGL for the 15 MOLTO languages
- the GF-Eclipse IDE
- the use of smart paradigms for lexicon building
Phrasebook
The Phrasebook grammar has two main modules. Sentences contains phrases that can be defined by a functor over the resource grammar API. Words contains the phrases that are likely to have different implementations in different languages.
Semantic validity is handled with a simple, restrictive abstract syntax. For example, a function type such as
HowFarBy : Place -> ByTransport -> Question
guarantees that we can say "How far is the church by taxi" but not "How far is John by beer": the arguments must be a place and a means of transport.
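The effect of this typing can be seen in a small self-contained abstract syntax. This is only a sketch: HowFarBy, Place, ByTransport and Question come from the text above, while the module name HowFarSketch, the categories Person and Drink, and the constants TheChurch, ByTaxi, John and Beer are hypothetical and are not claimed to match the actual Phrasebook.

```gf
-- Hypothetical fragment illustrating semantic control by types.
abstract HowFarSketch = {
  flags startcat = Question ;
  cat
    Question ; Place ; ByTransport ;
    Person ; Drink ;  -- assumed extra categories, for contrast only
  fun
    HowFarBy  : Place -> ByTransport -> Question ;
    TheChurch : Place ;
    ByTaxi    : ByTransport ;
    John      : Person ;
    Beer      : Drink ;
}

-- Well-typed tree:   HowFarBy TheChurch ByTaxi
-- Ill-typed tree:    HowFarBy John Beer
--   (John is a Person, not a Place; Beer is a Drink, not a ByTransport)
```

Since the ill-typed tree cannot be built, no concrete syntax can produce a sentence for it, and the restriction holds for every language at once.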
Module structure: Common constructions with a functor
The starting point for the grammar was a test corpus of sentences that we wanted to express. These sentences serve as documentation for the abstract syntax:
AHasAge : Person -> Number -> Action ; -- I am seventy years
AHasChildren : Person -> Number -> Action ; -- I have six children
AHasName : Person -> Name -> Action ; -- my name is Bond
ACE-GF
ACE-GF is based on Attempto Controlled English (ACE). (ACE is ____.)
AceWiki works with ACE (the AceWiki subset); there are grammars for Cat, Dan, Dut, Eng (not ACE), Est, Fin, Fre, Ger, Ita, Lav, Nor, Pol, Ron, Rus, Spa, Swe, Urd (https://github.com/Attempto/ACE-in-GF/tree/master/grammars/acewiki_aceowl).
Grammar modules: an ACE base, plus domain lexicons (Geography).
(AceWiki also contains ordinary grammars that are not ACE, but these are unrelated to the ACE grammar.)
Museum
Query grammars
Grammar evaluation survey
Questionnaire
- Basic information:
- Use of development tools:
- Diagnostic tools
  - Compilation diagnostics:
  - Grammar display modes:
- Testing
  - Tools for generation and testing:
- RGL
  - Resource grammar tools:
- Grammar writing
  - Starting point for your grammar:
  - Basic unit of the grammar:
  - Semantic control:
  - Module structure:
  - Concrete syntax:
Analysis of answers: ....
Some answers given under "Other" that are not covered in Best practices(?):
- Other method for treebanks: Haskell code to store, edit and show differences in treebanks.
- Other development tool: Haskell and shell scripts generating grammars.