X.1 Grammars

inari.listenmaa

Multilingual Online Translation

Best practices

(This summary is copied from the document.)

To make your work reusable, and to enable a division of labour:
 Divide the grammar into a base module (syntactic) and a domain extension (lexical).
To make it maximally simple to add languages:
 Consider defining the base part by a functor.
To avoid low-level hacking and guarantee grammatical correctness:
 In the concrete syntax, use only function applications and string tokens, and possibly records, but no tables and no concatenation.
To guarantee that the grammar will continue to work in the future:
 Only use the API level of the resource grammar library.
For scalability:
 Choose solutions that remain stable when new languages are added.
A corollary:
 Never use lexical categories as linearization types.
A scalability tool:
 Use type synonyms and constructors rather than raw types for linearization.
To monitor your progress:
 Create a treebank for unit and regression testing, and use it often with the diagnostic tools.
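
The treebank-based regression testing recommended above can be sketched roughly as follows. This is a minimal illustration in Python, not one of the GF diagnostic tools: the tree names, the sentences and the `CURRENT` table are hypothetical, and in a real setup the `linearize` argument would call the grammar (for instance through a GF run-time) instead of looking up a dictionary.

```python
def check_treebank(gold, linearize):
    """Compare approved linearizations with the grammar's current output.

    gold      -- dict: abstract syntax tree (string) -> approved sentence
    linearize -- callable: tree string -> sentence produced now
    Returns a list of (tree, expected, actual) for every mismatch.
    """
    failures = []
    for tree in sorted(gold):
        expected = gold[tree]
        actual = linearize(tree)
        if actual != expected:
            failures.append((tree, expected, actual))
    return failures


def format_report(failures):
    """Render the mismatches as plain text, one block per tree."""
    lines = []
    for tree, expected, actual in failures:
        lines.append(tree)
        lines.append("  expected: " + expected)
        lines.append("  actual:   " + str(actual))
    return "\n".join(lines)


# Hypothetical gold standard (assumption: trees and sentences are invented).
GOLD = {
    "HowFarBy Church ByTaxi": "how far is the church by taxi",
    "HowFarBy Station ByBus": "how far is the station by bus",
}

# Hypothetical current output, standing in for a call to the grammar.
# One sentence has lost its article, which is the regression to detect.
CURRENT = dict(GOLD)
CURRENT["HowFarBy Station ByBus"] = "how far is station by bus"

if __name__ == "__main__":
    print(format_report(check_treebank(GOLD, CURRENT.get)))
```

Running the check after every grammar change, and updating the gold standard only after a human has approved the new output, is what turns the treebank into a regression test.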

The following tools are standard and well tested in MOLTO and other applications:

  • the GF compiler and shell
  • the GF run-time for Haskell, Java, and C, as well as the web API
  • the RGL for the 15 MOLTO languages
  • the GF-Eclipse IDE
  • the use of smart paradigms for lexicon building

Phrasebook

The Phrasebook grammar has two main modules: Sentences, which contains phrases that can be defined by a functor over the resource grammar API, and Words, which contains the phrases that are likely to need different implementations in different languages.

Semantic validity is handled with a simple, restrictive abstract syntax. For example, a function type such as

HowFarBy : Place -> ByTransport -> Question

guarantees that we can say "How far is the church by taxi" but not "How far is John by beer": the arguments must be a place and a means of transport.
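
To make the mechanism concrete, here is a small Python model of the check that the abstract syntax performs. It is only a sketch of the idea, not how GF is implemented. Only the `HowFarBy` signature comes from the text; the constants `Church`, `Taxi`, `John` and `Beer` and the categories `Person` and `Drink` are assumptions made for the example.

```python
# Function name -> (argument categories, result category).
SIGNATURES = {
    "HowFarBy": (["Place", "ByTransport"], "Question"),  # from the text
    "Church": ([], "Place"),        # hypothetical constant
    "Taxi": ([], "ByTransport"),    # hypothetical constant
    "John": ([], "Person"),         # hypothetical constant
    "Beer": ([], "Drink"),          # hypothetical constant
}


def category_of(tree):
    """Return the category of a tree, or raise TypeError if it is ill-typed.

    A tree is a tuple: (function_name, subtree, subtree, ...).
    """
    name, *args = tree
    if name not in SIGNATURES:
        raise TypeError("unknown function: " + name)
    arg_cats, result = SIGNATURES[name]
    if len(args) != len(arg_cats):
        raise TypeError(
            f"{name} expects {len(arg_cats)} argument(s), got {len(args)}"
        )
    for expected, arg in zip(arg_cats, args):
        actual = category_of(arg)
        if actual != expected:
            raise TypeError(f"{name} expects {expected}, got {actual} ({arg[0]})")
    return result


if __name__ == "__main__":
    # Well-typed: a place and a means of transport.
    print(category_of(("HowFarBy", ("Church",), ("Taxi",))))
    # Ill-typed: a person and a drink are rejected before any linearization.
    try:
        category_of(("HowFarBy", ("John",), ("Beer",)))
    except TypeError as err:
        print("rejected:", err)
```

The point of the design is that nonsense sentences are excluded by the types alone, so no concrete syntax has to guard against them.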

Module structure: Common constructions with a functor

The starting point for the grammar was a test corpus of sentences that the grammar should be able to express. These sentences also serve as documentation for the abstract syntax:

AHasAge      : Person -> Number -> Action ;    -- I am seventy years
AHasChildren : Person -> Number -> Action ;    -- I have six children
AHasName     : Person -> Name   -> Action ;    -- my name is Bond
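
Because every function carries its example sentence as a comment, the corpus can be recovered from the abstract syntax mechanically. The following Python sketch shows one way this might be done; the regular expression assumes the one-line layout of the three signatures above (name, colon, type, semicolon, `--` comment) and is not a general GF parser.

```python
import re

# One documented function per line: Name : Type ; -- example sentence
LINE = re.compile(r"^\s*(\w+)\s*:\s*([^;]+?)\s*;\s*--\s*(.*\S)\s*$")

SOURCE = """
AHasAge     : Person -> Number -> Action ;    -- I am seventy years
AHasChildren: Person -> Number -> Action ;    -- I have six children
AHasName    : Person -> Name   -> Action ;    -- my name is Bond
"""


def parse_signatures(text):
    """Extract function name, argument categories, result and example."""
    entries = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # blank lines and anything not in the expected layout
        fun, type_text, example = match.groups()
        cats = [cat.strip() for cat in type_text.split("->")]
        entries.append({
            "fun": fun,
            "args": cats[:-1],
            "result": cats[-1],
            "example": example,
        })
    return entries


if __name__ == "__main__":
    for entry in parse_signatures(SOURCE):
        print(entry["fun"], entry["args"], "->", entry["result"],
              "|", entry["example"])
```

The extracted examples could then seed a treebank, so that the documentation and the tests come from the same place.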

ACE-GF

ACE-GF is based on Attempto Controlled English (ACE), a controlled natural language: a restricted subset of English with a formal, unambiguous interpretation.

AceWiki works on ACE (the AceWiki subset). Grammars are available for Cat, Dan, Dut, Eng (not ACE), Est, Fin, Fre, Ger, Ita, Lav, Nor, Pol, Ron, Rus, Spa, Swe and Urd (https://github.com/Attempto/ACE-in-GF/tree/master/grammars/acewiki_aceowl).

Grammar modules: an ACE base grammar, plus domain lexicons (Geography).

(AceWiki also contains ordinary, non-ACE grammars, but these are unrelated to the ACE grammar.)

Museum

Query grammars


Grammar evaluation survey

Questionnaire

Basic information: 

Use of development tools:

Diagnostic tools
Compilation diagnostics: 
Grammar display modes: 

Testing
Tools for generation and testing: 

RGL
Resource grammar tools:

Grammar writing
Starting point for your grammar:

Basic unit of the grammar:

Semantic control:

Module structure: 

Concrete syntax:

Analysis of answers: ....

Some answers given under "Other" that are not covered in Best practices(?):

Other method for treebanks: Haskell code to store, edit and show differences in treebanks.

Other development tool: Haskell and shell scripts that generate grammars