2. Multilingual grammars
A GF program is a multilingual grammar, which for n languages consists of 1+n modules: one abstract syntax defining the semantic content in a language-independent way, and for each language a concrete syntax showing how this content is expressed in that language. Here is a "hello world" example for English, Finnish, and Italian:
abstract Hello = {
  cat Greeting ; Recipient ;
  fun
    Hello : Recipient -> Greeting ;
    World, Mum, Friends : Recipient ;
}

concrete HelloEng of Hello = {
  lin
    Hello rec = "hello" ++ rec ;
    World = "world" ;
    Mum = "mum" ;
    Friends = "friends" ;
}

concrete HelloFin of Hello = {
  lin
    Hello rec = "terve" ++ rec ;
    World = "maailma" ;
    Mum = "äiti" ;
    Friends = "ystävät" ;
}

concrete HelloIta of Hello = {
  lin
    Hello rec = "ciao" ++ rec ;
    World = "mondo" ;
    Mum = "mamma" ;
    Friends = "amici" ;
}
The GF compiler produces from this code a system that can parse phrases like "hello world" and "ciao mamma" and also generate them in each language, thus enabling translation between any pair of languages.
The Hello grammar is deliberately very simple, but it shows the essential structure of multilingual grammars, and it is easy to see how the grammar could be extended by adding new functions (i.e. combination rules like Hello and words like Mum).
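As a minimal sketch of such an extension, the modules below add one combination rule and one word to the Hello grammar. The names HelloExt, Goodbye and Dad, as well as the translations chosen, are illustrative assumptions and not part of the original example. The sketch uses GF's module extension operator (**) so that the original modules stay untouched, and it assumes that each module lives in its own file.

```gf
-- HelloExt.gf: extends the abstract syntax with two new functions
abstract HelloExt = Hello ** {
  fun
    Goodbye : Recipient -> Greeting ;  -- a new combination rule
    Dad : Recipient ;                  -- a new word
}

-- HelloExtEng.gf: English linearizations for the new functions
concrete HelloExtEng of HelloExt = HelloEng ** {
  lin
    Goodbye rec = "goodbye" ++ rec ;
    Dad = "dad" ;
}

-- HelloExtFin.gf: Finnish linearizations for the new functions
concrete HelloExtFin of HelloExt = HelloFin ** {
  lin
    Goodbye rec = "näkemiin" ++ rec ;
    Dad = "isä" ;
}

-- HelloExtIta.gf: Italian linearizations for the new functions
concrete HelloExtIta of HelloExt = HelloIta ** {
  lin
    Goodbye rec = "arrivederci" ++ rec ;
    Dad = "papà" ;
}
```

Each new fun in HelloExt is given a lin in all three concrete syntaxes. Adding the same rules directly to the original four modules would work equally well for a grammar of this size.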
The GF compiler checks that the abstract and concrete syntaxes are in synchrony. For instance, it checks that each abstract syntax function (fun) actually has a linearization (lin) in each concrete syntax. An IDE is expected to go one step further: it reminds the programmer, prior to running the compiler, of those linearizations that are missing. And when a new language (i.e. a new concrete syntax) is added to the system, the IDE initializes its code with a template for all required linearization rules.
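Such a template might look like the following sketch for a hypothetical German concrete syntax. The module name HelloGer, the comment layout and the "TODO" placeholder strings are assumptions made for illustration; they do not reproduce the output of any actual IDE.

```gf
-- HelloGer.gf: skeleton with one lin rule per fun declared in Hello
concrete HelloGer of Hello = {
  lin
    Hello rec = "TODO" ++ rec ;  -- fun Hello : Recipient -> Greeting
    World = "TODO" ;             -- fun World : Recipient
    Mum = "TODO" ;               -- fun Mum : Recipient
    Friends = "TODO" ;           -- fun Friends : Recipient
}
```

The placeholders give the skeleton the same shape as the existing concrete syntaxes, and the comments repeat the abstract type of each function. The programmer then replaces each placeholder, for example with "hallo", "Welt", "Mama" and "Freunde".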
Multilinguality is one aspect of GF's module system: each language, as well as the abstract syntax, has its own module. Larger GF applications have an additional complexity created by the inheritance and opening of modules; a large grammar can easily have 20 modules involved for each language, and this is multiplied by the number of languages plus one for the abstract syntax. While the opening and inheritance correspond to the module dependencies found in most other programming languages (such as inheritance and the use of libraries), the multilinguality aspect is an extra dimension, which makes GF programs more complex than usual programs.
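To make the notion of opening concrete, here is a small sketch that assumes a hypothetical resource module HelloRes with two helper operations, word and greet, both invented for this illustration. The concrete syntax HelloOpenEng is likewise an assumed name; it declares its linearization types explicitly so that the record field rec.s can be used.

```gf
-- HelloRes.gf: a resource module with reusable operations (hypothetical)
resource HelloRes = {
  oper
    -- wrap a string as a record of the linearization type {s : Str}
    word : Str -> {s : Str} = \t -> {s = t} ;
    -- prefix a greeting word to an already linearized recipient
    greet : Str -> {s : Str} -> {s : Str} = \w, rec -> {s = w ++ rec.s} ;
}

-- HelloOpenEng.gf: a concrete syntax that opens the resource
concrete HelloOpenEng of Hello = open HelloRes in {
  lincat
    Greeting, Recipient = {s : Str} ;
  lin
    Hello rec = greet "hello" rec ;
    World = word "world" ;
    Mum = word "mum" ;
    Friends = word "friends" ;
}
```

The operations word and greet can be used inside HelloOpenEng, but they are not passed on to modules that extend it. This is the difference from inheritance, where the contents of the parent module become part of the child module.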
A GF project with 15 languages, as targeted in the MOLTO project, involves hundreds of modules in scope at the same time. These are roughly divided into two groups:
- the application grammar: the code written by the programmer
- the resource grammar: the code imported from libraries
As of September 2011, the total resource grammar code comprises 755 modules, addressing 20 natural languages. This code is normally distributed in binary form (although the source is also available) and is never read or written by the application programmer. But the programmer of course needs to inspect the code: to see, for instance, what functions are available to construct objects of a given type such as noun or sentence. Inspecting the library code is one of the most important things that should be supported by the IDE.