2. MOLTO Application Grammars

At the beginning of the project, we published the MOLTO Phrasebook as an example application grammar. For the final version of our online service, we show all the relevant GF application grammars that have been developed in the various work-packages as supporting grammars for larger applications. A new GF grammar developer can use each example in this collection as a starting point and extend it further. In this deliverable we briefly document the grammars and the online applications that use them, and give quick hints on where they could be extended in future work.

The Geography Grammar

This grammar was originally developed for the semantic multilingual wiki system AceWiki-GF, as documented in Deliverable D11.3. The grammar can be used online at http://attempto.ifi.uzh.ch/acewiki-gf/.

It currently supports 3 languages: ACE, German and Spanish, where ACE is a formal language used for automated reasoning. A 500-word geography domain vocabulary has been created to describe Europe.

ACE is represented by two languages, Ace and Ape. Ape linearizations contain explicit lexical entries so that the ACE parser (APE) can be used to map the sentences of this grammar to OWL. The wiki shows how this mapping works.

A snapshot of the grammar is available at http://www.molto-project.eu/biblio/software/geographypgf.

The MOLTO Phrasebook

The MOLTO Phrasebook was the first demonstrator of the features of the Grammatical Framework technology, online since M3 of the project's lifetime. The application grammar was designed to serve as a model for best practices. It shows a modular approach to the definition of abstract types and functions from the domain of travelers' phrasebooks, covering natural language for giving directions, ordering a meal, and greeting friends. It has categories for Citizenship, Country, Currency, Date and week Day, Digits, DrinkKind and MassKind, Languages, Greetings and many more. It supports the following 20 languages: Eng, Bul, Cat, Dan, Dut, Fin, Fre, Ger, Hin, Ita, Lav, Nor, Pes, Pol, Ron, Rus, Spa, Swe, Tha, Urd. It has a module that handles disambiguation in Eng and in Ron.

The final version is online at http://www.molto-project.eu/cloud/gf-application-grammars; select Phrasebook.pgf as the application.

The repository for the grammar file itself is at http://www.molto-project.eu/biblio/software/phrasebookpgf.

The Mathematical Grammars

MathBar.pgf

MathBar.pgf is the application grammar developed for the mathematical natural language domain. It supports the following languages: Fre, Cat, Spa, Eng and Fin. More languages are available, but their quality has not been checked. The Mathematical Grammar Library (MGL) covers a specialized language in which textual fragments are interspersed with formal fragments represented in the typesetting language LaTeX.

The source files are distributed via svn:

    URL: svn://molto-project.eu/mgl
    Repository Root: svn://molto-project.eu
    Repository UUID: 54d65b75-f25a-4862-968f-dc0a3298bc6b
    Revision: 2432

The compiled PGF grammar is available from http://www.molto-project.eu/biblio/software/mathbarpgf.

Commands.pgf

Commands.pgf is the application grammar developed for natural language input to and output from the Sage computer algebra system. It translates input queries and output answers into mathematical natural language. Users can ask for computations related to arithmetic, the domain and range of functions, differentiation and integration. It also supports a referential mechanism: the pronoun "it" refers to the previous result in a session of sequential computations. English, German and Spanish are currently supported.
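The referential mechanism can be illustrated with a minimal Python sketch. It is not the Commands.pgf grammar and does not call Sage; the Session class, its method names and the two operations are hypothetical and only show how a session can keep the previous result as the antecedent of "it".

```python
class Session:
    """Hypothetical session that remembers the previous result for 'it'."""

    # Illustrative operations only; the real system delegates to Sage.
    OPERATIONS = {
        "add": lambda a, b: a + b,
        "multiply": lambda a, b: a * b,
    }

    def __init__(self):
        self.last_result = None

    def resolve(self, operand):
        """Replace the pronoun 'it' by the previous result, if there is one."""
        if operand == "it":
            if self.last_result is None:
                raise ValueError("'it' has no previous result to refer to")
            return self.last_result
        return operand

    def compute(self, operation, left, right):
        """Evaluate one command and store its result as the new antecedent."""
        result = self.OPERATIONS[operation](self.resolve(left), self.resolve(right))
        self.last_result = result
        return result


session = Session()
print(session.compute("add", 2, 3))           # "Compute the sum of 2 and 3."
print(session.compute("multiply", "it", 4))   # "Multiply it by 4."
```

Using "it" before any computation has been made raises an error in this sketch, since the pronoun has nothing to refer to.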

Dialog.pgf

Dialog.pgf translates the natural language interactions of the word problems prototype documented in Deliverable D6.3. It is used to give hints in the student's language and to formalize the student's answers or commands as Prolog statements that can be reasoned with automatically. It is an example of how a description of a specific world situation (owning fruit, animals on a farm) can be interpreted and formalized. Catalan, English, Spanish and Swedish are currently supported. The programming language Prolog is also supported.

SVN info for compilation from source:

    URL: svn://molto-project.eu/mgl/wproblems
    Repository Root: svn://molto-project.eu
    Repository UUID: 54d65b75-f25a-4862-968f-dc0a3298bc6b
    Revision: 2432
    GF version used for compilation: Grammatical Framework (GF) version 3.4-darcs

The version archived and deployed on the MOLTO cloud is http://www.molto-project.eu/biblio/software/dialogpgf.

The Painting Grammar

The work-package dealing with the domain of cultural heritage has focused on the description of museum artefacts, in particular paintings. While the description of the subject matter of a painting is an open domain, the other characteristics of a painting can be described by a constrained natural language tightly coupled with the underlying knowledge representation used by museum curators. The design of this grammar has been based on sample descriptions of paintings retrieved from the Gothenburg City Museum and has been further applied to generate descriptions of artefacts stored on public web pages, such as DBPedia.

One major discussion has concerned the identification of entity names: museum names, as well as the names of famous painters or masterpieces, are often translated ad hoc. For such cases, it is hard to create grammar-based translation rules; consider for instance the Mona Lisa, in Italian often referred to as La Gioconda. The approach taken in this work package has been not to translate the entity names found in the knowledge base, while investigating whether, historically, there is a given title or name that could be taken as a universally valid identifier for that entity. Since, to our knowledge, museum curators have not agreed on unique resource identifiers (whereas, for instance, in the publishing world there have been efforts to index published material uniquely), we have adopted a naming scheme based on the resource descriptors we retrieved in our samples. In terms of future web application building, we are aware that resource identification and/or retrieval by common name is not as sound as by unique ID.

This grammar also has a modular design: it assembles categories that are used to represent location, material, color, dimension, type of work, and the painter's biographical data. The most relevant feature of this grammar is the construction of a description as a sequence of phrases related to the same artefact, using referential chains to build up a coherent discourse. Please see the list of publications tagged with WP8 for further information about the comparative study of texts in the cultural heritage domain and about the background knowledge base underlying the ontology from which texts in 15 languages are generated.

The grammar files are available on svn: molto-project.eu/wp8/d8.3/grammars/

The demo webpage is available at: http://museum.ontotext.com/

Grammar characteristics

The version of the grammar on display at the MOLTO Application Grammar web service (TextPainting.pgf) features:

  • The following categories:

    Main (start) category: Description

    9 semantic categories which represent the ontology classes: Colour, Material, Museum, Painter, Painting, PaintingType, Size, Title, and Year. Of these categories, 5 are optional, hence the additional 'Opt' categories.

    3 category types: String, Int, Float

    1 grammatical category for creating nested colour strings: ListColour

  • Support for 15 languages: Bulgarian (Bul), Catalan (Cat), Danish (Dan), Dutch (Dut), English (Eng), Finnish (Fin), French (Fre), Hebrew (Heb), Italian (Ita), German (Ger), Norwegian (Nor), Romanian (Ron), Russian (Rus), Spanish (Spa), Swedish (Swe).

  • Generation of texts up to three sentences long, where each sentence may be constructed from different semantic categories. For example, consider two variants of the first sentence of a description:

    Forest[PAINTING] was painted by Paul Cezanne[PAINTER] in 1902[YEAR].

    Forest[PAINTING] was painted on canvas[MATERIAL] by Paul Cezanne[PAINTER] in 1902[YEAR].

  • Variation of the syntactic form of the expression that refers to the entity in sentence-initial position, e.g.

    Forest was painted by Paul Cezanne in 1902. It[Pronoun] is painted in green and blue.

    Forest was painted by Paul Cezanne in 1902. This painting[NounPhrase] is displayed at the National Gallery of Canada.
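The referential chain shown in the examples above can be sketched in a few lines of Python. This is only an illustration of the idea, not the TextPainting grammar itself: the function name, its parameters and the fixed English referring expressions are assumptions, and the real grammar produces these variations through its GF linearization rules in all supported languages.

```python
def describe(title, predicates, referring=("It", "This painting")):
    """Build a short description as a referential chain (hypothetical helper).

    The first sentence names the painting by its title; each later sentence
    starts with a pronoun or a noun phrase that refers back to it.
    """
    sentences = []
    for index, predicate in enumerate(predicates):
        if index == 0:
            subject = title
        else:
            # Alternate between the available referring expressions.
            subject = referring[(index - 1) % len(referring)]
        sentences.append(f"{subject} {predicate}.")
    return " ".join(sentences)


print(describe("Forest", [
    "was painted by Paul Cezanne in 1902",
    "is painted in green and blue",
    "is displayed at the National Gallery of Canada",
]))
```

Alternating between the pronoun and the noun phrase avoids repeating the title in every sentence while keeping all sentences anchored to the same artefact.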

Restrictions of the grammar

As mentioned above, the names of the paintings and painters have been left untranslated. Since museum names have been translated automatically, some translations are missing. As a result, some two- or three-word names contain underscores.

In Hebrew texts, names that are missing translations cause the words of a sentence to be ordered incorrectly.

The Patent Query Grammar

This grammar is used to translate user queries into SPARQL. It contains 4 languages: English, German, French and a concrete syntax corresponding to SPARQL. Since the grammar is adapted to the patents domain, the constructors of the abstract syntax describe individual, domain-dependent queries. The SPARQL mappings are therefore written in a gap-filling fashion: each query is specified as a template with slots for its arguments.
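The gap-filling idea can be sketched in Python as follows. This is not the SPARQL concrete syntax of the grammar: the template name, the ex: prefix, the ex:applicant property and the helper function are all hypothetical, chosen only to show one query constructor mapped to a fixed query text with a slot for its argument.

```python
from string import Template

# One hypothetical template per query constructor; "$applicant" is the gap.
# The prefix and the property name are illustrative, not the real ontology.
QUERY_TEMPLATES = {
    "patents_by_applicant": Template(
        "PREFIX ex: <http://example.org/> "
        'SELECT ?patent WHERE { ?patent ex:applicant "$applicant" . }'
    ),
}


def fill_query(template_key, **arguments):
    """Fill the gaps of the chosen template with the given arguments.

    Template.substitute raises KeyError if an argument is missing, so an
    incompletely specified query is never produced. Escaping of special
    characters in the arguments is not handled in this sketch.
    """
    return QUERY_TEMPLATES[template_key].substitute(**arguments)


print(fill_query("patents_by_applicant", applicant="Acme"))
```

Because each template corresponds to exactly one kind of query, adding a new query type in this sketch means adding a new template, which mirrors the domain-dependent design described above.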

More details can be found in the deliverables released by Work-package 7.

The sources are at svn://molto-project.eu/wp7/query/grammars.

The Words300 Grammar

The Words300-grammar was produced to evaluate the correctness of the multilingual translation of ACE sentences offered by the ACE-in-GF grammar. The grammar contains ~300 words from the GF resource grammar library (RGL), namely the words from the ACE word classes common noun, transitive verb and proper name. Currently, most of the RGL languages are included, altogether 21 languages. For the description of the evaluation, see D11.3.

Note that the English sentences that this grammar produces are not always valid ACE sentences, because they may contain "spaces in content words", which are not allowed in ACE. For example, the grammar supports "For which computer does John wait?" while ACE requires "Which computer does John wait-for?".
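A simple way to spot the affected vocabulary is to look for lexicon entries that contain a space. The Python sketch below is an illustration under assumptions: the function name and the sample word list are hypothetical, and the hyphenated form is only a candidate modelled on the wait-for example above. It does not produce valid ACE sentences by itself, since the word order of the sentence differs as well.

```python
def multiword_entries(lexicon):
    """Map each content word that contains a space to a hyphenated candidate.

    Hypothetical helper: hyphenation follows the 'wait-for' example and
    would still need to be checked against ACE for each entry.
    """
    candidates = {}
    for word in lexicon:
        parts = word.split()
        if len(parts) > 1:
            candidates[word] = "-".join(parts)
    return candidates


# Illustrative word list, not taken from the actual Words300 lexicon.
print(multiword_entries(["wait for", "computer", "John"]))
```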

The grammar can be used in a wiki at: http://attempto.ifi.uzh.ch/acewiki-gf/gf/Words300/main/

A snapshot of the grammar is available at http://www.molto-project.eu/biblio/software/words300pgf.