The Painting Grammar
The work-package dealing with the domain of cultural heritage has focused on the description of museum artefacts, in particular paintings. While the description of the subject matter of a painting is an open domain, the other characteristics of a painting can be described by a constrained natural language tightly coupled with the underlying knowledge representation used by museum curators. The design of this grammar has been based on sample descriptions of paintings retrieved from the Gothenburg City Museum and has been further applied to generate descriptions of artefacts stored on public web pages, such as DBPedia.
One major discussion has concerned the identification of entity names: museum names, as well as the names of famous painters and masterpieces, are often translated ad hoc. For such cases it is hard to create grammar-based translation rules; consider, for instance, the Mona Lisa, which in Italian is often referred to as La Gioconda. The approach taken in this work package has been not to translate the entity names found in the knowledge base, while investigating whether there is a historically established title or name that could serve as a universally valid identifier for the entity. Since, to our knowledge, museum curators have not agreed on unique resource identifiers (whereas in the publishing world, for instance, there have been efforts to index published material uniquely), we have adopted a naming scheme based on the resource descriptors retrieved in our samples. For future web applications, we are aware that identifying or retrieving a resource by its common name is less reliable than doing so by unique ID.
The grammar is also modular: it is assembled from categories that represent location, material, colour, dimension, type of work, and the painter's biographical data. The most relevant feature of this grammar is the construction of a description as a sequence of phrases related to the same artefact, using referential chains to build up a coherent discourse. Please see the list of publications tagged with WP8 for further information about the comparative study of texts in the cultural heritage domain and about the background knowledge base underlying the ontology from which texts in 15 languages are generated.
The grammar files are available on svn: molto-project.eu/wp8/d8.3/grammars/
The demo webpage is available at: http://museum.ontotext.com/
Grammar characteristics
The version of the grammar on display at the MOLTO Application Grammar web service (TextPainting.pgf) features:
The following start categories:
Main category: Description
9 semantic categories which represent the ontology classes: Colour, Material, Museum, Painter, Painting, PaintingType, Size, Title, and Year. Of these categories, 5 are optional, hence the additional 'Opt' categories.
3 category types: String, Int, Float
1 grammatical category for creating nested colour strings: ListColour
Support for 15 languages: Bulgarian (Bul), Catalan (Cat), Danish (Dan), Dutch (Dut), English (Eng), Finnish (Fin), French (Fre), Hebrew (Heb), Italian (Ita), German (Ger), Norwegian (Nor), Romanian (Rom), Russian (Rus), Spanish (Spa), Swedish (Swe).
Generation of texts up to three sentences long, where each sentence may be constructed from different semantic categories. For example, consider two variants of the first sentence of a description:
Forest[PAINTING] was painted by Paul Cezanne[PAINTER] in 1902[YEAR].
Forest[PAINTING] was painted on canvas[MATERIAL] by Paul Cezanne[PAINTER] in 1902[YEAR].
Variation in the syntactic form of the expression that refers to the painting in sentence-initial position, e.g.:
Forest was painted by Paul Cezanne in 1902. It[Pronoun] is painted in green and blue.
Forest was painted by Paul Cezanne in 1902. This painting[NounPhrase] is displayed at the National Gallery of Canada.
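The features listed above can be illustrated with a minimal Python sketch. It does not use the grammar itself (TextPainting.pgf) and covers English only; it merely mimics the behaviour described here: a first sentence built from mandatory and optional categories, followed by sentences that refer back to the painting with a pronoun or a noun phrase. The names Painting, first_sentence, join_colours and describe are hypothetical, and the choice of which fields are optional is an assumption, since this page does not say which 5 categories are the optional ones.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Painting:
    # Treated as mandatory in this sketch (assumption).
    title: str
    painter: str
    year: int
    # Treated as optional in this sketch (assumption); None means "not given".
    material: Optional[str] = None
    colours: Optional[List[str]] = None
    museum: Optional[str] = None


def first_sentence(p: Painting) -> str:
    """Build the opening sentence, adding the material phrase only if present."""
    material = f" on {p.material}" if p.material else ""
    return f"{p.title} was painted{material} by {p.painter} in {p.year}."


def join_colours(colours: List[str]) -> str:
    """Join colour names as 'a', 'a and b', or 'a, b and c' (flat stand-in for ListColour)."""
    if len(colours) == 1:
        return colours[0]
    return ", ".join(colours[:-1]) + " and " + colours[-1]


def describe(p: Painting) -> str:
    """Return a description of up to three sentences about the same painting."""
    sentences = [first_sentence(p)]
    if p.colours:
        # Follow-up sentence referring back with a pronoun.
        sentences.append(f"It is painted in {join_colours(p.colours)}.")
    if p.museum:
        # Follow-up sentence referring back with a noun phrase.
        sentences.append(f"This painting is displayed at the {p.museum}.")
    return " ".join(sentences)


forest = Painting("Forest", "Paul Cezanne", 1902, colours=["green", "blue"])
print(describe(forest))
```

With the values shown, the sketch reproduces the pronoun example above. Tying each referring expression to a sentence type is a simplification made for this sketch; the grammar may choose between them differently.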
Restrictions of the grammar
As mentioned above, the names of the paintings and painters have been left untranslated. Since museum names have been translated automatically, some translations are missing. As a result, some names of two or three words contain underscores.
In Hebrew texts, names that are missing translations cause the words of the sentence to appear in the wrong order.
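For applications that display the generated text, the underscores in multi-word names could be removed in a post-processing step. The following Python sketch is one possible workaround applied outside the grammar, not something the grammar does; the function name display_name and the sample input are hypothetical.

```python
def display_name(raw_name: str) -> str:
    """Replace underscores with spaces and collapse any repeated whitespace."""
    return " ".join(raw_name.replace("_", " ").split())


# Illustrative input only; the actual names in the knowledge base may differ.
print(display_name("Gothenburg_City_Museum"))
```

This step does not address the Hebrew word-order problem, which it leaves untouched.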