A generic text template for subdomains of a larger domain

Submitted by inari.listenmaa on 2 July, 2012 - 17:26

Motivation

The painting verbalization grammar, released in D8.2, is built for one ontology, and its text building functions use words and expressions that are fixed for paintings. To verbalize the ontology of a different museum, for example a war museum, the grammar could be copy-pasted and the relevant parts modified; however, a preferable solution would rely on abstraction rather than on repeated code. The motivation and the goal of this work are stated in D4.2:

"In analogy with the resource grammar API, we can envisage extending the museum case into a reusable library of verbalization/textualization patterns. In order to maintain simplicity, the GF templates can be made more abstract. Instead of hard-coding ontology specific description words (paint, display, size units), we generalize them as parameters chosen according to the domain and the ontology in question. UHEL has conducted some tests in this direction, generalizing the museum case patterns to more generic object description patterns."

Realistically speaking, the differences should be more subtle: more about different kinds of art objects within a museum than about different kinds of museums. For example, paintings and sculptures have enough in common to use the same discourse patterns with slightly different word choices, such as painter and sculptor. Paintings and tanks, on the other hand, are different enough that the benefit of finding common description patterns would probably be small.

Grammar design

The structure of the GF grammar is as follows. The code can be found (with small differences, explained below) in TextTemplate.zip.

      instance        interface        instance
     LexArtEng - - - - LexEng  - - - - LexWarEng
        |                 |                |
        |             incomplete           |
      resource         resource         resource
     TextArtEng - - -  TextEng - - - - TextWarEng



                    abstract Museum = {
                      cat
                        <generic categories>
                      fun
                        <descriptions>
                    }                       


  abstract ArtMuseum =                abstract  WarMuseum = 
   Museum ** {                         Museum ** {                          
    fun                                  fun
      <art museum specific>                <war museum specific>
   }                                    }


                 
                incomplete concrete MuseumEng =
                   open TextEng in {
                     lincat
                       <generic categories linearized>
                     lin
                       <using textgen opers from TextEng>
                 }

                             
concrete ArtMuseum = MuseumEng           concrete WarMuseum = MuseumEng 
 with (TextEng=TextArtEng) ** {           with (TextEng=TextWarEng) ** {      
   lin                                      lin 
      <art museum specific>                    <war museum specific>
}                                        }
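
To make the lower half of the diagram more tangible, the abstract modules might look roughly like the sketch below. The category names are taken from the code later on this page and the function names from the example session; the type signatures and the start category are assumptions, and the attached code may differ.

  -- Museum.gf (sketch; signatures are assumed, not copied from the attachment)
  abstract Museum = {
    flags startcat = GenText ;
    cat
      Item ; Author ; ItemType ; GenText ;
    fun
      IType      : Item -> ItemType -> GenText ;  -- "X is a Y"
      Authorship : Item -> Author -> GenText ;    -- "X was made by Y"
  }

  -- ArtMuseum.gf (sketch; only lexical additions on top of Museum)
  abstract ArtMuseum = Museum ** {
    fun
      Mona_Lisa        : Item ;
      PortraitPainting : ItemType ;
      Rembrandt        : Author ;
  }

Each module goes into its own file, named after the module. All discourse-level functions stay in Museum, so a subdomain only adds constants.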

Ignoring the first six modules, the grammar is a Domain that is extended by SubDomain = Domain ** {...}. Concepts should be added to the grammar via SubDomainL (L being the language suffix, such as Eng) with user-friendly functions in the style of mkConcept "str". The morphological functions are hidden in DomainL as follows:

  lincat
    Item = NP ;
    Author = NP ;
  oper
    mkItem : Str -> Item = \n -> mkNP (mkN n) ;
    mkAuthor : Str -> Author = \a -> mkNP (mkPN a) ;

and used in SubDomainL:

  lin
    Mona_Lisa = mkItem "Mona Lisa" ;
    PortraitPainting = mkItemType "portrait" ;
    Rembrandt = mkAuthor "Rembrandt" ;
    Ateneum = mkMuseum "Ateneum" ;
    Wood = mkMaterial "wood" ;

This, in itself, is a recommended design principle (see e.g. D2.3, section 5.2.2): a base grammar with domain extensions. The content of SubDomain should be mostly lexical; the idea is that the textualization patterns are the same for all subdomains (except for some lexical choices), so they can all be linearized in the common part.
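
The SubDomainL example above also calls mkItemType, mkMuseum and mkMaterial, which are not shown. A minimal sketch of how such constructors could be written is given below. The module name DomainOpersEng is hypothetical, and the result types (CN for item types, NP for museums and materials) are an assumption based on the argument types of DescriptionText further down.

  -- DomainOpersEng.gf (hypothetical module, not part of the attachment)
  resource DomainOpersEng = open SyntaxEng, ParadigmsEng in {
    oper
      -- common noun, so that it can take an article: "a portrait"
      mkItemType : Str -> CN = \s -> mkCN (mkN s) ;
      -- proper name: "Ateneum"
      mkMuseum   : Str -> NP = \s -> mkNP (mkPN s) ;
      -- mass noun phrase: "wood"
      mkMaterial : Str -> NP = \s -> mkNP (mkN s) ;
  }

The point of the design is that whoever adds new items only sees string-based constructors and never has to touch the RGL paradigms directly.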

The lexical variance in the textualization patterns is what the first six modules are for. The abstract text descriptions in Domain are linearized using TextEng, which is an incomplete resource module with parameterized text generation functions. For example, the following function in TextEng describes the author of an item:

incomplete resource TextEng = open SyntaxEng, LexEng in {
  oper
    -- "<item> was <made> by <author>"; the verb make_V2 comes from LexEng
    AuthorText : NP -> NP -> Text = \item,author ->
      mkText (mkS pastTense
        (mkCl item (mkVP (passiveVP make_V2)
          (mkAdv by8agent_Prep author)))) ;
}

The verb make_V2 is from the interface LexEng, and it might have different values in LexArtEng and LexWarEng, for example paint and manufacture respectively.
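
As an illustration, the interface and its two instances could look like the sketch below. The verbs are the ones mentioned above; everything else, in particular the way TextArtEng and TextWarEng are obtained from TextEng with a functor instantiation, is an assumption and may differ from the attached code.

  -- LexEng.gf: the interface only declares that a verb exists
  interface LexEng = open SyntaxEng in {
    oper
      make_V2 : V2 ;
  }

  -- LexArtEng.gf: word choice for the art museum
  instance LexArtEng of LexEng = open SyntaxEng, ParadigmsEng in {
    oper
      make_V2 = mkV2 (mkV "paint") ;
  }

  -- LexWarEng.gf: word choice for the war museum
  instance LexWarEng of LexEng = open SyntaxEng, ParadigmsEng in {
    oper
      make_V2 = mkV2 (mkV "manufacture") ;
  }

  -- TextArtEng.gf and TextWarEng.gf: TextEng with the interface filled in
  -- (assumed instantiation; each line lives in its own file)
  resource TextArtEng = TextEng with (LexEng = LexArtEng) ;
  resource TextWarEng = TextEng with (LexEng = LexWarEng) ;

With this setup a new subdomain needs one new instance of LexEng and one instantiation line, while AuthorText itself is left untouched.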

As a result, the function Authorship is linearized differently in WarMuseum and ArtMuseum, even though in both concrete syntaxes it is written simply as Authorship = AuthorText ;.

WarMuseum> gt -tr | l
IType Pasi BattleTank
Authorship Pasi FinnishArmy

Pasi is a tank .
Pasi was manufactured by Puolustusvoimat .

ArtMuseum> gt -tr | l
IType Mona_Lisa PortraitPainting
Authorship Mona_Lisa Rembrandt

Mona Lisa is a portrait .
Mona Lisa was painted by Rembrandt .

GF questions

For technical reasons that are not yet understood, making a concrete syntax by extending an incomplete concrete does not work. Instead of the design on the left, the code in the attached file therefore follows the design on the right.

concrete ArtMuseum = MuseumEng           concrete ArtMuseum = MuseumEng ** 
 with (TextEng=TextArtEng) ** {           open TextArtEng in {      
   lin                                      lin 
      <art museum specific>                    <art museum specific>
                                               Authorship = AuthorText ;
}                                        }

This means that, instead of writing Authorship = AuthorText ; only in MuseumEng, the line is repeated in each MuseumSubdomainEng. That is not a big problem with regard to abstraction; the function AuthorText is still defined in only one place.

The second decision concerns the types of the arguments in the text patterns. The functions in TextEng could operate on GF resource grammar types, such as CN, NP and Adv. The downside is that the functions look messy, and it is easy to make mistakes when modifying them. For instance, a slightly longer description looks like this:

    DescriptionText : NP -> CN -> NP -> NP -> Adv -> NP -> Text
      = \item,itype,author,museum,year,material -> ...

On the one hand, using RGL types makes the functions usable in any grammar, although this is not a very realistic concern. On the other hand, TextEng could be connected to the categories defined in MuseumEng, as follows:

incomplete resource TextEng = MuseumEng [Item,Author,ItemType,GenText] ** 
open SyntaxEng, LexTemplate in {
  oper
    AuthorText : Item -> Author -> GenText = \item,author -> ...

The body of the function would still consist of mkTexts and mkNPs, so that Item, Author and GenText are nothing more than type synonyms, for easier readability. If the lincats for those types are changed in MuseumEng, the functions in TextEng need to be changed too.
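
To show what "type synonym" means here, the sketch below declares the synonyms directly in a resource module instead of importing them from MuseumEng. This is a variant for illustration only, not the design in the attached code; the module name TextSynEng is hypothetical, and GenText = Text is an assumption.

  -- TextSynEng.gf (hypothetical variant with locally declared synonyms)
  incomplete resource TextSynEng = open SyntaxEng, LexEng in {
    oper
      -- readable names for the RGL types used in the patterns
      Item    : Type = NP ;
      Author  : Type = NP ;
      GenText : Type = Text ;

      -- same body as before; only the signature has become more readable
      AuthorText : Item -> Author -> GenText = \item,author ->
        mkText (mkS pastTense
          (mkCl item (mkVP (passiveVP make_V2)
            (mkAdv by8agent_Prep author)))) ;
  }

The trade-off is the same as described above: the signatures become self-explanatory, but the synonyms have to be kept in step with the lincats in MuseumEng by hand.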

Ontology compatibility

The grammar in D8.2 uses a database, where the existing paintings are defined as types, and the textualization function accepts only a combination of parameters for which there is a type.

data
  MkVerifiedText : 
    (pg : Painting) -> (pr : Painter) -> (pt : PaintingType) -> 
    (cr : OptColour) -> (se : OptSize) -> (ml : OptMaterial) -> 
    (yr : OptYear) -> (mm : OptMuseum) -> 
    CompletePainting pg pt pr yr mm cr se ml ->
      VerifiedText ;

  GSM940042ObjPainting : CompletePainting 
    GSM940042Obj MiniaturePortrait JKFViertel (MkYear (YInt 1814)) 
    (MkMuseum GoteborgsCityMuseum) (MkColour Grey) (MkSize (SIntInt 349 776)) (MkMaterial Wood) ;

There should be no problem in combining the generic text template with this approach, so that only valid combinations of data are accepted. (This has not been tested yet, though.)

Summary

What is it? A parameterized text template for a domain and its subdomains.
What is it good for? Avoiding copy-paste when writing textualization patterns for things that are almost the same. For example, an art museum has an ontology that contains paintings, sculptures and wood carvings. MuseumL can contain the lincats and the end-user constructors of the form mkConcept "str", while PaintingL and WoodCarvingL contain all the paintings and wood carvings in the collection. Parameterized textualization patterns for the different items can be defined in TextL, and the right word choices come from TextPaintingL and TextCarvingL.

Attachment	Size
TextTemplate.zip	3.68 KB
