Cultural heritage WG
Participants: Ramona, Inari, Milen, LauriC
The task is to verbalize an ontology: the Gothenburg City Museum ontology, which contains information about the objects in that museum. We don't need to prepare for unrestricted vocabulary or user input, at least not in text form.
GF grammar details
The GF grammar has two parts. The first is a direct verbalization, that is, just the ontology triples translated into GF syntax. As one triple contains only one piece of information, the sentences this grammar produces are of the type "Mona Lisa is a Painting", "Mona Lisa is painted by Leonardo da Vinci", "Mona Lisa is located in Louvre". This grammar is still useful as a lexicon together with facts about the items in the lexicon. Creating a concrete syntax necessarily needs some human work: the ontology does not contain translations of the terms in all the languages we'd want, and even for the languages it covers, it doesn't have all the linguistic information needed in a GF grammar. (WP2 and WP3 tools could be used to help the grammar writer; more on that later.)
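To make the one-triple-one-sentence idea concrete, here is a minimal sketch in Python rather than GF. The predicate names and the sentence templates are hypothetical and only mirror the Mona Lisa examples above; they are not taken from the actual grammar or ontology.

```python
# Hypothetical templates, one per predicate (an assumption for illustration).
TEMPLATES = {
    "type": "{s} is a {o}.",
    "paintedBy": "{s} is painted by {o}.",
    "locatedIn": "{s} is located in {o}.",
}


def verbalize(triple):
    """Turn one (subject, predicate, object) triple into one sentence."""
    subject, predicate, obj = triple
    return TEMPLATES[predicate].format(s=subject, o=obj)


# Illustrative triples matching the example sentences in the text.
triples = [
    ("Mona Lisa", "type", "Painting"),
    ("Mona Lisa", "paintedBy", "Leonardo da Vinci"),
    ("Mona Lisa", "locatedIn", "Louvre"),
]

sentences = [verbalize(t) for t in triples]
for sentence in sentences:
    print(sentence)
```

Each sentence carries exactly one fact, which is why this layer is more useful as a lexicon than as readable text.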
In addition to the direct verbalization, we also have a discourse-building grammar. It consists of GenText.gf and GenLexicon.gf. GenLexicon is a more refined version of the direct verbalization grammar: it has some irrelevant parts removed (such as numerical codes, which have no information value for a human) and some of the lexical information already aggregated. For example, the ontology triples for a painting might look like this:
xx:Guernica yy:paintedBy zz:Picasso
xx:Guernica cc:hasColour dd:Black
xx:Guernica cc:hasColour dd:White
xx:Guernica cc:hasColour dd:Gray
We can see that there are multiple instances of the predicate hasColour, and we can aggregate that information into a single colour in the GF grammar, called BlackWhiteGray.
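The aggregation step can be sketched as follows, in Python rather than GF. This is only an illustration under assumptions: the namespace prefixes are dropped, and joining the colour names in the order they occur is our guess at how a name like BlackWhiteGray is formed.

```python
from collections import defaultdict

# The example triples from above, with namespace prefixes dropped.
triples = [
    ("Guernica", "paintedBy", "Picasso"),
    ("Guernica", "hasColour", "Black"),
    ("Guernica", "hasColour", "White"),
    ("Guernica", "hasColour", "Gray"),
]


def aggregate_colours(triples):
    """Collapse all hasColour triples of a subject into one combined name."""
    colours = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == "hasColour":
            colours[subject].append(obj)
    # Assumption: the combined constant is the names joined in order of occurrence.
    return {subject: "".join(names) for subject, names in colours.items()}


combined = aggregate_colours(triples)
print(combined)
```

Every distinct combined name found this way would become one lexicon entry, so the grammar only contains combinations that actually occur in the data.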
(A more generic approach would be like this:
White : Colour ;
Black : Colour ;
Red : Colour ;
Multicolour : Colour -> Colour -> Colour ;

Multicolour Red (Multicolour Black White)
Multicolour Black (Multicolour Red (Multicolour White Black))
We could generate every combination of colours with that, but as the task is to verbalize only the paintings in the Gothenburg museum ontology, we can collect every combination that actually occurs and spell them out in the grammar. This means more items in the GF grammar, but it makes the language more fluent: many languages have specific words for some common combinations of colours, whereas with the Multicolour way of combining colours we'd have to choose one generic rule for combining them, such as "black and white and grey and blue and red".)
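For comparison, the generic Multicolour approach can be modelled like this. It is a Python sketch, not GF; the class names and the "and"-joining rule are assumptions that stand in for whatever single generic rule a concrete syntax would have to pick.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Basic:
    """A single named colour such as red or black."""
    name: str


@dataclass(frozen=True)
class Multicolour:
    """A binary combination of two colours, mirroring Colour -> Colour -> Colour."""
    first: "Colour"
    second: "Colour"


Colour = Union[Basic, Multicolour]


def linearize(colour):
    """Apply the one generic rule: join the parts with 'and'."""
    if isinstance(colour, Basic):
        return colour.name
    return linearize(colour.first) + " and " + linearize(colour.second)


red, black, white = Basic("red"), Basic("black"), Basic("white")

# The two example terms from the text.
first_term = Multicolour(red, Multicolour(black, white))
second_term = Multicolour(black, Multicolour(red, Multicolour(white, black)))

phrase = linearize(first_term)
print(phrase)
print(linearize(second_term))
```

Note that nothing stops the second term from mentioning black twice, and the output is always the same flat "and" chain, which illustrates why the spelled-out combinations read more fluently.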
Finally, GenText.gf is the grammar where we do the information aggregation on a semantic level. We build multiple discourse patterns, since there are many ways to give information about a painting. In practice this is implemented with a large number of dependent types: a sentence that tells a painting's author, colour and location is of type Painting Author Location, and the items are all checked to be compatible with the ontology; we can't build the sentence "Guernica was painted by Pablo Picasso and it is located in Helsinki" if the details don't match the direct verbalization grammar. We can create proof objects automatically for every original triple. There are also proof objects with 3 or more arguments (and they are created by ???).
Anyway, the functions in GenText look somewhat like this:
BuildSentence : (p : Painting) -> (a : Author) -> (l : Location) -> AuthorProofObject p a -> LocationProofObject p l -> FinalProofObject p a l -> Text ;
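The effect of the proof objects can be imitated with a small Python sketch. This is a different mechanism from the grammar's: GF rejects a mismatching combination when the tree is type-checked, whereas the sketch below checks at run time. The fact sets and the function name build_sentence are hypothetical, and the sample data is illustrative rather than taken from the museum ontology.

```python
# Hypothetical fact sets standing in for the proof objects; in the grammar
# there would be one proof object per original triple.
AUTHOR_FACTS = {("Guernica", "Pablo Picasso")}
LOCATION_FACTS = {("Guernica", "Madrid")}


def build_sentence(painting, author, location):
    """Return a combined sentence only if both facts are backed by a triple."""
    if (painting, author) not in AUTHOR_FACTS:
        raise ValueError(f"no authorship fact for {painting} / {author}")
    if (painting, location) not in LOCATION_FACTS:
        raise ValueError(f"no location fact for {painting} / {location}")
    return f"{painting} was painted by {author} and it is located in {location}."


print(build_sentence("Guernica", "Pablo Picasso", "Madrid"))

# A combination that is not backed by the facts is refused.
try:
    build_sentence("Guernica", "Pablo Picasso", "Helsinki")
except ValueError as error:
    print("rejected:", error)
```

The design point is the same in both settings: a sentence combining several facts can only be built from combinations that the underlying triples support.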
Tools to help the grammar writer
We want translations of the existing terms, not new terms. We can use the WP3 lexicon management tools to search FactForge's art terms. We can also use the example-based grammar writing tool.
Future work
Allow users to query the ontology and get exactly the answers they want as output. That is not planned for the next deliverable, though.