WP4: Knowledge engineering

Ontotext's contributions to MOLTO through WP4 are:

• Semantic infrastructure

• Ontology-grammar interoperability

WP4 requirements

Semantic infrastructure

The semantic infrastructure in MOLTO will also act as a central multi-paradigm index for (i) conceptual models—upper-level and domain ontologies; (ii) knowledge bases; (iii) content and metadata as needed by the use cases (mathematical problems, patents, museum artefact descriptions); and provide NL-based and semantic (structured) retrieval on top of all modalities of the data modelled.

In addition to the traditional triple model for describing individual facts,

<subject, predicate, object>

the semantic infrastructure will build on quintuple-based facts,

<subject, predicate, object, named graph, triple set>
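The difference between the two fact models can be sketched in a few lines of Python. This is only an illustration, not the OWLIM or ORDI data model: the class names, the `ex:` identifiers, and the `select` helper are all assumptions made up for the example. The point is that the two extra fields let facts be grouped and filtered by source (named graph) and by set membership (triple set), which is what provenance descriptions need.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class Quint(NamedTuple):
    # A triple extended with the two extra fields of the quintuple model.
    subject: str
    predicate: str
    obj: str
    named_graph: str
    triple_set: str


# Hypothetical facts; every identifier here is invented for illustration.
facts = [
    Quint("ex:Four", "ex:divisibleBy", "ex:Two", "ex:graph/math", "ex:set/asserted"),
    Quint("ex:Four", "rdf:type", "ex:Integer", "ex:graph/math", "ex:set/inferred"),
    Quint("ex:MonaLisa", "ex:paintedBy", "ex:Leonardo", "ex:graph/museum", "ex:set/asserted"),
]


def select(facts, **pattern):
    """Return the facts whose named fields equal the given values."""
    return [f for f in facts if all(getattr(f, k) == v for k, v in pattern.items())]


def as_triple(fact):
    """Drop the provenance fields and keep the plain triple."""
    return Triple(fact.subject, fact.predicate, fact.obj)


math_facts = select(facts, named_graph="ex:graph/math")
asserted = select(facts, triple_set="ex:set/asserted")
```

Here `math_facts` holds the two facts from the hypothetical mathematics graph, while `asserted` cuts across graphs and keeps only the facts marked as asserted rather than inferred.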

The infrastructure will include an inference engine (TRREE), a semantic database (OWLIM), a semantic data integration framework (ORDI), and a multi-paradigm semantic retrieval engine, all of which are previous work resulting from private (Ontotext) and public (TAO, TripCom) funding. This approach will enable MOLTO’s baseline and use-case-driven knowledge modelling with the necessary expressivity of metadata-about-metadata descriptions for provenance of the diverse sources of structured knowledge (upper-level, domain-specific, and grammar-derived ontologies; thesauri; domain knowledge bases; content and its metadata).

According to the Ontotext web pages, the infrastructure appears to build on the following technologies:

• KIM, a platform for semantic annotation, search, and analysis

• OWLIM, described by Ontotext as the most scalable RDF database with OWL inference

• PROTON, a top ontology developed by Ontotext

Milestone MS2 says the knowledge representation infrastructure is opened for retrieval access to partners at M6. The infrastructure deliverable D4.1 is due at M8.

Grammar-ontology interoperability

[7 Grammar-ontology interoperability for translation and retrieval in DoW]

At the time of the TALK project, an emerging topic was the derivation of dialogue system grammars from OWL ontologies. A prototype tool for extracting GF abstract syntax modules from OWL ontologies was built by Peter Ljunglöf at UGOT. This tool was implemented as a plug-in to the Protégé system for building OWL ontologies and was intended to help programmers with an OWL background to build GF grammars. Even though this tool remained a prototype within the TALK project, it can be seen as a proof of concept for the more mature tools to be built in the MOLTO project.

A direct way to map between ontologies and GF abstract grammars is a mapping between OWL and GF syntaxes.

In slightly simplified terms, the OWL-to-GF mapping translates OWL’s classes to GF’s categories and OWL’s properties to GF’s functions that return propositions. As a running example in this and the next section, we will use the class of integers and the two-place property of being divisible (“x is divisible by y”). The correspondences are as follows:

Class(pp:integer ...)        <==>   cat integer ;

ObjectProperty(pp:div        <==>   fun div :
  domain(pp:integer)                  integer -> integer -> prop ;
  range(pp:integer))
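The syntax-directed mapping above can be sketched as a small generator. This is a minimal sketch under stated assumptions, not the GF-Protégé plug-in: the function `owl_to_gf`, the module name `Arith`, and the input format (plain Python lists rather than parsed OWL) are hypothetical, and the explicit declaration of the `prop` category is added here only so that the emitted module is self-contained.

```python
def owl_to_gf(module, classes, object_properties):
    """Render a GF abstract syntax module as text.

    classes           -- list of OWL class names (each becomes a GF category)
    object_properties -- list of (name, domain, range) tuples
                         (each becomes a GF function returning prop)
    """
    lines = [f"abstract {module} = {{", "  cat prop ;"]
    for cls in classes:
        lines.append(f"  cat {cls} ;")
    for name, domain, range_ in object_properties:
        lines.append(f"  fun {name} : {domain} -> {range_} -> prop ;")
    lines.append("}")
    return "\n".join(lines)


# The running example: the class of integers and the divisibility property.
gf_source = owl_to_gf("Arith", ["integer"], [("div", "integer", "integer")])
print(gf_source)
```

A real converter would read the class and property declarations from an OWL file and would have to decide how to treat subclass axioms, datatype properties, and name clashes, none of which this sketch addresses.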

Less syntax-directed mappings may be more useful, depending on what information is relevant to pass between the two formalisms. Such a mapping is also less generic, as it depends on the intended use and interpretation of the ontology. The mapping through SPARQL queries described below is one example. A mapping via TermFactory (TF) could be another.

The GF-Protégé plug-in brings us to the development cost problem of translation systems. We have noticed that in the GF setting, building a multilingual translation system is equivalent to building a multilingual GF grammar, which in turn consists of two kinds of components:

• a language-independent abstract syntax, giving the semantic model via which translation is performed;

• for each language, a concrete syntax mapping abstract syntax trees to strings in that language.
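The division of labour between the two components can be illustrated with a toy model. The sketch below is not GF: it assumes abstract syntax trees are nested Python tuples and concrete syntaxes are string templates (the `CONCRETE` table and the `linearize` function are invented for the example), and it covers only linearization, whereas GF grammars also support parsing and handle morphology and agreement.

```python
# Abstract syntax tree: (function_name, child, child, ...); leaves are strings.
tree = ("div", "6", "3")

# One hypothetical concrete syntax per language: a template per abstract function.
CONCRETE = {
    "Eng": {"div": "{0} is divisible by {1}"},
    "Ger": {"div": "{0} ist durch {1} teilbar"},
    "Fre": {"div": "{0} est divisible par {1}"},
}


def linearize(tree, lang):
    """Map an abstract syntax tree to a string in the given language."""
    if isinstance(tree, str):
        return tree
    fun, *args = tree
    template = CONCRETE[lang][fun]
    return template.format(*(linearize(arg, lang) for arg in args))


for lang in CONCRETE:
    print(lang, ":", linearize(tree, lang))
```

Adding a language means adding one entry to the concrete table while the tree stays unchanged, which is the sense in which the abstract syntax acts as the shared semantic model.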

In MOLTO, GF abstract syntax can also be derived from sources other than OWL (e.g. from OpenMath in the mathematical case study) or even written from scratch and then possibly translated into OWL ontologies, if the inference capabilities of OWL reasoning engines are desired. The CRM ontology (Conceptual Reference Model) used in the museum case study is already available in OWL.

MOLTO’s ontology-grammar interoperability engine will thus help in the construction of the abstract syntax by automatically or semi-automatically deriving it from an existing ontology. The mechanical translation between GF trees and OWL representations then forms the basis of using GF for translation in the Semantic Web context, where huge data sets become available in RDF and OWL in initiatives like Linked Open Data (LOD).

The interoperability between GF and ontologies will also provide humans with natural ways of interacting with knowledge-based systems in multiple languages, expressing their need for information in NL and receiving the matching knowledge expressed in NL as well:

Human -> NL -> GF -> ontology -> GF -> NL -> Human

providing an entirely new dimension to the usability of semantics-based retrieval systems, and opening extensive structured bodies of knowledge in human understandable ways.

Note that the OWL-to-GF mapping also allows wider human input to GF: OWL ontologies are written by humans (at present, at least, by many more humans than GF grammars are).

The MOLTO website details what is going to be delivered first by way of ontology-GF interoperability. The first round uses a GF grammar to translate NL questions into the SPARQL query language (http://www.molto-project.eu/node/987).

The ontology-GF mapping here is an NL interface to PROTON ontologies, by way of parsing (fixed) NL to (fixed) GF trees and transforming the trees into SPARQL queries to run on the ontology DB.

Indirectly, this does define a mapping between (certain) GF trees and RDF models, using SPARQL in the middle. SPARQL is not RDF, but a SPARQL query does retrieve an RDF model from a given dataset; which model is retrieved depends on the dataset. With an OWL reasoner thrown in, we can get OWL query results.
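The tree-to-query step can be sketched as follows. This is only an illustration of the idea, not the MOLTO prototype: the tree constructors `AllInstances` and `InstancesWith`, the `ex:` namespace, and the class and property names are hypothetical and are not taken from PROTON or from the MOLTO grammar.

```python
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX ex: <http://example.org/onto#>"
)


def tree_to_sparql(tree):
    """Translate a (fixed) query tree into a SPARQL SELECT query string."""
    fun, *args = tree
    if fun == "AllInstances":
        # e.g. "all organizations"
        (cls,) = args
        where = f"?x rdf:type ex:{cls} ."
    elif fun == "InstancesWith":
        # e.g. "all organizations located in Bulgaria"
        cls, prop, value = args
        where = f"?x rdf:type ex:{cls} . ?x ex:{prop} ex:{value} ."
    else:
        raise ValueError(f"unsupported tree constructor: {fun}")
    return f"{PREFIXES}\nSELECT ?x WHERE {{ {where} }}"


query = tree_to_sparql(("InstancesWith", "Organization", "locatedIn", "Bulgaria"))
print(query)
```

Because only a fixed set of tree shapes is handled, the coverage of such an interface is bounded by the grammar: any question the grammar cannot parse has no corresponding query.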

What WP3 had in mind is a tool to translate between OWL models and GF grammars, i.e. to convert OWL ontology content into GF abstract syntax. According to the MOLTO presentation slides (http://www.molto-project.eu/node/1008), this tool is next in line.

This was confirmed by email from Petar (https://kitwiki.csc.fi/twiki/bin/view/MOLTO/MoltoOntologyEvaluationPlanWP4).

The translation tools work package (WP3) will consider using the TermFactory (TF) multilingual ontology model and tools as middleware between (non-linguistic) ontologies and GF grammars. The idea is to (semi-)automatically match or bridge third-party ontologies to TF, a platform for collaborative development of ontology-based multilingual terminology. It then remains to define an automatic conversion between TF and GF.

The Varna meeting should adjudicate between WP3 and WP4 here.

A concrete subtask that arises here is to define an interface between the knowledge representation infrastructure (due Nov 2010) and TF (finished in the ContentFactory project at the end of 2010).

WP4 evaluation

Since the aims relate more to use cases and framework development than to enhancing the performance of existing technologies, the evaluation to be done during the project will be more qualitative than quantitative.

The evaluation of these features should reflect and demonstrate the multiple possibilities of GF that are gained through inter-operation with external ontologies. The evaluation of progress will exploit proof-of-concept demos and plans for further development. For further discussion, see https://kitwiki.csc.fi/twiki/bin/view/MOLTO/MoltoOntologyEvaluationPlanD91