3.3 Data Sources

This section describes the conceptual models (ontologies) and knowledge bases that the MOLTO KRI uses as background context in the RDFDB component:

  • PROTON ontology - a lightweight upper-level ontology defining about 300 classes and 100 properties in OWL Lite.
  • The default KRI knowledge base - common world knowledge that contains information about people, organizations, locations, and job titles.

Most applications of the KRI require extending the conceptual models with domain ontologies, and the underlying knowledge base with domain-specific entities and facts.

3.3.1 PROTON Ontology

The PROTON ontology covers the most general concepts, with a focus on named entities (people, locations, organizations) and concrete domains (numbers, dates, etc.).

The design principles can be summarized as follows:

  • domain-independence;
  • light-weight logical definitions;
  • alignment with popular standards;
  • good coverage of named entities and concrete domains (i.e. people, organizations, locations, numbers, dates, addresses).

The ontology was originally encoded in a fragment of OWL Lite and is split into four modules: System, Top, Upper, and KM (Knowledge Management), shown in Figure 5 below.

Figure 5 - PROTON Ontology

System module

The System module consists of a few meta-level primitives (5 classes and 5 properties). It introduces the notion of 'entity', which can have aliases. The primitives at this level are usually the few things that have to be hard-coded in ontology-based applications. Within this document and in general, the System module of PROTON is referred to via the "protons:" prefix.

Top module

The Top module is the highest, most general, conceptual level, consisting of about 20 classes. These ensure a good balance of utility, domain independence, and ease of understanding and usage. The top layer is usually the best level to establish alignment to other ontologies and schemata. Within this document and in general, the Top module of PROTON is referred to via the "protont:" prefix.

Upper module

The Upper module has over 200 general classes of entities, which often appear in multiple domains (e.g. various sorts of organizations, a comprehensive range of locations, etc.). Within this document and in general, the Upper module of PROTON is referred to via the "protonu:" prefix.

Knowledge Management (KM) module

The KM module has 38 classes of slightly specialized entities that are specific to typical Knowledge Management tasks and applications. Within this document and in general, the KM module of PROTON is referred to via the "protonkm:" prefix.
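
The four prefixes can be kept in a simple lookup table when prefixed class names need to be turned into full URIs. The following is a minimal sketch only: the namespace URIs are placeholders assumed for illustration (this section does not list the real ones), and the names PROTON_NAMESPACES and expand are hypothetical helpers, not part of the KRI.

```python
# Placeholder namespace URIs (assumption): replace them with the URIs
# declared in the PROTON ontology files actually loaded into the repository.
PROTON_NAMESPACES = {
    "protons": "http://example.org/proton/system#",   # System module
    "protont": "http://example.org/proton/top#",      # Top module
    "protonu": "http://example.org/proton/upper#",    # Upper module
    "protonkm": "http://example.org/proton/km#",      # KM module
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'protonu:Bank' into a full URI."""
    prefix, sep, local_name = prefixed_name.partition(":")
    if not sep or prefix not in PROTON_NAMESPACES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return PROTON_NAMESPACES[prefix] + local_name


# Bank is mentioned in this section as an Upper module class.
print(expand("protonu:Bank"))
```

Keeping the mapping in one place means that only the table changes if a different PROTON release, with different namespace URIs, is loaded.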

3.3.2 The Default KRI Knowledge Base

The default KB contains numerous instances of PROTON Upper module classes such as PublicCompany, Company, Bank, IndustrySector, and HomePage. It covers the world's most prominent entities, such as:

  • Locations: mountains, cities, roads, etc.
  • Organizations of all important kinds: business, international, political, government, sport, academic, etc.
  • Specific people

Content

  • collected from various sources, such as geographical and business intelligence gazetteers;
  • predefined: the KRI uses entities only from trusted sources;
  • can be enriched or replaced.

Entity Description

Named entities (NEs) are represented by semantic descriptions that consist of:

  • aliases (e.g. Florida and FL);
  • relations with other entities (e.g. Person hasPosition Position);
  • attributes (e.g. latitude and longitude of geographic entities);
  • the proper class of the NE.
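
The four parts of a semantic description map naturally onto a small record type. The sketch below is illustrative only: NamedEntity and its field names are hypothetical, the class labels are simplified stand-ins for the actual PROTON classes, the coordinates are approximate, and the person is invented for the example. It does not reflect how the KRI stores entities internally.

```python
from dataclasses import dataclass, field


@dataclass
class NamedEntity:
    """Illustrative container for the four parts of a semantic description."""
    entity_class: str                                   # the proper class of the NE
    aliases: list[str] = field(default_factory=list)    # alternative names
    relations: list[tuple[str, str]] = field(default_factory=list)  # (property, target)
    attributes: dict[str, float] = field(default_factory=dict)      # literal values

    def matches(self, name: str) -> bool:
        """Return True if `name` equals any alias, ignoring case."""
        return name.casefold() in (alias.casefold() for alias in self.aliases)


# A location with two aliases and approximate, illustrative coordinates.
florida = NamedEntity(
    entity_class="Location",  # simplified label; the real class may be more specific
    aliases=["Florida", "FL"],
    attributes={"latitude": 27.8, "longitude": -81.7},
)

# A made-up person linked to a position through the hasPosition relation.
person = NamedEntity(
    entity_class="Person",
    aliases=["Jane Doe"],
    relations=[("hasPosition", "Chief Executive Officer")],
)

print(florida.matches("fl"), person.relations)
```

Alias matching is what allows different surface forms in text, such as "Florida" and "FL", to resolve to the same entity.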

The most recent build of the KRI KB contained 29104 named entities: 6006 persons, 8259 organizations, 12219 locations, and 2620 job titles.
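
As a quick consistency check, the per-category counts can be summed and compared with the stated total. This short sketch uses only the figures quoted above; the variable names are arbitrary.

```python
# Entity counts per category, as reported for the KRI KB build.
counts = {
    "persons": 6006,
    "organizations": 8259,
    "locations": 12219,
    "job titles": 2620,
}

total = sum(counts.values())
assert total == 29104  # matches the stated number of named entities

# Share of each category in the knowledge base.
for kind, n in counts.items():
    print(f"{kind}: {n} ({n / total:.1%})")
```

Locations form the largest category, followed by organizations, persons, and job titles.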