WP7 - Patents

The first-year review recommended that work in WP7 focus on the major issues examined in MOLTO, especially grammar-ontology interoperability, rather than on chemical compound splitting. Specific scenarios are therefore needed for exploiting MOLTO tools in this case study, and the review recommended including them in a new version of deliverable D9.1.

In response, two use case scenarios were described: UC-71 and UC-72.

UC-71 focuses on grammar-ontology interoperability. User queries, written in a CNL (controlled natural language), are used to query the information retrieval system.
UC-72 focuses on high-quality machine translation of patent documents. It uses a baseline statistical machine translation (SMT) system to translate a large dataset and populate the retrieval databases. To study the impact of hybrid systems on translation quality, a smaller dataset will be translated using the hybrid system developed in WP5.

Evaluation related to WP7

WP7 corresponds to the Patents Case Study. Its objective is to build a multilingual patent retrieval prototype consisting of three main modules: the multilingual retrieval system, the patent translation system and the user interface. This document proposes a methodology to evaluate these modules within the MOLTO framework.

Translation system

The automatic translations included in the retrieval database have been produced by the machine translation systems developed within WP5. Hence, the evaluation of this module is the same as the one described for the WP5 systems.

Retrieval system

Currently, the IR-facility organizes the TREC Chemical IR Evaluation campaign (http://www.ir-facility.org/trec-chem-2011-cfp). The campaign has three tracks, one of which is closely related to the objective of this WP:
Technology Survey: given an information need (from the bio-chemistry domain) expressed in natural language, retrieve all patents and scientific articles that may satisfy this need.

Following the guidelines of the TREC campaign, the methodology proposed to evaluate the patent retrieval system is as follows:

Select a set of 5 to 10 topics and create a natural language query for each topic (preferably written manually by experts). Each query must express an information need based on the data described in a patent; the priority is for it to be as close as possible to the genuine information need of an expert searcher.
The system must return the set of documents that best answers each information need. In each run, it may return at most 100 documents per query (the database will contain about 8,000 documents), preferably in the standard trec_eval format: topic_number query_number document_id rank score run_name.
Manually annotate each retrieved document as match or mismatch.
Calculate AP (Average Precision [1]) and NDCG (Normalized Discounted Cumulative Gain [2]), which are common metrics for this kind of system [4].
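The scoring step above can be sketched in a few lines. This is a minimal illustration, not the official tool (trec_eval itself reports these measures from a run file and a relevance file). It assumes that the run lines follow the six-column format listed above, that the documents annotated as "match" form the relevant set for each topic, and that NDCG uses binary gains with a cut-off of 100. The function names `parse_run`, `average_precision`, `ndcg_at_k` and `mean_over_topics` are hypothetical.

```python
import math
from collections import defaultdict


def parse_run(lines, max_docs=100):
    """Group run lines 'topic_number query_number document_id rank score run_name'
    by topic, ordered by rank and truncated to max_docs documents."""
    by_topic = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        topic, _query, doc_id, rank, _score, _run = line.split()
        by_topic[topic].append((int(rank), doc_id))
    return {t: [d for _, d in sorted(docs)][:max_docs] for t, docs in by_topic.items()}


def average_precision(ranked, relevant):
    """AP: sum of precision@k at each rank k holding a 'match' document,
    divided by the number of documents judged as 'match' for the topic."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def ndcg_at_k(ranked, relevant, k=100):
    """NDCG@k with binary gains: DCG = sum over matches of 1 / log2(rank + 1),
    normalised by the DCG of an ideal ranking that puts all matches first."""
    dcg = sum(1.0 / math.log2(r + 1)
              for r, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(r + 1) for r in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal > 0 else 0.0


def mean_over_topics(run, qrels, metric):
    """Average a per-topic metric over all judged topics (MAP when metric is AP);
    judged topics with no retrieved documents score 0."""
    scores = [metric(run.get(topic, []), relevant) for topic, relevant in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    run = parse_run([
        "1 Q1 D1 1 9.5 baseline",
        "1 Q1 D2 2 8.1 baseline",
        "1 Q1 D3 3 7.0 baseline",
    ])
    qrels = {"1": {"D1", "D3"}}  # hypothetical 'match' annotations for topic 1
    print("MAP:", mean_over_topics(run, qrels, average_precision))
    print("NDCG@100:", mean_over_topics(run, qrels, ndcg_at_k))
```

Because only the retrieved documents are annotated, AP here is computed relative to the judged pool rather than to every relevant patent in the ~8,000-document database.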

The user interface

User interfaces are usually evaluated in terms of their usability. ISO 9241-11 defines usability as the "extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use."

Hence, to get a complete picture of usability, we need to measure user satisfaction (users' reaction to the interface), effectiveness (can people complete their tasks?) and efficiency (how long do people take?).

These three measures are independent, so all three must be measured to obtain a rounded picture of usability.

Effectiveness. This can be measured by automatically logging the user interactions with the system and manually analysing the system responses. The measure can also be cross-checked against a specific question in the satisfaction questionnaire.
Efficiency. This measure can be obtained automatically by logging the user interactions. To do so, the experiment must implement mechanisms to (a) determine the start and end of the experiment (for each scenario and/or for the complete experiment) and (b) link each record to a specific user and to the other two measures (effectiveness and satisfaction). Users could also be asked to time themselves, but these measurements would be less reliable.
Satisfaction. This measure can be obtained by asking users to answer a questionnaire. Commonly used questionnaires for this task are the IBM CSUQ [5] and the SUS questionnaire [6]. A more novel method is the cloud of words, in which users select, from a predefined set of adjectives, a subset of words describing the system. A general description of this method can be found in [7].
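The three measures can be computed from the logs and questionnaires with a few small helpers. This is a minimal sketch: the `LogEvent` record, its field names and the functions `task_durations` and `completion_rate` are hypothetical, since the logging format has not been defined, and the success outcomes are assumed to come from the manual analysis of system responses. Only `sus_score` follows a published rule, the standard SUS scoring (odd items contribute r - 1, even items 5 - r, sum multiplied by 2.5).

```python
from dataclasses import dataclass


@dataclass
class LogEvent:
    """Hypothetical interaction-log record; the real logging format is not yet defined."""
    user: str
    scenario: str     # e.g. "open" or "closed"
    event: str        # "start" or "end"; other events are ignored here
    timestamp: float  # seconds since some fixed origin


def task_durations(events):
    """Efficiency: seconds between matching 'start' and 'end' events,
    keyed by (user, scenario) so they can be joined with the other measures."""
    starts, durations = {}, {}
    for e in sorted(events, key=lambda ev: ev.timestamp):
        key = (e.user, e.scenario)
        if e.event == "start":
            starts[key] = e.timestamp
        elif e.event == "end" and key in starts:
            durations[key] = e.timestamp - starts.pop(key)
    return durations


def completion_rate(outcomes):
    """Effectiveness: share of (user, scenario) tasks judged successful
    after manually analysing the system responses."""
    return sum(outcomes.values()) / len(outcomes) if outcomes else 0.0


def sus_score(responses):
    """Standard SUS scoring: ten items answered on a 1-5 scale; items 1, 3, 5, ...
    contribute (r - 1), items 2, 4, 6, ... contribute (5 - r), and the sum is
    scaled by 2.5 to give a score between 0 and 100."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses between 1 and 5")
    # enumerate() is 0-based, so even indices correspond to odd-numbered items
    total = sum(r - 1 if i % 2 == 0 else 5 - r for i, r in enumerate(responses))
    return total * 2.5
```

Keying every result by (user, scenario) is one way to satisfy requirement (b): durations, success judgements and questionnaire scores for the same session can then be joined directly.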

The experiment may consist of two scenarios: a closed one (i.e., specifying the information that must be obtained) and an open one (i.e., letting the user search for any type of information). Users are asked to complete both scenarios, and the order in which they do so must be balanced (i.e., half of them do the open scenario first). They must answer the questionnaire twice, once immediately after each scenario.
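The balanced ordering described above can be generated mechanically. This minimal sketch assumes a random split (seeded for reproducibility) is acceptable; the functions `assign_orders` and `session_plan` and the scenario labels "open" and "closed" are hypothetical names chosen for illustration.

```python
import random


def assign_orders(users, seed=None):
    """Counterbalance scenario order: after a random shuffle, the first half of
    the users do the open scenario first and the rest do the closed one first.
    With an odd number of users, the open-first group gets the extra user."""
    shuffled = list(users)
    random.Random(seed).shuffle(shuffled)
    half = (len(shuffled) + 1) // 2
    return {user: ("open", "closed") if i < half else ("closed", "open")
            for i, user in enumerate(shuffled)}


def session_plan(order):
    """Each scenario is followed immediately by the satisfaction questionnaire."""
    return [step for scenario in order for step in (scenario, "questionnaire")]


if __name__ == "__main__":
    for user, order in sorted(assign_orders(["u1", "u2", "u3", "u4"], seed=42).items()):
        print(user, session_plan(order))
```

If internal and external users are compared, calling `assign_orders` separately on each group keeps the order balanced within both groups.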

Potential users may be of two types: internal users (MOLTO participants and related people) and external users. The internal users can serve as the control group. External participants can be recruited through tools such as the Mechanical Turk Requester [8].