gfsage internal workings

olga.caprotti

Multilingual Online Translation

The GF side

A GF module acts as a post office translating messages between the different parties (nodes) composing a dialog. This section is more a description of a proposed design strategy for a generic postoffice interface based on GF. The actual code implements ideas of this design, but, for instance, it contains no edges or nodes as explicit entities.

Nodes and edges

gfsage deals with just 2 agents:

The user
The Sage system

in the case whether the input language is different of the output language, we may consider a third node (the output user).

There is a unique pgf module containing all GF information for the dialog system to work: Commands.pgf. Each node has a language (a GF concrete module) assigned: the user uses a natural language (i.e., ComandsEng for English).

A node reacts to received messages by sending a reply. The chain of messages between two nodes is called a dialog. An active node as the user can start a dialog by sending a message. A passive node, like the Sage system here, just replies to the received messages.

A node can receive:

A regular message from another node: This is a GF linearization in the receptor language.
A no_parse message from the postoffice telling that a previous outgoing message cannot be parsed.
An is_ambiguous message from the postoffice related to a previous message sent by the node, specifying that it was ambiguous and carrying additional info for the node to decide among the possible meanings. To respond to this, the node must send a disambiguate message to the postoffice (see below).

A node can send:

A regular message to another node: This is a parseable string for the emitter language.
A disambiguate message sent in response to an ambiguous message. In this message the node chooses one of the options or aborts the transaction.

A regular message between two given nodes corresponds to a fixed GF category. In the case of gfsage it is Command for messages traveling from User to Sage and Answer for messages going the other way.

Up and Down pipeline

A regular message from node N1 to node N2 goes through the following steps:

Input string is lexed, that is: separated into parse-able units (tokens);
It is then parsed using the node N1 language and edge category (i.e. node N1 to node N2) into a set of GF abstract trees;
This set is, hopefully, reduced by paraphrasing the trees and removing duplicates (it is the compute step);
Now, If the resulting set is empty, a no_parse message is sent back to the sending node. If it contains more than one entry, an is_ambiguous message is sent. In the previous cases, the process stops here; Only when the computed set contains just an entry, is this pushed downstream to the node N2.
The abstract tree is linearized using the node N2 language;
The result is unlexed, that is: assembled into a string that is delivered to the receiving node.

The Sage side

For Sage to work alongside GF, we need a http sever listening to Sage commands and some scripts to set up the environment and respond to the type of queries that can be expressed in the Mathematics Grammar Library, MGL.

The Sage server

A Sage process is started in the background by the start-nb.py script in -python mode. This script starts a Sage notebook, as described in Simple server API, listening on port 9000 and up to requests in http format. It also installs a handler for cleanly disposing of the notebook object whenever the parent process terminates.

The parent process sends then an initial request to load some functions and variables that we'll need in the dialog system defined in prelude.sage and goes into the main evaluation loop.

Sage scripts

realsets.py: is a Sage module developed to support set operations as described in the Set1 module of the MGL. (See the page about it)
prelude.sage: defines Sage functions to implement derivation on the style of the MGL and state storing for numbers, sets, functions and sets to support anaphora in the dialog.