# Living Deliverables

Living deliverables are the online drafts of the project's deliverable documents. Please use the cover page from the template deliverable and enter the administrative data for the deliverable. When editing is finished and the final version is ready, produce the PDF from the top cover page using the print icon, and create a Biblio item to archive the version of the document to be delivered. Enter both the Biblio item and the link to the corresponding living deliverable in the administrative view for the project's deliverables.

Once this is done, the system fills in the table of deliverables automatically.

# D2.1 GF Grammar Compiler API

Contract No.: FP7-ICT-247914 MOLTO - Multilingual Online Translation D2.1. GF Grammar Compiler API Public M13 March 2010 Prototype Draft (evolving document) A. Ranta, T. Hallgren, et al. UGOT

Abstract

The present paper is the cover of deliverable D2.1 as of M13.

# D2.2 Grammar IDE

Contract No.: FP7-ICT-247914 MOLTO - Multilingual Online Translation D2.2 Grammar IDE Public M18 September 2011 Prototype Final A. Ranta, T. Hallgren, et al. UGOT John Camilleri, Ramona Enache

Abstract

Deliverable D2.2 describes the functionalities of an Integrated Development Environment (IDE) for GF (Grammatical Framework). The main question it addresses is how such a system should help the programmers who write multilingual grammars. Two IDEs are presented: a web-based IDE enabling a quick start for GF programming in the cloud, and an Eclipse plug-in, targeted at expert users working with large projects, which may involve the integration of GF with other components. Example-based grammar writing is also described at the end.

# D4.1 Knowledge Representation Infrastructure

Contract No.: FP7-ICT-247914 MOLTO - Multilingual Online Translation D4.1 Knowledge Representation Infrastructure Public 1 Nov 2010 1 Nov 2010 Regular Publication Final Petar Mitankin, Atanas Ilchev ONTO ( WP4 ) Borislav Popov, Reneta Popova, and Gergana Petkova

### Abstract

This document presents the specification of the Knowledge Representation Infrastructure (KRI), which is based on pre-existing products. The KRI ensures a mature basis for storage and retrieval of structured knowledge and content. The document provides a description of the technology building blocks, overall architecture, standards used, query languages and inference rules.

Attachment: D4.1_reviewed.pdf (1.07 MB)

# D6.1 Simple drill grammar library

Contract No.: FP7-ICT-247914 MOLTO - Multilingual Online Translation D6.1. Simple Drill Grammar Library Public M18 September 2011 Prototype Final (evolving document) J. Saludes, et al. UPC

Abstract

The present paper is the cover of deliverable D6.1 as of WP6. It gives installation instructions for the Mathematical Grammar Library and a short manual.

# 1. How to get it

The development head of the library is publicly available through Subversion:

     svn co svn://molto-project.eu/mgl


A stable version can be found at:

    svn co svn://molto-project.eu/tags/D6.1


# 2. Library structure

The mgl library consists of the following files and directories:

• One directory per language
• abstract directory: For the abstract modules of the library
• resources directory: Containing the general resource modules, incomplete concrete modules and generic lexicon.
• server: Code for the mathbar demo.
• test: Testing facilities and data

# 2.1 Logical structure

At the same time, the library can be organized in three layers of increasing complexity:

• Ground layer: it contains basic and atomic elements. Modules Ground and Variables.
• OpenMath layer: the bulk of the library resides here: There is a module for each targeted OpenMath Content Dictionary, namely: Arith1, Arith2, Calculus1, Complex1, PlanGeo1, Fns1, Integer1, Integer2, Interval1, Limit1, LinAlg1, LinAlg2, Logic1, MinMax1, Nums1, Relation1, Rounding1, Set1, SetName1, Transc1, VecCalc1 and Quant1
• Operations layer: the top layer is for expressing simple mathematical drills by combining an imperative (Compute, Prove, Find, etc.) with the productions of the OpenMath layer. It is also possible to express a sequence of simple computations and to set pre-conditions.

# 3. Compiling the library

Inside the mgl directory:

    make


will compile the top (Operations) layer and produce Test.pgf. To compile only the OpenMath layer:

    make om


# 4. Demo

An online version of the mathbar demo is available at http://www.grammaticalframework.org/demos/minibar/mathbar.html.

# 5. Testing

The library compiles for the following EU languages: Bulgarian, Catalan, English, Finnish, French, German, Italian, Polish, Romanian, Spanish, Swedish.

Regression testing of the OpenMath productions is possible through a treebank containing about 140 productions from this layer. At the present moment it contains linearizations for English, German, Polish and Spanish. At the time of writing this report, the entries of these languages (except for Polish) had been corrected by fluent speakers of the respective language. To allow for discrepancy, earlier corrections are also stored in the treebank, tagged with author and revision number.

The structure of the treebank is described in the evaluating document.

To test the library, make sure you have an up-to-date OpenMath.pgf. You can recreate it by issuing:

     make om


and then, in the test directory:

     ./tbm table


That will make a table indexed by treebank entry and testing language (English, German and Spanish), showing the number of differences between the actual linearization and the corrected one.

Each time a new revision is committed to the repository, the output of this command is saved into test/table. Comparing different revisions of this file makes it possible to measure the progress of the bug-fixing effort.
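
As an illustration of what such a difference count involves, the following Python sketch compares an actual linearization with its corrected version token by token and renders the result in the {old → new} notation used in appendix A1. It is not the tbm code: the function name diff_tokens and the use of Python's difflib module are assumptions made for this example only.

```python
from difflib import SequenceMatcher


def diff_tokens(actual: str, corrected: str) -> tuple[str, int]:
    """Return the annotated difference and the number of differing spans."""
    a, b = actual.split(), corrected.split()
    out, count = [], 0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        else:
            # replace, delete or insert: show both sides, one of them may be empty
            old, new = " ".join(a[i1:i2]), " ".join(b[j1:j2])
            out.append("{" + f"{old} → {new}".strip() + "}")
            count += 1
    return " ".join(out), count


text, n = diff_tokens(
    "there is x in C such that y divides x",
    "there exists x in C such that y divides x",
)
print(n, text)
```

A table like the one produced by ./tbm table could then be built by storing the count for each (entry, language) pair.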

To review the current defects for language L:

     ./tbm review -lL


It will walk through all the defects, showing the differences, the stored corrected concretes, the abstract and the current linearization. For a list of available sub-commands press h.

# 6. Acknowledgments

Krasimir Angelov, Olga Caprotti, Ramona Enache, Thomas Hallgren, Alba Hierro, Inari Listenmaa, Aarne Ranta, Ares Ribo, Adam Slaski, Shafqat Virk and Sebastian Xambó.

# A1. Current differences

## English

• (46) there {is → exists} x in C such that y divides x
• (84) is x equal to x{→ ?}
• (87) {it is →} not {true that →} p
• (115) the set {with → whose} elements{→ are}y and z

## German

• (0) der inverse hyperbolische Kosinus des Produkts über {Gamma → Gamma,} wobei x von der Differenz von x und {von →} y bis dem Arcuskosinus von z läuft
• (5) der absolute Wert des Bruches {von →} x über z
• (25) das kartesische Produkt von A und {von →} B
• (26) die Summe über x{→ aufgerundet,}wobei x {über →} die Menge B durchläuft {aufgerundet →}
• (28) die komplexe Zahl mit polaren Koordinaten dem Quadrat von z und dem Produkt über {z → z,} wobei z von x bis y läuft
• (38) das Integral des Arcussinus {auf dem → über das} Intervall {aus → von} dem Kubus von Pi {nach → bis} minus Gamma
• (46) es gibt x in C so dass{→ x}y {durch x dividiert → teilt}
• (53) für alle z {, → gilt} p
• (54) für alle z in A {, → gilt} p
• (55) der größte gemeinsame Teiler von x und {von →} y
• (61) der Durchschnitt von A und {von →} B
• (62) die {Funktion aus → Funktion, welche} y {nach der → auf die} Differenz von y und {von →} z{→ abbildet}
• (63) das kleinste gemeinsame Vielfache von y und {von →} z
• (65) die {Links-Inverse → linksinverse} Funktion der {Rechts-Inversen → rechtinversen} Funktion des hyperbolischen Kosinus
• (69) die Menge {Werte → von Werten} der Form der Fakultät von {x → x,} so dass y in x in {A → A,} so dass r ist
• (74) das maximale Element der Differenz von A und {von →} B
• (75) der Mittelwert von z und {von →} y
• (76) der Median von x , {von →} y und {von →} z
• (78) die Differenz der ganzzahligen Division von x und {von →} z und der Summe über {Pi → Pi,} wobei x {über →} die Menge A durchläuft
• (85) der Modus von x , {von →} y und {von →} z
• (86) das siebte Moment von x , {von →} y und {von →} z {an der → über die} Differenz von Pi und {von →} x
• (87) es ist nicht {wahr → wahr,} daß p
• (96) die Summe von x und {von →} z
• (97) y hoch die Summe über {z → z,} wobei z von x bis y läuft
• (99) der Kubus des {Produkts über z → Produktes von z,} wobei z von x bis dem Kosekans von x läuft
• (100) das Produkt über der Quadratwurzel von {z → z,} wobei z {über die Menge →} das linksseitige {geschlossene → abgeschlossene} Intervall von Gamma bis y durchläuft
• (101) das{→ stetige}Intervall von Pi bis x ist eine echte Teilmenge des Definitionsbereiches des {Kosekans → Kosecans}
• (102) die ganzzahlige Division von x und der Summe über {z → z,} wobei z von dem Argument von z bis dem Rest von x dividiert durch y läuft
• (105) der Rest der ganzzahligen Division von Gamma und {von →} z dividiert durch Pi
• (107) die {Rechts-Inverse → rechtinverse} Funktion der Ableitung des Tangens
• (112) die standarde Abweichung von y und {von →} z
• (116) der Quotient von x und {von →} Pi ist ein Element des {geschlossenen → abgeschlossenen} Intervalls von x bis z
• (118) die Differenz der Differenz von A und {von →} B und des offenen Intervalls von z bis y
• (127) die Größe des linksseitigen {geschlossenen → abgeschlossenen} Intervalls von y bis x
• (130) die Summe über der fünften Wurzel von {x → x,} wobei x {über die Menge →} das ganzzahlige Intervall von z bis Pi durchläuft
• (133) das Produkt von x und {von →} y
• (136) die Summe über {y → y,} wobei y von der Kubikwurzel von x bis der inversen Zahl von Pi {läuft → läuft,} abgeschnitten
• (138) die Vereinigung von A und {von →} B
• (139) die Varianz von z und {von →} x
• (143) das vektorielle Produkt des {Vektoren → Vektors} mit einzigen Komponente Gamma und {von →} v

## Spanish

• (5) el valor absoluto de la fracción {de →} x entre z
• (25) el producto cartesiano de A y {de →} B
• (26) el redondeo hacia arriba del sumatorio de x cuando x varía en {los elementos de →} B
• (27) el número complejo con coordenadas cartesianas la fracción {de →} z entre y y el truncamiento de e
• (37) la integral de la arcosecante sobre el intervalo abierto por la izquierda {de → desde} z {a → hasta} y
• (38) la integral del arcoseno {sobre → desde} el {intervalo del →} cubo de pi {al → hasta el} opuesto de Gama
• (42) el cociente {de → entre} x y {de →} la raíz cuadrada de y
• (46) {hay → existe} x en C tal que y divide a x
• (55) el máximo común divisor de x{→ e}y {de y →}
• (61) la intersección de A y {de →} B
• (62) la función {de → desde} y {a → hasta} la diferencia entre y {y → e} z
• (63) el mínimo común múltiplo de y y {de →} z
• (70) la matriz con una fila {con → de} componentes x e y y una fila {con → de} componentes y y x
• (71) la matriz con una fila {con → de componente} única {componente →} x
• (72) una fila {con → de} componentes x e y
• (73) una fila {con → de componente} única {componente →} el elemento máximo de A
• (74) el elemento máximo de la diferencia de A y {de →} B
• (75) la media de z{→ e}y {de y →}
• (76) la mediana de x , {de →} y y {de →} z
• (77) el elemento mínimo del intervalo cerrado {de → desde} x {a → hasta} y
• (78) la diferencia entre la división entera de x {y de → entre} z y el sumatorio de pi cuando x varía en {los elementos de →} A
• (82) el intervalo abierto {de → desde} x {a → hasta} y no es un subconjunto propio de A
• (83) el intervalo abierto por la izquierda {de → desde} y {a → hasta} x no es un subconjunto del intervalo abierto {de → desde} x {a → hasta} y
• (85) la moda de x , {de →} y y {de →} z
• (86) el séptimo momento de x , {de →} y y {de →} z en la diferencia entre pi y x
• (96) la suma de x y {de →} z
• (100) el producto de la raíz cuadrada de z cuando z varía en {los elementos del → el} intervalo cerrado por la izquierda {de → desde} Gama {a → hasta} y
• (101) el intervalo {de → continuo desde} pi {a → hasta} x es un subconjunto propio del dominio de la cosecante
• (102) la división entera de x {y del → entre el} sumatorio de z cuando z varía desde el argumento de z hasta el resto de x dividido por y
• (105) el resto de la división entera de Gama {y de → entre} z dividida por pi
• (112) la desviación estándar de y y {de →} z
• (115) el conjunto {compuesto por los → de} elementos y y z
• (116) el cociente {de → entre} x y {de →} pi es un elemento del intervalo cerrado {de → desde} x {a → hasta} z
• (118) la diferencia {de → entre} la diferencia {de → entre} A y {de →} B y {del → el} intervalo abierto {de → desde} z {a → hasta} y
• (127) el cardinal del intervalo cerrado por la izquierda {de → desde} y {a → hasta} x
• (128) el intervalo abierto {de → desde} y {a → hasta} x es un subconjunto del intervalo abierto por la izquierda {de → desde} x {a → hasta} y
• (129) z en {el → un} conjunto {con único elemento → de componente} x tal que r
• (130) el sumatorio de la raíz quinta de x cuando x varía en {los elementos del → el} intervalo entero {de → desde} z {a → hasta} pi
• (133) el producto de x{→ e}y {de y →}
• (138) la unión de A y {de →} B
• (139) la varianza de z y {de →} x
• (140) el vector {con → de} componentes y y x
• (141) el vector {con → de componente} única {componente →} el cardinal de A
• (143) el producto vectorial del vector {con → de componente} única {componente →} Gama y {de →} v

# A2. RGL tickets for the MGL

### #2: ! with imperative (completed)

Imperative mode forces "!" at the end?

Not what we want for exercises.

Test> l DoComputeF DefineV (Var2Fun f)

define f !

### #21: x {hoch,gleigh} y (completed)

We want to express:

"x gleich y"

or

"x hoch y"

It doesn't exist

### #40: gilt/holds -- gilt nicht/does not hold (open)

Example

for all z , r , it isn't true that r and if p , then , it isn't true that r

für alle z, r , ist es nicht wahr daß r und wenn p dann ist es nicht wahr daß r

I think it would be better to write "gilt", "gilt nicht" (english "holds", "it does not hold") instead of "es ist nicht wahr", "es ist wahr", "it is true", "it isn't true":

for all z , r , r does not hold and if p , then , r does not hold

für alle z, r , r gilt nicht und wenn p dann r gilt nicht

### #54: there is/exists (46) (wont-fix)

#### Abstract

exist (BaseVarNum x) (Var2Set C) (mkProp (divides (Var2Num y) (Var2Num x)))

#### Difference

there {is → exists} x in C such that y divides x

### #56: set of values of the form ... (69) (completed)

#### Abstract

map y (factorial (Var2Num x)) (suchthat (Var2Set A) x r)

#### Difference

the set of values of the form {the →} factorial of x {such that → for} y {is → ranging} in{→ the set of elements}x {in → of} A such that r

### #62: set whose elements (115) (wont-fix)

#### Abstract

set (BaseValNum (Var2Num y) (Var2Num z))

#### Difference

the set {of components → whose elements are} y and z

### #67: hay → existe , divida (wont-fix)

l exist (BaseVarNum x) (Var2Set C) (mkProp (divides (Var2Num y) (Var2Num x)))

hay x en C tal que y divida a x

hay→existe

divida → divide

### #74: part_Prep before vowel in Cat (assigned)

el conjunt amb element únic el cub de pi

DefNPwithbaseElem : CN → MathObj → MathObj =

\cn,o → DefSgNP (mkCN cn (prepAdv with_Prep (mkNP (mkCN (mkCN (mkA "únic") element_CN) o)))) ;

Problem:

I cannot write "d'element únic", because if I replace with_Prep with possess_Prep or part_Prep (of), the preposition is omitted! Why?

### #78: de a y b, no de a y de b (25) (wont-fix)

#### Abstract

cartesian_product (BaseValSet (Var2Set A) (Var2Set B))

#### Difference

el producto cartesiano de A y {de →} B

### #88: Imaginärteil (58) (completed)

#### Abstract

imaginary (Var2Num y)

#### Difference

der {imaginäre Teil → Imaginärteil} von y

### #90: kleinstes gemeinsames Vielfaches (63) (wont-fix)

#### Abstract

lcm (BaseValNum (Var2Num y) (Var2Num z))

#### Difference

das {am wenigstene gemeine → kleinstes gemeinsames} Vielfaches von y und z

### #96: dem reele Teil (109) (completed)

#### Abstract

root2 (real (Var2Num x))

#### Difference

die quadratische Wurzel von dem {reellen → reele} Teil von x

### #99: and_Conj "y" in spanish does not include the case "e" (wont-fix)

Problem: and_Conj in Spanish does not cover the case "e", for example in "x e y". It should be

and_Conj = {s1 = [] ; s2 = etConj.s ; n = Pl} ;

For the moment, we have created a new

myAnd_Conj=and_Conj;

at MathI.gf and redefined it as

and_Conj = {s1 = [] ; s2 = etConj.s ; n = Pl} ;

at MathSpa.gf

This should be fixed at StructuralSpa.gf
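
The rule this ticket asks for can be stated compactly. The Python sketch below is a simplified model of it, not GF code and not the way the resource grammar implements etConj: the function name spanish_and is hypothetical, and the test only covers the usual cases (the conjunction becomes "e" before a word starting with the sound /i/, which in the MGL includes the variable name y).

```python
def spanish_and(next_word: str) -> str:
    """Choose between "y" and "e" depending on the word that follows."""
    word = next_word.lower()
    if word == "y":
        # the variable name y is read "i griega", so it starts with /i/
        return "e"
    if word.startswith(("i", "hi")) and not word.startswith("hie"):
        return "e"
    return "y"


print("x", spanish_and("y"), "y")   # x e y
print("y", spanish_and("z"), "z")   # y y z
```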

# D6.2 Prototype of commanding CAS

Contract No.: FP7-ICT-247914 MOLTO - Multilingual Online Translation D6.2. Prototype of commanding CAS Public M23 February 2012 Prototype Final (evolving document) Jordi Saludes, Ares Ribó UPC

## Abstract

The present paper is the cover of deliverable D6.2 as of WP6. It gives description and installation instructions for the executables included in this deliverable.

# Dependencies

The following table describes what is needed in order to use the executables. In all cases you'll need GF and Sage.

gfsage is the simple dialog executable, shell denotes the component that allows using natural language inside Sage, and shell-complete is the same with auto-completion of commands.

Component O. S. Extra requirement Spoken output autocompletion
gfsage Mac OS X, Linux Ubuntu ghc, curl OSX1, Linux yes
shell all2 no
shell-complete Linux gf python bindings yes

1. 10.7

2. Not tested on Windows, but in this case Sage runs inside of a Linux virtual box.

# Installation

Depending on your permission settings you might have to run some of these commands with sudo. For all of them, you first have to check out the Mathematics Grammar Library from:

svn co svn://molto-project.eu/mgl


Be warned that development will continue for some time in this HEAD branch. For a frozen version, check out from:

svn co svn://molto-project.eu/tags/D6.2


You'll find detailed instructions for installing each executable in the following pages. For the moment, note that some files in your Sage installation must be modified for these executables to run. Usually these changes have to be made just once. The first time, the installation procedure will warn you about it:

Please add 'sage.nlgf' to /usr/local/sage-4.7.2/devel/sage/setup.py


Since ours is not a regular Sage package, we must add a package reference manually by tweaking the setup.py file given above (notice that yours may have a different path). This is a Python file that Sage reads to configure the system using the setup command. Please find the call to setup in the file; mine is at line 882 and looks like this:

code = setup(name = 'sage',


The setup command lists several items. Please locate packages (which is a Python list) and add 'sage.nlgf' (quotes included) among the other packages listed there. Python is picky about indentation and doesn't like to have spaces and tabs mixed, so please check that you're using the same spacing as the rest of the file.
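
As a rough illustration only, the packages list inside the setup call might look like the following after the edit. The neighbouring entries are placeholders chosen for this sketch; the real list in your setup.py is much longer and its exact contents depend on the Sage version.

```python
# Illustrative excerpt: the first two entries are placeholders,
# not a copy of the real list in your setup.py.
packages = [
    'sage',
    'sage.algebras',
    'sage.nlgf',      # added manually so that Sage picks up the gfsage support code
]
```

Only the 'sage.nlgf' line is new; it uses the same quoting and the same indentation (spaces here) as its neighbours.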

The installation has been tested on Sage 4.7.1, 4.7.2 and 4.8.

# gfsage: a natural language interface for Sage

The goal of this work is to develop a command-line tool able to take commands in natural language and have them executed by Sage, a collection of Computer Algebra packages presented in a uniform way. We present here instructions on how to build the interface and examples of its intended use.

## Building the executable

You'll need:

• ghc with cabal, as in Haskell platform
• curl
• a way to call Sage from a terminal (usually the sage command, which is assumed to be in your PATH)
• A POSIX system
• The source version of GF.

You can get this source version by:

cabal install gf


We can install the other dependencies too by:

cabal install json curl


Checkout the mathematics grammar library from:

 svn co svn://molto-project.eu/mgl


This is the active branch. For the fixed one use:

svn co svn://molto-project.eu/tags/D6.2


Go into the mgl/sage directory (D6.2/sage if you're using the fixed branch) and make it:

cd mgl/sage
make


The first time you run make, it will fail and ask you to make modifications in the Sage installation. Please refer to the installation page.

Now try to build gfsage again. All these build operations will ask Sage to "rebuild" itself. Be warned that the first rebuild takes some time:

make


The system has been tested on Mac (OS X 10.7) and Linux (Ubuntu).

## Usage

Run the tool as:

./gfsage english


giving the input language as argument. It will take some seconds to start the server. After that it will reply with some server information and will show the prompt:

    sage>


You can then enter your query:

    sage> compute the product of the octal number 12 and the binary number 100.
    (3) 40
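
The arithmetic behind this answer can be checked independently of gfsage. The following plain Python lines (not part of the tool) read 12 as an octal literal and 100 as a binary literal, as the query requests, and multiply them.

```python
octal_12 = int("12", 8)       # 1*8 + 2 = 10
binary_100 = int("100", 2)    # 1*4 + 0*2 + 0 = 4
product = octal_12 * binary_100
print(product)                # 40
```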


To show that a CAS is actually behind the scene, let's try something symbolic:

    sage> compute the greatest common divisor of x and the product of x and y.
    (4) x


and compare it with:

    sage> compute the greatest common divisor of x and the sum of x and y.
    (5) 1


Sage does the right thing in both cases, x and y being unbound numeric variables.

    sage> compute the second iterated derivative of the cosine at pi.
    (6) 1


## Exiting

Exit the session by pressing Ctrl+D; this way the server exits cleanly.

Just another example in a different language:

    ./gfsage spanish
    Login into localhost at port 9000
    Session ID is c1ef10dfd49e4fdb3214fa6d3a3b9c92
    waiting... EmptyBlock 2
    finished handshake. Session is c1ef10dfd49e4fdb3214fa6d3a3b9c92
    sage> calcula la parte imaginaria de la derivada de la exponencial en pi.
    (4) 0


More recent examples involving integer literals and integration:

    sage> compute the sum of 1, 2, 3, 4 and 5.
    (3) 15

    sage> compute the summation of x when x ranges from 1 to 100.
    (4) 5050

    sage> compute the integral of the cosine from 0 to the quotient of pi and 2.
    waiting... (5) 1

    sage> compute the integral of the function mapping x to the square root of x from 1 to 2.
    (6) 4/3*sqrt(2) - 2/3
    answer: it is 4 over 3 times the square root of 2 minus the quotient of 2 and 3 .


## Other invocation options

Use English:

gfsage


Use LANGUAGE:

gfsage LANGUAGE


General invocation:

gfsage [OPTIONS]


where OPTIONS are:

| Short form | Long form | Description |
|---|---|---|
| -h | --help | Print usage page |
| -i LANGUAGE | --input-lang=LANGUAGE | Make queries in LANGUAGE |
| -o LANGUAGE | --output-lang=LANGUAGE | Give answers in LANGUAGE |
| -V LEVEL | --verbose=LEVEL | Set the verbosity LEVEL |
| -t FILE | --test=FILE | Test samples in FILE |
| -v[VOICE] | --voice[=VOICE] | Use voice output. To list voices use ? as VOICE. |
| -F | --with-feedback | Restate the query when answering. |
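
To make the interplay of these options concrete, here is a small Python model of the same command line built with argparse. It is only a sketch for illustration: gfsage has its own option parser, and the defaulting rules encoded below (English when no language is given, output language equal to the input language unless -o is used) are taken from the descriptions in this document, while the helper names build_parser and parse are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="gfsage")  # -h/--help is added by argparse
    p.add_argument("language", nargs="?", default="english",
                   help="shorthand for -i LANGUAGE")
    p.add_argument("-i", "--input-lang", metavar="LANGUAGE")
    p.add_argument("-o", "--output-lang", metavar="LANGUAGE")
    p.add_argument("-V", "--verbose", metavar="LEVEL")
    p.add_argument("-t", "--test", metavar="FILE")
    # -v takes an optional, attached argument: -v, -v? or -vYannick
    p.add_argument("-v", "--voice", nargs="?", const="", metavar="VOICE")
    p.add_argument("-F", "--with-feedback", action="store_true")
    return p


def parse(argv):
    args = build_parser().parse_args(argv)
    args.input_lang = args.input_lang or args.language
    args.output_lang = args.output_lang or args.input_lang
    return args


args = parse(["-i", "german", "-vYannick"])
print(args.input_lang, args.output_lang, args.voice)
```

In this model an empty string for voice stands for "pick the first available voice", and None means that voice output was not requested.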

## Limitations

• On Darwin (OS X 10.6 and 10.7) a bug in the Sage part makes the system unresponsive after some computations (between 7 and 10)
• On some machines, it takes time for the Sage server to respond.

This condition is signaled by the message:

gfsage: Connecting CurlCouldntConnect


I used a Linux virtual machine to reproduce this condition and found that, sometimes, it takes about 10 retries for the server to catch, but then it stays running fine for hours. My guess is that it is related to some timeout limit in the server. Killing the orphaned python processes from the previous retries might help too (killall python).
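
If you script around gfsage, the manual retries described above can be automated. The following Python sketch shows one possible retry loop; it is not part of gfsage, and the names wait_for_server and probe, as well as the default of 10 attempts, are assumptions for this example. The probe is any function that returns True once the server answers.

```python
import time
from typing import Callable


def wait_for_server(probe: Callable[[], bool],
                    retries: int = 10,
                    delay: float = 1.0) -> int:
    """Call probe() until it succeeds; return the number of attempts used."""
    for attempt in range(1, retries + 1):
        if probe():
            return attempt
        time.sleep(delay)  # give the server some time before trying again
    raise ConnectionError(f"server not reachable after {retries} attempts")


# Example with a stand-in probe that succeeds on the fourth call.
calls = {"n": 0}

def fake_probe() -> bool:
    calls["n"] += 1
    return calls["n"] >= 4

print(wait_for_server(fake_probe, delay=0))
```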

# Realsets

realsets.py is a Sage module to support subsets of the real field consisting of intervals and isolated points and was developed to demonstrate set operations of the MGL Set1 module.

It is based on previous work from Interval1Sage, adding integration on real sets and real intervals.

An object in this module consists of a list of disjoint open intervals plus a list of isolated points (not belonging to these intervals). Notice that Infinity is acceptable as an interval bound. Therefore, one can define:

• All sorts of real intervals: open, closed and half-open
• Finite sets
• Unbounded intervals
• And combinations of these by union, intersection and taking complements.

A RealSet object represents a set that can be the union of some intervals and isolated points. It consists of:

• A list of disjoint open non-empty intervals.
• A list of points. Each of these points belongs at most to one interval.
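
The representation just described can be modelled in a few lines of Python. The class below is a simplified sketch written for this document, not the code of realsets.py: the class name SimpleRealSet is hypothetical, it borrows the constructor names oo_interval and cc_interval from the examples that follow, and it assumes that a closed bound is stored as an open interval plus the endpoint as a separate point.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleRealSet:
    intervals: list = field(default_factory=list)  # disjoint open intervals (lo, hi)
    points: list = field(default_factory=list)     # isolated or boundary points

    @classmethod
    def oo_interval(cls, lo, hi):
        return cls(intervals=[(lo, hi)])

    @classmethod
    def cc_interval(cls, lo, hi):
        # closed interval = open interval plus its two endpoints
        return cls(intervals=[(lo, hi)], points=[lo, hi])

    def __contains__(self, x):
        return x in self.points or any(lo < x < hi for lo, hi in self.intervals)

    @property
    def discrete(self):
        return not self.intervals


print(2 in SimpleRealSet.oo_interval(1, 3), 3 in SimpleRealSet.oo_interval(1, 3))
```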

## Examples

A closed interval:

? RealSet.cc_interval(1,4);
[ 1 :: 4 ]


A single point:

? RealSet.singleton(1)
{1}


#### Union

Union is supported with intervals and can be nested:

? I = RealSet.co_interval(1, 4)
? J = RealSet.co_interval(4, 5)
? M = RealSet.oc_interval(7, 8)
? I.union(J).union(M)
[ 1 :: 5 [ ∪ ] 7 :: 8 ]


#### Intersection

? I.intersection(J)
()
? I.intersection(RealSet.cc_interval(2,5))
[ 2 :: 4 [


#### Queries

Is a point in the set?

? I = RealSet.oo_interval(1, 3)
? 2 in I
True
? 3 in I
False


Is a set discrete (i.e: does not contain intervals)?

? RealSet.oo_interval(0,1).discrete
False
? RealSet(points=(1,2,3)).discrete
True


The size of a discrete set is the number of its points:

? RealSet(points=range(5)).size
5
? RealSet.oo_interval(0,3).size
+Infinity


Is A a subset of B?

? A = RealSet.oo_interval(0,1)
? B = RealSet.cc_interval(0,1)
? RealSet().subset(A)
True
? B.subset(A)
False
? A.subset(B)
True
? A.subset(A)
True
? A.subset(A, proper=True)
False


Return the infimum (greatest lower bound)

? RealSet(points=range(3)).infimum()
0
? RealSet.oo_interval(1,3).infimum()
1


The opposite of a set: –A = {-x | x ∈ A}

? -RealSet.oo_interval(1,2)
] -2 :: -1 [


Return the supremum (least upper bound)

? RealSet(points=range(3)).supremum()
2
? RealSet.oo_interval(1,3).supremum()
3


The complement of a set:

? RealSet.oo_interval(2,3).complement()
] -Infinity :: 2 ] ∪ [ 3 :: +Infinity [
? RealSet(points=range(3)).complement()
] 0 :: 1 [ ∪ ] 1 :: 2 [ ∪ ] 2 :: +Infinity [ ∪ ] -Infinity :: 0 [
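
One way to obtain such complements from the interval-plus-points representation is sketched below in Python. This is an illustration under stated assumptions, not the algorithm used in realsets.py: the function name complement is hypothetical, intervals are given as (lo, hi) tuples of disjoint open intervals, and no point lies inside any of the intervals.

```python
from math import inf


def complement(intervals, points):
    """Complement of a set given as disjoint open intervals plus points."""
    # Finite interval bounds and points cut the real line into open gaps.
    breaks = sorted({b for iv in intervals for b in iv if abs(b) != inf}
                    | set(points))
    bounds = [-inf] + breaks + [inf]
    gaps = list(zip(bounds, bounds[1:]))
    # Gaps that are not original intervals belong to the complement ...
    new_intervals = [g for g in gaps if g not in set(intervals)]
    # ... and so do the break points that were not in the original set.
    new_points = [b for b in breaks if b not in points]
    return new_intervals, new_points


print(complement([(2, 3)], []))
print(complement([], [0, 1, 2]))
```

For the open interval from 2 to 3 this yields the two unbounded intervals plus the points 2 and 3, which corresponds to the closed bounds shown in the first example above.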


The set difference of A and B: {x ∈ A | x ∉ B}

? I = RealSet.oo_interval(2,+Infinity)
? J = RealSet.oo_interval(-Infinity, 5)
? I.setdiff(J)
[ 5 :: +Infinity [
? J.setdiff(I)
] -Infinity :: 2 ]


# gfsage internal workings

gfsage is a prototype to demonstrate two-way natural language communication between a user and a Sage system.

When you invoke the gfsage command interactively:

• A Sage process is started in the background, listening for incoming http requests;
• A GF pgf module is read and set to mediate between the user and the Sage process;

The details of these components are given below.

## The GF side

A GF module acts as a post office translating messages between the different parties (nodes) composing a dialog. This section is more a description of a proposed design strategy for a generic postoffice interface based on GF. The actual code implements ideas of this design, but, for instance, it contains no edges or nodes as explicit entities.

### Nodes and edges

gfsage deals with just 2 agents:

1. The user
2. The Sage system

In case the input language differs from the output language, we may consider a third node (the output user).

There is a unique pgf module containing all GF information for the dialog system to work: Commands.pgf. Each node has a language (a GF concrete module) assigned: the user uses a natural language (e.g., CommandsEng for English).

A node reacts to received messages by sending a reply. The chain of messages between two nodes is called a dialog. An active node, such as the user, can start a dialog by sending a message. A passive node, like the Sage system here, just replies to the received messages.

A node can receive:

• A regular message from another node: This is a GF linearization in the receiver's language.
• A no_parse message from the postoffice telling that a previous outgoing message cannot be parsed.
• An is_ambiguous message from the postoffice related to a previous message sent by the node, specifying that it was ambiguous and carrying additional info for the node to decide among the possible meanings. To respond to this, the node must send a disambiguate message to the postoffice (see below).

A node can send:

• A regular message to another node: This is a parseable string in the sender's language.
• A disambiguate message sent in response to an ambiguous message. In this message the node chooses one of the options or aborts the transaction.

A regular message between two given nodes corresponds to a fixed GF category. In the case of gfsage it is Command for messages traveling from User to Sage and Answer for messages going the other way.

### Up and Down pipeline

A regular message from node N1 to node N2 goes through the following steps:

1. Input string is lexed, that is: separated into parse-able units (tokens);
2. It is then parsed using the node N1 language and the edge category (i.e. the category for messages from node N1 to node N2) into a set of GF abstract trees;
3. This set is, hopefully, reduced by paraphrasing the trees and removing duplicates (this is the compute step);
4. If the resulting set is empty, a no_parse message is sent back to the sending node. If it contains more than one entry, an is_ambiguous message is sent. In both of these cases, the process stops here. Only when the computed set contains exactly one entry is it pushed downstream to node N2;
5. The abstract tree is linearized using the node N2 language;
6. The result is unlexed, that is: assembled into a string that is delivered to the receiving node.
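
The six steps above can be summarised in a short Python sketch. It is a toy model of the design and not the gfsage implementation: parsing, computing and linearizing are passed in as plain functions, abstract trees are represented as strings, and all names in it (relay, TOY_PARSES and the tree names) are hypothetical.

```python
def relay(text, parse, compute, linearize):
    """Carry one regular message from node N1 to node N2."""
    tokens = text.split()                        # 1. lex (toy: split on spaces)
    trees = set(parse(tokens))                   # 2. parse with N1's language
    trees = {compute(tree) for tree in trees}    # 3. compute, dropping duplicates
    if not trees:                                # 4. decide what to send
        return ("no_parse", None)
    if len(trees) > 1:
        return ("is_ambiguous", sorted(trees))
    (tree,) = trees
    out_tokens = linearize(tree)                 # 5. linearize with N2's language
    return ("deliver", " ".join(out_tokens))     # 6. unlex (toy: join with spaces)


# Toy stand-ins for the grammar; a real system would call GF here.
TOY_PARSES = {
    "compute the factorial of 5": ["Compute (factorial 5)"],
    "compute it": ["Compute lastNumber", "Compute lastSet"],
}

def toy_parse(tokens):
    return TOY_PARSES.get(" ".join(tokens), [])

def toy_compute(tree):
    return tree  # no paraphrasing in this toy

def toy_linearize(tree):
    return ["factorial(5)"] if tree == "Compute (factorial 5)" else [tree]


print(relay("compute the factorial of 5", toy_parse, toy_compute, toy_linearize))
```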

## The Sage side

For Sage to work alongside GF, we need an HTTP server listening for Sage commands and some scripts to set up the environment and respond to the type of queries that can be expressed in the Mathematics Grammar Library (MGL).

### The Sage server

A Sage process is started in the background by the start-nb.py script in -python mode. This script starts a Sage notebook, as described in Simple server API, listening on port 9000 for requests in HTTP format. It also installs a handler for cleanly disposing of the notebook object whenever the parent process terminates.

The parent process then sends an initial request to load some functions and variables needed by the dialog system, which are defined in prelude.sage, and goes into the main evaluation loop.

### Sage scripts

realsets.py
is a Sage module developed to support set operations as described in the Set1 module of the MGL (see the page about it).
prelude.sage
defines Sage functions to implement differentiation in the style of the MGL, and state storage for numbers, sets and functions to support anaphora in the dialog.

# Adding voice output to gfsage

## Description

OS X has voice output built-in, usable from the shell by way of the say command. You can use several voices in English or download more for other languages.

## Usage

1. You must build the system on mgl/sage as described previously.
2. Check that you have at least one voice for your preferred languages: Go to System Preferences > Speech and click on System Voice
3. See that you have the right ones. If not, click Customize on the pop-up
4. Select the ones for you and click Ok. When downloading terminates, you may run the tool.
5. You can call gfsage in 3 different ways, but for voiced output you must use the one with OPTIONS:

        gfsage              Use English
        gfsage LANGUAGE     Use this language
        gfsage [OPTIONS]    where OPTIONS are:
        -i INPUT     --input-lang=INPUT      Make queries in the INPUT language
        -o OUTPUT    --output-lang=OUTPUT    Give answers in the OUTPUT language
        -v[VOICE]    --voice[=VOICE]         Use voice output. To list voices use ? as VOICE.
        -F           --with-feedback         Restate the query when answering.


The options relevant here are -v and -F. Use the first to select voice output. With no argument it will pick the first available voice for the OUTPUT language selected:

./gfsage -i english -v
Voiced by Agnes


... It will use Agnes as the English voice. Notice that if you do not give a -o option, the OUTPUT language is assumed to be the same as the INPUT language.

To list the available voices use:

./gfsage -i english -v?
Agnes, Albert, Alex, Bahh, Bells, Boing, Bruce, Bubbles, Cellos, Daniel, Deranged, Fred, Hysterical, Junior, Kathy, Princess, Ralph, Trinoids, Vicki, Victoria, Whisper, Zarvox


It will list the English voices. To use a specific voice write:

./gfsage -i german -vYannick
Voiced by Yannick
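
The three forms of -v described here (bare, with ? and with a voice name), together with the rule that the output language defaults to the input language, can be summarized in a short Python sketch. This is not gfsage's own option parser: the function names and the `voices_by_language` mapping are hypothetical, and only the option names are taken from the usage text.

```python
import argparse


def build_parser():
    """Mirror the documented gfsage options (a sketch, not gfsage's own parser)."""
    parser = argparse.ArgumentParser(prog="gfsage-sketch")
    parser.add_argument("-i", "--input-lang", default="english")
    parser.add_argument("-o", "--output-lang", default=None)
    # nargs="?" lets -v appear bare (const) or with an attached value (-vYannick, -v?).
    parser.add_argument("-v", "--voice", nargs="?", const="", default=None)
    parser.add_argument("-F", "--with-feedback", action="store_true")
    return parser


def resolve_voice(argv, voices_by_language):
    """Return (output_language, voice_or_listing) for a command line.

    voices_by_language is a hypothetical mapping from a language name to the
    voices installed for that language.
    """
    args = build_parser().parse_args(argv)
    # Without -o, the output language follows the input language.
    output = args.output_lang or args.input_lang
    voices = voices_by_language.get(output, [])
    if args.voice is None:        # no -v: text output only
        return output, None
    if args.voice == "?":         # -v?: list the voices instead of speaking
        return output, list(voices)
    if args.voice == "":          # bare -v: first available voice
        return output, voices[0] if voices else None
    return output, args.voice     # -vNAME: the requested voice
```

For instance, with Agnes as the first English voice, `resolve_voice(["-i", "english", "-v"], {"english": ["Agnes", "Albert"]})` selects Agnes for English output.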


The option -F makes the system paraphrase your query when answering. First, get a simple answer:

./gfsage -i english
Login into localhost at port 9000
waiting... EmptyBlock 3
sage> compute the factorial of 5.
(4) 120


... and now the same with paraphrasing:

./gfsage -i english -F
Login into localhost at port 9000
Session ID is 88549994a28940fe0657eb9e506a5e84
waiting... EmptyBlock 3
sage> compute the factorial of 5.
(4) 120
answer: the factorial of 5 is 120 .


So, to experience voice output in its full glory you have to use both -v and -F.

To help with regression testing I recently added a test option to gfsage for batch-testing the system by reading dialog samples from a file.

The samples must be in a text file consisting of a sequence of dialogs, each of which is a sequence of queries to the Sage system and its responses. Notice that a dialog might carry state in the form of assumptions that are asserted or variables that are assigned. Each dialog, however, is completely independent of the others.

Each dialog starts with a BEGIN or BEGIN language line. It specifies the beginning of dialog triplets and the natural language for these triplets. The dialog runs until an END line. The language specified becomes the current language. Dialogs with no given languages are assumed to be in the current language. At the start of a testing suite, the current language is English.

A triplet is a sequence of 3 lines:

• The query passed to Sage in the current language
• The Sage response in sage language
• This response translated to the current language.

### Example of a test suite

BEGIN spanish
calcula el factorial del número octal 11.
362880
es 362880 .
END
BEGIN english
let x be 4 .

compute the sum of x and 5 .
9
it is 9 .
compute the sum of it and 5 .
14
it is 14 .
END


Notice that blank lines are significant: they mark that Sage gave no response to the query. Therefore, blank lines must not be inserted between triplets or between dialogs.
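
To make the file format concrete, here is a minimal Python sketch of a reader for such a suite. It is not the parser used by gfsage. The names `Dialog` and `parse_suite` are hypothetical, and the sketch assumes that every triplet occupies exactly three lines, with empty lines standing for an empty Sage response and its empty translation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Dialog:
    language: str
    triplets: List[Tuple[str, str, str]] = field(default_factory=list)


def parse_suite(text, default_language="english"):
    """Split a test suite into dialogs of (query, sage response, translation)."""
    lines = text.splitlines()
    dialogs = []
    current_language = default_language  # the suite starts in English
    i = 0
    while i < len(lines):
        header = lines[i].split()
        if not header or header[0] != "BEGIN":
            raise ValueError("line %d: expected BEGIN" % (i + 1))
        if len(header) > 1:
            # "BEGIN language" changes the current language for later dialogs too.
            current_language = header[1]
        dialog = Dialog(current_language)
        i += 1
        while i < len(lines) and lines[i] != "END":
            triplet = lines[i:i + 3]
            # A triplet needs three lines and a non-empty query.
            if len(triplet) < 3 or "END" in triplet or not triplet[0].strip():
                raise ValueError("line %d: incomplete triplet" % (i + 1))
            dialog.triplets.append(tuple(triplet))
            i += 3
        if i == len(lines):
            raise ValueError("missing END")
        i += 1  # skip the END line
        dialogs.append(dialog)
    return dialogs
```

Because blank lines are read as empty responses, a stray blank line between two dialogs makes the sketch raise an error instead of silently shifting the triplets.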

### Usage

gfsage --test

will run the dialogs in the samples file and report any differences. You get a summary of the results:

Dialog 'compute Gamma....' failed
18 out of 19 dialogs successful.
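
The comparison behind such a summary can be sketched in Python as follows. This is an illustration, not the code of gfsage: `ask` is a hypothetical callable that sends a query and returns the pair (Sage response, translated response), and labelling a failed dialog by its first query is an assumption made for this sketch.

```python
def failing_queries(triplets, ask):
    """Return the queries of one dialog whose answers differ from the expected ones.

    `ask` is a hypothetical callable mapping a query to the pair
    (sage_response, translated_response).
    """
    return [query for query, sage, text in triplets if ask(query) != (sage, text)]


def summarize(dialogs, ask):
    """Build report lines for a list of dialogs (each one a list of triplets)."""
    report = []
    passed = 0
    for triplets in dialogs:
        if failing_queries(triplets, ask):
            # Assumption: a failed dialog is labelled by its first query.
            report.append("Dialog '%s' failed" % triplets[0][0])
        else:
            passed += 1
    report.append("%d out of %d dialogs successful." % (passed, len(dialogs)))
    return report
```

Since a dialog may carry state, a real runner would also have to start each dialog from a fresh session; the sketch leaves that to `ask`.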


# Using natural language inside Sage

By defining new Sage interfaces we can command the Sage shell and notebook server using natural language.

# Installation

Move to the sage directory and build sage-shell:

cd mgl/sage
make sage-shell


The first time you build it, you may run into a warning as in the installation section of the front page, or:

Please add nlgf components to the interfaces list in /usr/local/sage-4.7.2/devel/sage/sage/interfaces/all.py


We must inform Sage that there are some new interfaces for it. Open interfaces/all.py (your actual path might be different), go to the end of the file and add something like this:

from nlgf import english, spanish
interfaces.extend(['english', 'spanish'])


The first line asks the system to load the interfaces for commanding Sage using English and Spanish. The next line adds these to the list of available interfaces.

Now retry building:

make sage-shell


At the time of writing, the module nlgf provides catalan, english, german, and spanish interfaces.

### Sage shell with command auto-completion

In some systems you can have commands in the Sage shell auto-completed by pressing the tab key. This is experimental and you have to do the installation completely by hand.

First you have to build the Python bindings for GF which, for the moment, only work in Linux. There you'll find a shared library called gf.so. Copy or move it into one of the directories that Python scans when resolving imports. Note that the Python instance run by Sage may differ from the one your machine runs by default. To be sure, do as follows:

sage -python -c 'import sys; print sys.path'


It will list all the directories that Sage/Python scans.

You'll know it's all right when:

sage -python -c 'import gf'


exits without complaint. The next time you enter the Sage shell you'll have autocompletion for the GF interfaces.
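
If the import fails, it can help to see which of the scanned directories actually hold the library. The following is a minimal sketch, written to run under both Python 2 and Python 3 so that it can be executed with the Python that Sage uses. The function name `directories_containing` is hypothetical.

```python
import os
import sys


def directories_containing(filename, search_path=None):
    """List the entries of the import path that contain `filename`."""
    if search_path is None:
        search_path = sys.path
    # An empty entry in sys.path stands for the current directory.
    return [d for d in search_path
            if os.path.isfile(os.path.join(d or ".", filename))]


if __name__ == "__main__":
    for directory in directories_containing("gf.so"):
        sys.stdout.write(directory + "\n")
```

An empty result means gf.so is in none of the directories Python scans, so the import cannot succeed yet.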

# Usage

## Shell interface

Start a Sage shell:

sage


and switch to one of the defined natural language interfaces:

sage: %english


--> Switching to Gf <--


If you didn't install autocompletion (which is the usual case, auto-completion being experimental), a warning will appear:

No autocompletion available


Now you're ready to issue sage commands in English:

english: compute the summation of x when x ranges from 1 to 100.
5050
5053
english: let x be the factorial of 6.
720
english: let y be the factorial of 5.
120
english: compute the greatest common divisor of x and y.
120
english: compute the least common multiple of x and y.
720


Go back to the standard interface by typing ctrl+D or typing quit.

## Notebook interface

Sage has a notebook interface that gives a more flexible way to interact with it. To use it, start the shell as above and then:

sage: notebook(secure=true, interface='')
The notebook files are stored in: sage_notebook.sagenb
****************************************************
*                                                  *
*                                                  *
****************************************************
There is an admin account.  If you do not remember the password,
quit the notebook and type notebook(reset=True).
2012-02-13 12:48:19+0100 [-] Log opened.
...


In some systems a browser will open simultaneously. Now you can use Sage from the browser.

Click on New Worksheet. You'll be asked to rename the worksheet (this is optional). A single cell will be ready for your input. Write your command and press evaluate. Notice that a cell can contain more than one command, separated by newlines.

Start a new cell by writing:

%english


and add one or more new lines with commands in English.


# D10.2 MOLTO web service, first version

Contract No.: FP7-ICT-247914 MOLTO - Multilingual Online Translation D10.2 MOLTO web service, first version Public M3 2 June 2010 Prototype Final Krasimir Angelov, Olga Caprotti, Ramona Enache, Thomas Hallgren, Inari Listenmaa, Aarne Ranta, Jordi Saludes, Adam Slaski UGOT UPC, UHEL

### Abstract

This phrasebook is a program for translating touristic phrases between 14 European languages included in the MOLTO project (Multilingual On-Line Translation): Bulgarian, Catalan, Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Polish, Romanian, Spanish, Swedish. A Russian version is not yet finished but will be added later. Also other languages may be added.

The phrasebook is implemented by using the GF programming language (Grammatical Framework). It is the first demo for the MOLTO project, released in the third month (by June 2010). The first version is a very small system, but it will be extended in the course of the project.

The phrasebook is available as open-source software, licensed under GNU LGPL, at http://code.haskell.org/gf/examples/phrasebook/.


# 1. Purpose

The MOLTO phrasebook is a program for translating touristic phrases between 14 European languages included in the MOLTO project (Multilingual On-Line Translation):

• Bulgarian, Catalan, Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Polish, Romanian, Spanish, Swedish. A Russian version is not yet finished but is projected later. Other languages may be added at a later stage.

The phrasebook is implemented in the GF programming language (Grammatical Framework). It is the first demo for the MOLTO project, released in the third month (by June 2010). The first version is a very small system, but it will be extended in the course of the project.

The phrasebook has the following requirement specification:

• high quality: reliable translations to express yourself in any of the languages
• translation between all pairs of languages
• runnable in web browsers
• runnable on mobile phones (via web browser; Android stand-alone forthcoming)
• easily extensible by new words (forthcoming: semi-automatic extensions by users)

The phrasebook is available as open-source software, licensed under GNU LGPL. The source code resides in http://code.haskell.org/gf/examples/phrasebook/

# 2. Points Illustrated

We consider both the end-user perspective and the content producer perspective.

## From the user perspective

• Interlingua-based translation: we translate meanings, rather than words
• Incremental parsing: the user is at every point guided by the list of possible next words
• Mixed input modalities: selection of words ("fridge magnets") combined with text input
• Quasi-incremental translation: many basic types are also used as phrases, one can translate both words and complete sentences, and get intermediate results
• Disambiguation, esp. of politeness distinctions: if a phrase has many translations, each of them is shown and given an explanation (currently just in English, later in any source language)
• Feed-back from users: users are welcome to send comments, bug reports, and better translation suggestions

## From the programmer's perspective

• The use of resource grammars and functors: the translator was implemented on top of an earlier linguistic knowledge base, the GF Resource Grammar Library
• Example-based grammar writing and grammar induction from statistical models (Google translate): many of the grammars were created semi-automatically by generalization from examples
• Compile-time transfer, especially in Action in Words: the structural differences between languages are treated at compile time, for maximal run-time efficiency
• The level of skills involved in grammar development: testing different configurations (see table below)
• Grammar testing: use of treebanks with guided random generation for initial evaluation and regression testing

# 3. Files

The phrasebook is available as open-source software, licensed under GNU LGPL. The source code resides in http://code.haskell.org/gf/examples/phrasebook/. Below is a short description of the source files.

## Grammars

• Sentences: general syntactic structures implementable in a uniform way. Concrete syntax via the functor SentencesI.
• Words: words and predicates, typically language-dependent. Separate concrete syntaxes.
• Greetings: idiomatic phrases, string-based. Separate concrete syntaxes.
• Phrasebook: the top module putting everything together. Separate concrete syntaxes.
• DisambPhrasebook: disambiguation grammars generating feedback phrases if the input language is ambiguous.
• Numeral: resource grammar module directly inherited from the library.

The module structure image is produced in GF by

    > i -retain DisambPhrasebookEng.gf
    > dg -only=Phrasebook*,Sentences*,Words*,Greetings*,Numeral,NumeralEng,DisambPhrasebookEng
    > ! dot -Tpng _gfdepgraph.dot > pgraph.png


## Ontology

The abstract syntax defines the ontology behind the phrasebook. Some explanations can be found in the ontology document, which is produced from the abstract syntax files Sentences.gf and Words.gf by make doc.

## Run-time system and user interface

The phrasebook uses the PGF server written in Haskell and the minibar library written in JavaScript. Since the sources of these systems are available, anyone can build the phrasebook locally on her own computer.

# 4. Effort and Cost

Based on this case study, we roughly estimated the effort used in constructing the necessary sources for each new language and compiled the following summarizing chart.

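| Language | Language skills | GF skills | Informed development | Informed testing | Impact of external tools | RGL changes | Overall effort |
|---|---|---|---|---|---|---|---|
| Bulgarian | ### | ### | - | - | ? | # | ## |
| Catalan | ### | ### | - | - | ? | # | # |
| Danish | - | ### | + | + | ## | # | ## |
| Dutch | - | ### | + | + | ## | # | ## |
| English | ## | ### | - | + | - | - | # |
| Finnish | ### | ### | - | - | ? | # | ## |
| French | ## | ### | - | + | ? | # | # |
| German | # | ### | + | + | ## | ## | ### |
| Italian | ### | # | - | - | ? | ## | ## |
| Norwegian | # | ### | + | - | ## | # | ## |
| Polish | ### | ### | + | + | # | # | ## |
| Romanian | ### | ### | - | - | # | ### | ### |
| Spanish | ## | # | - | - | ? | - | ## |
| Swedish | ## | ### | - | + | ? | - | ## |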

Legend

Language skills

• - : no skills
• # : passive knowledge
• ## : fluent non-native
• ### : native speaker

GF skills

• - : no skills
• # : basic skills (2-day GF tutorial)
• ## : medium skills (previous experience of similar task)
• ### : advanced skills (resource grammar writer/substantial contributor)

Informed Development/Informed testing

• - : no
• + : yes

Impact of external tools

• ?: not investigated
• - : no effect on the Phrasebook
• # : small impact (literal translation, simple idioms)
• ## : medium effect (translation of more forms of words, contextual preposition)
• ### : great effect (no extra work needed, translations are correct)

RGL changes (resource grammars library)

• - : no changes
• # : 1-3 minor changes
• ## : 4-10 minor changes, 1-3 medium changes
• ### : >10 changes of any kind

Overall effort (including extra work on resource grammars)

• # : less than 8 person hours
• ## : 8-24 person hours
• ### : >24 person hours
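
As a small worked example of reading the chart with this legend, the Python sketch below tallies the overall effort marks and derives a lower bound on the total effort. The marks are copied from the Overall effort column of the chart. The names `effort_counts` and `minimum_total_hours` are hypothetical, and the bound uses only the lower end of each class, so it says nothing about the actual total.

```python
from collections import Counter

# Overall effort per language, copied from the chart.
OVERALL_EFFORT = {
    "Bulgarian": "##", "Catalan": "#", "Danish": "##", "Dutch": "##",
    "English": "#", "Finnish": "##", "French": "#", "German": "###",
    "Italian": "##", "Norwegian": "##", "Polish": "##", "Romanian": "###",
    "Spanish": "##", "Swedish": "##",
}

# Lower end of each effort class in person hours, from the legend.
MIN_HOURS = {"#": 0, "##": 8, "###": 24}


def effort_counts():
    """Count how many languages fall into each effort class."""
    return Counter(OVERALL_EFFORT.values())


def minimum_total_hours():
    """Sum the lower ends of the effort classes over all languages."""
    return sum(MIN_HOURS[mark] for mark in OVERALL_EFFORT.values())
```

With the marks above, 3 languages fall in class #, 9 in class ## and 2 in class ###, which gives a lower bound of 9 × 8 + 2 × 24 = 120 person hours for the 14 languages.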

# 5. Example-based grammar writing process

The figure presents the process of creating a Phrasebook using an example-based approach for a language X, in our case Danish, Dutch, German, or Norwegian, for which we had to employ informed development and testing by a native speaker other than the grammarian.

Remarks : The arrows represent the main steps of the process, whereas the circles represent the initial and final results after each step of the process. Red arrows represent manual work and green arrows represent automated actions. Dotted arrows represent optional steps. For every step, the estimated time is given. This is variable and greatly influenced by the features of the target language and the semantic complexity of the phrases and would only hold for the Phrasebook grammar.

Initial resources :

• English Phrasebook
• resource grammar for X
• script for generating the inflection forms of words and the corresponding linearizations of the lexical entries from the Phrasebook in the language X. For example, in the case of the nationalities, since we are interested in the names of countries, languages and citizenship of people and places, we would generate constructions like "I am English. I come from England. I speak English. I go to an English restaurant" and from the results of the translation we will infer the right form of each feature. In English, in most cases there is an ambiguity between the name of the language and the citizenship of people and places, but in other languages all three could have completely different forms. This is why it is important to make the context clear in the examples, so that the translation will be more likely to succeed. The correct design of the test examples is language dependent and also assumes an analysis of the resource grammar. For example, in some languages we need only the singular and the plural form of a noun in order to build its GF representation, whereas in other languages such as German, in the worst case we would need 6 forms which need to be rendered properly from the examples.
• script for generating random test cases that cover all the constructions from the grammar. It is based on the current state of the abstract syntax and it generates for each abstract function some random parameters and shows the linearization of the construction in both English and language X, along with the abstract syntax tree that was generated.
Step 1 : Analysis of the target grammar

The first step assumes an analysis of the resource grammar and extracts the information needed by the functions that build new lexical entries. A model is built so that the proper forms of the word can be rendered, and additional information, such as gender, can be inferred. The script applies these rules to each entry that we want to translate into the target language, and one obtains a set of constructions.
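
For the nationality example mentioned among the initial resources, the generation of constructions can be sketched in Python as below. This is an illustration only, not the script used in the project. The names `Nationality`, `TEMPLATES` and `examples_for` are hypothetical, and the templates are simply the four English sentences quoted in that example, each paired with the feature it is meant to elicit.

```python
from collections import namedtuple

# A lexical entry for a nationality, as seen from the English Phrasebook.
Nationality = namedtuple("Nationality", ["country", "language", "adjective"])

# One context sentence per feature, so that the translation disambiguates it.
TEMPLATES = [
    ("citizenship", "I am {adjective}."),
    ("country", "I come from {country}."),
    ("language", "I speak {language}."),
    ("property", "I go to {article} {adjective} restaurant."),
]


def indefinite_article(word):
    """Pick "a" or "an" by the first letter (a simplification for this sketch)."""
    return "an" if word[0].lower() in "aeiou" else "a"


def examples_for(entry):
    """Return (feature, sentence) pairs to be sent for translation."""
    fields = entry._asdict()
    fields["article"] = indefinite_article(entry.adjective)
    return [(feature, template.format(**fields)) for feature, template in TEMPLATES]
```

Calling `examples_for(Nationality("England", "English", "English"))` yields the four sentences of the example, each tagged with the feature whose form is to be read off the translation.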

Step 2 : Generation of examples in the target language

The generated constructions are given to an external translator tool (Google translate) or to a native speaker for translation. One needs the configuration file even if the translator is human, because formal knowledge of grammar is not assumed.

Step 3 : Parsing and decoding the examples with GF

The translations into the target language are then processed further: first, the linearizations of the categories are built by decoding the information received. Then, with the words in the lexicon, one can parse the translations of functions with the GF parser and generalize from that.

Step 4 : Evaluation and correction of the resulting grammar

The resulting grammar is tested with the aid of the testing script that generates constructions covering all the functions and categories from the grammar, along with some other constructions that proved to be problematic in some language. A native speaker evaluates the results and if corrections are needed, the algorithm runs again with the new examples. Depending on the language skills of the grammar writer, the changes can be made directly into the GF files, and the correct examples given by the native informant are just kept for validating the results. The algorithm is repeated as long as corrections are needed.
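
The idea of generating one random test case per abstract function can be sketched in Python as follows. This is not the project's testing script: the toy signatures and their function names are hypothetical stand-ins for an abstract syntax, the grammar is assumed to be non-recursive so that generation always terminates, and linearization into English and language X is left out.

```python
import random

# Toy abstract syntax (hypothetical names):
# function -> (result category, argument categories)
SIGNATURES = {
    "PGreeting": ("Phrase", ["Greeting"]),
    "PSentence": ("Phrase", ["Sentence"]),
    "SWant": ("Sentence", ["Person", "Item"]),
    "GHello": ("Greeting", []),
    "GBye": ("Greeting", []),
    "PersonI": ("Person", []),
    "PersonYou": ("Person", []),
    "ItemBeer": ("Item", []),
    "ItemPizza": ("Item", []),
}


def random_tree(category, rng):
    """Build a random tree of the given category."""
    candidates = [f for f, (cat, _) in SIGNATURES.items() if cat == category]
    return build(rng.choice(candidates), rng)


def build(function, rng):
    """Apply `function` to randomly generated arguments."""
    _, argument_categories = SIGNATURES[function]
    arguments = [random_tree(cat, rng) for cat in argument_categories]
    if not arguments:
        return function
    return "(%s %s)" % (function, " ".join(arguments))


def covering_trees(seed=0):
    """Return one random tree per function, so every function is exercised."""
    rng = random.Random(seed)
    return {function: build(function, rng) for function in SIGNATURES}
```

Fixing the seed makes the generated suite reproducible, which matters when the same trees are shown again to the native informant after a correction round.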

The time needed for preparing the configuration files for a grammar will not be needed in the future, since the files are reusable for other applications. The time for the second step can be saved if automatic tools, like Google translate are used. This is only possible in languages with a simpler morphology and syntax, and with large corpora available. Good results were obtained for German and Dutch with Google translate, but for languages like Romanian or Polish, which are both complex and lack enough resources, the results are discouraging.

If the statistical oracle works well, the only step where a human translator is needed is the evaluation and feedback step. For the languages on which we performed the experiment, 2 rounds of about 4 hours each were needed on average. More effort may be needed for more complex languages.

Further work will go into building a more comprehensive tool for testing and evaluating the grammars. The impact of external machine translation tools from English to various target languages will also be analysed, so that the process can be automated to a higher degree in future work on grammars.

# 6. Future and ongoing work

• Disambiguation: disambiguation grammars for languages other than English are in most cases still incomplete.
• Lexicon extension: the extension of the abstract lexicon in Words by hand or (semi)automatically for items related to the categories of food, places, and actions will result in an immediate increase of the expressiveness of the phrasebook.
• Customizable phone distribution

# 7. How to contribute

The basic things "everyone" can do are:

• complete missing words in concrete syntaxes
• add new abstract words in Words and greetings in Greetings

The missing concrete syntax entries are added to the WordsL.gf files for each language L. The morphological paradigms of the GF resource library should be used. Actions (prefixed with A, as AWant) are a little more demanding, since they also require syntax constructors. Greetings (prefixed with G) are pure strings.

Some explanations can be found in the implementation document, which is produced from the concrete syntax files SentencesI.gf and WordsEng.gf by make doc.

Here are the steps to follow for contributors:

1. Make sure you have the latest sources from GF Darcs, using darcs pull.
2. Also make sure that you have compiled the library by make present in gf/lib/src/.
3. Work in the directory gf/examples/phrasebook/.
4. After you've finished your contribution, recompile the phrasebook by make pgf.
5. Save your changes with darcs record . (in the phrasebook subdirectory).
6. Make a patch file with darcs send -o my_phrasebook_patch, which you can send to GF maintainers.
7. (Recommended:) Test the phrasebook on your local server:
    a. Go to gf/src/server/ and follow the instructions in the project Wiki.
    b. Make sure that Phrasebook.pgf is available to your GF server (see project wiki).
    c. Launch lighttpd (see project wiki).
    d. Now you can open gf/examples/phrasebook/www/phrasebook.html and use your phrasebook!

Finally, a few good practice recommendations:

• Don't delete anything! But you are free to correct incorrect forms.
• Don't change the module structure!
• Don't compromise quality to gain coverage: non multa sed multum!

# 8. Conclusions (tentative)

The grammarian need not be a native speaker of the language. For many languages, the grammarian need not even know the language; native informants are enough. However, evaluation by native speakers is necessary.

Correct and idiomatic translations are possible.

A typical development time was 2-3 person working days per language.

Google translate helps in bootstrapping grammars, but must be checked. In particular, we found it unreliable for morphologically rich languages.

Resource grammars should give some more support e.g. higher-level access to constructions like negative expressions and large-scale morphological lexica.

Acknowledgments

The Phrasebook has been built in the MOLTO project funded by the European Commission. The authors are grateful to their native speaker informants helping to bootstrap and evaluate the grammars: Richard Bubel, Grégoire Détrez, Rise Eilert, Karin Keijzer, Michał Pałka, Willard Rafnsson, Nick Smallbone.

# MOLTO Phrasebook Help

The user interface is kept slim so as to also be usable from portable devices, e.g. mobile phones. These are the buttons and their functionality:

• To start: click on a word or start typing.
• From: source language
• To: target language (either a single one or "All" simultaneously)
• Del: delete last word
• Clear: start over
• Random: generate a random phrase
• Google translate: translate the current input with the current language choice in Google Translate; opens in a new window or tab.

The symbol &+ means binding of two words. It will disappear in the complete translation.

The translator is slightly overgenerating, which means you can build some semantically strange phrases. Before reporting them as bugs, ask yourself: could this be correct in some situation? is the translation valid in that situation?