4. Exploitation

Comments

Robust parsing of syntax

While working on the Geography assignment, I spotted a sentence with no tensed verb in one of the leading Italian encyclopedias.

A web service that detects this kind of problem could be useful in many places: we often think faster than we type, and parts of a sentence get lost on the way to the keyboard.

About detection of inconsistencies in natural language

The New Scientist article "Reality checker: How to cut nonsense from the net" describes an interesting field where advanced language technologies could be applied.

Read the full article: "Reality checker: How to cut nonsense from the net", New Scientist, Tech section, 19 September 2012.

Venice, like Amsterdam or Stockholm

These cities belong to the world (are they all UNESCO World Heritage Sites?), and when city planners decide on a major change to their landscape, the world should be informed. I read such news today in the Italian papers, and it made me look for the planners' renderings of the building project that is causing the political debate. I have not found them yet, but I did find a nice interface for browsing past, present and future city works; alas, it is not multilingual.

See, Venice is a fat boot-shape, like Italy but shorter.

I think this fits very well in the Strategic Research Agenda (SRA) of META-NET. It is a perfect example of a local situation that might enrage people beyond national borders.

Semantic equivalence of multilingual washing instructions

I was about to hand-wash my fancy dress made of metal-fiber fabric, so I decided to check the washing instructions.

It is surprising that the Spanish text gives a negative instruction, naming a compound to avoid, while the other languages just say what to use. It is hard to imagine what the abstract grammar for that would be; my guess is that it needs quite a bit of context to work. AceWiki? :)

BTW, I would love to know if you spot similar examples in those fancy new sportswear fabrics.

About the multilingual web

How are translations of web sites contributed these days?

Check, e.g., http://www.librarything.com/about-translation.php; they seem to have a few translations going.

Be sure to check the inspiration links, for instance Google in Your Language.

Improving OCR software output

I have been looking at ways of extracting marked-up text and mathematics from scans of old mathematics books, in particular from A synopsis of elementary results in pure mathematics: containing propositions, formulæ, and methods of analysis, with abridged demonstrations. Supplemented by an index to the papers on pure mathematics which are to be found in the principal journals and transactions of learned societies, both English and foreign, of the present century (1886), available at http://archive.org/details/synopsisofelemen00carrrich.

When I try a simple cut-and-paste from the PDF of the book (I have yet to download the DjVu version), say of page 27, I get:


INDEX TO TROPOSITIONS OF EUCLID REFERRED TO IX THIS WORK.
Tho references to Euclid are made in Koinan and ^Vrabic numerals ; e.g. (VI. 19).
BOOK T.
I. 4.—Triaui^'los arc equal and similar if two sides and the included an<^le of each are equal each to each.
I. 5.—The angles at the base of an isosceles triangle are equal.
1. 0.—The converse of 5.
I. 8.—Triangles are equal and similar if tlie tliroe sides of eacli arc
ecjual each to each.
I. IT). —The exterior angle of a triangle is grojiter than the interior
and opposite.
I. 20.—Twosidesofatrianglearegreaterthanthethird.
I. 26.—Triangles are equal and similar if two angles and one corres-
ponding side of each are equal each to each.
I. 27.—Two straight lines are parallel if tlicy make equal alternate
angles with a third line. I. 29.—Theconverseof27.

I wonder how much of this could be corrected automatically.
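Part of it, probably: many errors above are systematic character confusions (tli for th, cli for ch, cj for q, arc for are). A minimal sketch, assuming a hand-made confusion table and a tiny word list (both hypothetical; a real system would learn confusions from aligned scans and use a full lexicon), tries up to two substitutions on each unknown word and accepts a candidate only if it is a known word.

```python
import re

# Hypothetical confusion pairs (OCR output -> intended text), read off the
# page-27 sample above; a real system would learn them from aligned scans.
CONFUSIONS = [("tli", "th"), ("cli", "ch"), ("arc", "are"),
              ("cj", "q"), ("ro", "re"), ("Tho", "The")]

# Tiny stand-in lexicon; in practice this would be a full English word list.
LEXICON = {"the", "three", "each", "are", "equal", "they", "greater", "triangles"}


def known(word):
    return word.lower() in LEXICON


def candidates(word):
    """Strings reachable by applying one confusion pair (all occurrences at once)."""
    return {word.replace(bad, good) for bad, good in CONFUSIONS if bad in word}


def correct_word(word):
    """Return word if known, else the first known candidate within two substitutions."""
    if known(word):
        return word
    first = candidates(word)
    second = set().union(*(candidates(c) for c in first))
    for level in (first, second):
        for cand in sorted(level):  # sorted for a deterministic choice
            if known(cand):
                return cand
    return word  # no confident fix: leave the word untouched


def correct_line(line):
    return re.sub(r"[A-Za-z]+", lambda m: correct_word(m.group()), line)


if __name__ == "__main__":
    print(correct_line("Triangles are equal and similar if tlie tliroe "
                       "sides of eacli arc ecjual each to each."))
```

Note the trap with arc: it is a real English word, so with a full lexicon this dictionary test would leave it alone. Choosing between arc and are needs context, for example a language model or a grammar.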

Loose ends from LUNAR

Quickly reading through the tech report by W. A. Woods, Semantics and quantification in natural language question answering (Advances in Computers, 1978; PDF available from stanford.edu), led me to its final section, Loose Ends. It seems worth a second, expert look to decide whether the MOLTO tools can tackle those issues, if they are still open.

The report begins:

    The history of communication between man and machines has followed a path of increasing
    provision for the convenience and ease of communication on the part of the human. From
    raw binary and octal numeric machine languages. through various symbolic assembly. ...

Any volunteers? I believe it is an enjoyable read.

Firefox plugin for predictive typing

I just installed the Austrian German dictionary for Firefox. It spell-checks what I type, e.g., a comment on Facebook.

Along the same lines, shouldn't we be able to build a predictive typing plugin too?

See Chapter 5 of https://developer.mozilla.org/En/Firefox_addons_dev_guide.
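The core of such a plugin is prefix completion. Below is a minimal, language-neutral sketch in Python; an actual Firefox add-on would be written in JavaScript against the add-on APIs, which are not shown here. It keeps the vocabulary sorted, finds the run of words sharing the typed prefix with a binary search, and ranks them by frequency. The sample corpus is a placeholder; a real plugin could instead learn frequencies from the user's own writing.

```python
import bisect
from collections import Counter


class Completer:
    """Suggest completions for a typed prefix, most frequent first."""

    def __init__(self, corpus_text):
        # Word frequencies from a sample corpus (a stand-in for a real language model).
        self.freq = Counter(corpus_text.lower().split())
        self.words = sorted(self.freq)

    def suggest(self, prefix, k=3):
        prefix = prefix.lower()
        # In sorted order, all words sharing the prefix form one contiguous run.
        start = bisect.bisect_left(self.words, prefix)
        matches = []
        for word in self.words[start:]:
            if not word.startswith(prefix):
                break
            matches.append(word)
        # Rank by frequency, then alphabetically for a stable order.
        matches.sort(key=lambda w: (-self.freq[w], w))
        return matches[:k]


if __name__ == "__main__":
    c = Completer("die Sprache die Sprachen das Spiel die Sprache spielen")
    print(c.suggest("spr"))  # ['sprache', 'sprachen']
```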

Bootstrapping semantic frame alignments with GF?

See Dekai Wu's inspiring talk at EAMT 2012 on doing this with ITG and LTG analysis of English-Chinese.

Also see

http://www.cs.ust.hk/~dekai/ssst

Asked Dana.

Ambitious checking of correctness of Wikipedia pages

It just happened in a private correspondence between Jordi and me: I was pointing out the word problem quoted on the Wikipedia page http://en.wikipedia.org/wiki/Brahmagupta:

In chapter twelve of his Brahmasphutasiddhanta, Brahmagupta finds Pythagorean triples,

    12.39. The height of a mountain multiplied by a given multiplier 
    is the distance to a city; it is not erased. When it is divided 
    by the multiplier increased by two it is the leap of one of the 
    two who make the same journey.[9]

or in other words, for a given length m and an arbitrary multiplier x, 
let a = mx and b = m + mx/(x + 2). 
Then m, a, and b form a Pythagorean triple.[9]

and Jordi noticed that for m = 3 and x = 1 we get a = 3 and b = 3 + 3/3 = 4, but (3, 3, 4) is not a Pythagorean triple: 3² + 3² = 18, while 4² = 16.
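One way to set up the system of equations, assuming the usual reading of the verse (this reading is our assumption, not a quotation of the original): two people stand on a mountain of height $m$; the city lies at distance $mx$ from its foot; one climbs down and walks to the city, the other leaps straight up by a height $d$ and then travels in a straight line to the city; both journeys have the same length.

$$m + mx = d + \sqrt{(m+d)^2 + (mx)^2}$$

Moving $d$ to the left and squaring:

$$(m + mx - d)^2 = (m+d)^2 + m^2x^2 \;\Longrightarrow\; 2m^2x - 2md - 2mdx = 2md \;\Longrightarrow\; d = \frac{mx}{x+2},$$

which matches the verse's "divided by the multiplier increased by two". The right triangle then has legs $a = mx$ and $b = m + d = m + \frac{mx}{x+2}$, and hypotenuse $c = m + mx - d$. Indeed

$$c^2 - b^2 = (mx - 2d)(2m + mx) = \frac{mx^2}{x+2}\cdot m(x+2) = m^2x^2 = a^2.$$

For $m = 3$, $x = 1$: $d = 1$ and $(a, b, c) = (3, 4, 5)$. On this reading $a$ and $b$ are correct, but the third side of the triple is $c = m + a - d$, not $m$.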

The symbolic transcription, presumably made from the English translation, is incorrect. Will it ever be possible to set up such a system of equations automatically? What did the original say? Would it be easier to understand and convert into a system of equations?
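Checking a candidate formalization mechanically is the easy half of the question. A minimal sketch with exact rational arithmetic, assuming the leap is d = mx/(x + 2) as the verse states and that the two journeys m + mx and d + hypotenuse are equal, tests both the triple (m, a, b) stated on the Wikipedia page and the alternative (a, b, c) with c = m + mx - d. The function names are illustrative only.

```python
from fractions import Fraction


def is_pythagorean(p, q, r):
    """True if p^2 + q^2 == r^2, with r as the hypotenuse."""
    return p * p + q * q == r * r


def sides(m, x):
    """Lengths for mountain height m and multiplier x, as exact fractions."""
    m, x = Fraction(m), Fraction(x)
    d = m * x / (x + 2)  # the leap, as read off the verse
    a = m * x            # distance from the foot of the mountain to the city
    b = m + d            # vertical leg: mountain plus leap
    c = m + m * x - d    # hypotenuse, from the equal-journey condition
    return m, a, b, c


def check(m, x):
    """Return (Wikipedia triple holds, (a, b, c) triple holds)."""
    m, a, b, c = sides(m, x)
    # Try every choice of hypotenuse for Wikipedia's claim that m, a, b form a triple.
    wiki = any(is_pythagorean(*t) for t in ((m, a, b), (m, b, a), (a, b, m)))
    return wiki, is_pythagorean(a, b, c)


if __name__ == "__main__":
    print(sides(3, 1), check(3, 1))
```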

Asked Shafqat and Prasad to try to find the original.

Correcting the use of tenses in subordinate sentences

Languages like Italian have strict rules on using the proper subjunctive form in subordinate clauses. These rules are very frequently broken, maybe because of poor grammar teaching in primary school, maybe because the language is evolving towards a simpler but less elegant form, maybe because it is now spoken by many non-native speakers. Nevertheless, blatant errors published online by newspapers contribute to the declining quality of the language. Could a generic GF parser (such as the one Malin wrote for Swedish) be useful for checking tense agreement?

Below is a screenshot of the example that prompted this comment.

Are there other languages with such features? Is Swedish one of them?
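The kind of rule such a checker would enforce can be shown with a toy. This is not a GF parser but a pattern-based stand-in: after expressions of opinion, hope or doubt followed by che, standard Italian requires the subjunctive (credo che sia, not credo che è). The trigger list and the verb table are small, hypothetical samples; a grammar-based checker would find the subordinate verb by parsing rather than by scanning a fixed window of words.

```python
import re

# Hypothetical, incomplete list: verbs that, followed by "che",
# normally require the subjunctive in the subordinate clause.
TRIGGERS = {"penso", "credo", "spero", "voglio", "sembra",
            "bisogna", "immagino", "dubito"}

# Third-person-singular present indicative -> present subjunctive (sample only).
INDICATIVE_TO_SUBJUNCTIVE = {"è": "sia", "ha": "abbia", "va": "vada",
                             "fa": "faccia", "può": "possa", "deve": "debba"}
SUBJUNCTIVE = set(INDICATIVE_TO_SUBJUNCTIVE.values())


def check_subjunctive(sentence, window=4):
    """Return (found, suggested) pairs for indicative verbs after 'TRIGGER che'."""
    tokens = re.findall(r"\w+", sentence.lower())
    problems = []
    for i in range(len(tokens) - 1):
        if tokens[i] in TRIGGERS and tokens[i + 1] == "che":
            # Scan a few tokens ahead, since a subject may come before the verb.
            for tok in tokens[i + 2:i + 2 + window]:
                if tok in SUBJUNCTIVE:
                    break  # already correct
                if tok in INDICATIVE_TO_SUBJUNCTIVE:
                    problems.append((tok, INDICATIVE_TO_SUBJUNCTIVE[tok]))
                    break
    return problems


if __name__ == "__main__":
    print(check_subjunctive("Credo che il governo ha ragione."))  # [('ha', 'abbia')]
```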

Computable Document Format made multilingual

Would that be possible?

Possible audiences: eScience user groups, eLearning.

See more at

http://www.wolfram.com/training/special-event/wolfram-cdf-virtual-worksh...

Correct short scientific text in online applets

Time and again one ends up reading grammatically incorrect English. It just happened with the otherwise very nice applet http://htwins.net/scale2/scale2.swf?bordercolor=white.

It needs some playing with to check the kind of language used in these short popular-science descriptions.

Translating its language file would turn this applet into a nice multilingual learning resource.

Generate flashcards

I believe it would be easy to use GF grammars to generate flashcards, the kind students use to review a subject area, a language, or the definitions in a subject. I can also imagine using the KRI to make more complicated flashcards just by enumerating the questions, printing them on one side, and printing the answers on the reverse side.

There must be mobile apps that accept some standard input format for flashcards.

The same goes for devices such as game consoles (I am thinking of the Nintendo DS and similar ones that already have language-learning applications; see, e.g., the My Coach series).
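A minimal sketch of the generation and export steps, assuming plain tab-separated text (one card per line, front then back) as the exchange format; flashcard apps such as Anki can import text files of this kind, but each target app's import rules should be checked. The tiny lexicon and the question templates are hypothetical placeholders for what GF linearizations or the KRI would produce.

```python
import csv
import io

# Hypothetical lexicon: (German noun with article, English gloss).
LEXICON = [("der Hund", "the dog"), ("die Katze", "the cat"), ("das Haus", "the house")]


def make_cards(lexicon):
    """One card per entry in each direction: (front = prompt, back = answer)."""
    cards = []
    for german, english in lexicon:
        cards.append((f"Translate into English: {german}", english))
        cards.append((f"Translate into German: {english}", german))
    return cards


def to_tsv(cards):
    """Serialize cards as tab-separated lines: front<TAB>back."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(cards)
    return out.getvalue()


if __name__ == "__main__":
    print(to_tsv(make_cards(LEXICON)), end="")
```

For paper cards, the same two columns map directly onto the front and back of each printed card.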

Sample code

There is some sample code for flashcards at

https://github.com/aliok/flashcards

http://bit.ly/deFlashcardsSource

It is a modular application, and we may need to look only at

flashcards-data-service: the data provider for the application (a Python + Google App Engine project).

The service responds with a specified number of words (currently 100), along with their articles and English translations, and a user key. If the same user key is passed to the service the next time, it returns the next set; if no user key is passed, it returns the first set.
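The paging behaviour just described can be sketched as follows, assuming the user key simply records how far a user has got. This is not the project's code: WORDS, PAGE_SIZE, _progress and get_entry_set are hypothetical names, the page size is shrunk from 100 to 2 for readability, an unknown key is treated like a missing one, and an in-memory dictionary stands in for App Engine storage.

```python
import uuid

PAGE_SIZE = 2  # the real service uses 100
# Hypothetical word list: (article, word, English translation).
WORDS = [("der", "Hund", "dog"), ("die", "Katze", "cat"), ("das", "Haus", "house"),
         ("der", "Baum", "tree"), ("die", "Tür", "door")]

_progress = {}  # user key -> index of the next entry to serve


def get_entry_set(user_key=None):
    """Return (entries, user_key); a missing or unknown key starts at the first set."""
    if user_key is None or user_key not in _progress:
        user_key = uuid.uuid4().hex
        _progress[user_key] = 0
    start = _progress[user_key]
    entries = WORDS[start:start + PAGE_SIZE]
    _progress[user_key] = start + len(entries)
    return entries, user_key  # empty list once the word list is exhausted
```

A real service might wrap around after the last set instead of returning an empty list.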

The dictionary is static Python code, generated by the flashcards-word-set-generator module; see https://github.com/aliok/flashcards/blob/master/flashcards-data-service/...

Turning it into an online service for our resources just requires changing the function getEntrySetForUser in https://github.com/aliok/flashcards/blob/master/flashcards-data-service/... so that it fetches randomly generated entries from our lexicon resources (these could include more complicated sentence patterns than the current dictionary supports).
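A sketch of what such a replacement could return, assuming entries are produced by filling simple sentence patterns from a lexicon rather than read from the static dictionary. The patterns, the lexicon and the function name random_entry_set are hypothetical; with GF, each pattern would be an abstract syntax tree linearized into German and English instead of a hand-written template. Seeding the generator with the user key makes each user's set reproducible.

```python
import random

# Hypothetical lexicon: (article, German noun, English noun) and adjective pairs.
NOUNS = [("der", "Hund", "dog"), ("die", "Katze", "cat"), ("das", "Haus", "house")]
ADJECTIVES = [("groß", "big"), ("klein", "small"), ("alt", "old")]


def noun_phrase(rng):
    art, de, en = rng.choice(NOUNS)
    return f"{art} {de}", f"the {en}"


def predicative(rng):
    # German predicative adjectives are uninflected, so a flat template suffices here.
    art, de, en = rng.choice(NOUNS)
    adj_de, adj_en = rng.choice(ADJECTIVES)
    return f"{art.capitalize()} {de} ist {adj_de}.", f"The {en} is {adj_en}."


PATTERNS = [noun_phrase, predicative]


def random_entry_set(user_key, count=100):
    """Return `count` (German, English) pairs, reproducible for a given user key."""
    rng = random.Random(user_key)  # string seeds are hashed deterministically
    return [rng.choice(PATTERNS)(rng) for _ in range(count)]


if __name__ == "__main__":
    for pair in random_entry_set("user-42", count=3):
        print(pair)
```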