Hybrid Systems & Patents WG

Minutes

Participants: Aarne, Ramona, Milen, Borislav, Cristina, Meritxell

Decisions taken

Write GF grammars to address problems observed in the SMT system: compound and biological names, word reordering, and gender agreement.

Biological names differ from compound names and raise different challenges. LauriC can provide a database of compound and biological names.

Increase parser robustness by chunking the claims, parsing the chunks separately, and recombining the results with the help of the grammar. Reduce ambiguities with bottom-up disambiguation based on the corpus.
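
The chunk-and-parse idea can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the splitting heuristic (commas and semicolons), the `KNOWN_CHUNKS` set standing in for grammar coverage, and the `parse_chunk` function standing in for a call to the GF parser. Recombination is reduced to keeping the per-chunk results in order; the grammar-guided recombination discussed in the meeting is not modelled.

```python
import re

# Hypothetical stand-in for grammar coverage: chunks the parser can analyse.
# In the real system this would be decided by the GF parser, not a lookup.
KNOWN_CHUNKS = {
    c.lower()
    for c in (
        "a pharmaceutical composition",
        "comprising a compound of formula I",
    )
}


def split_claim(claim):
    """Split a claim into chunks at commas and semicolons (assumed heuristic)."""
    parts = re.split(r"\s*[,;]\s*", claim.strip().rstrip("."))
    return [p for p in parts if p]


def parse_chunk(chunk):
    """Stand-in parser: return a labelled result if covered, else None."""
    if chunk.lower() in KNOWN_CHUNKS:
        return ("Parsed", chunk)
    return None


def robust_parse(claim):
    """Parse each chunk separately; keep uncovered chunks as raw text."""
    results = []
    for chunk in split_claim(claim):
        tree = parse_chunk(chunk)
        results.append(tree if tree is not None else ("Unparsed", chunk))
    return results
```

With this shape, one chunk outside the grammar's coverage no longer makes the whole claim fail: the covered chunks still yield analyses and the remainder is passed through as text.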

Simplify the query language, because some of the English queries read unnaturally in other languages.

Someone else will be needed to work on the French grammars.

The user interface allows querying the retrieval system using the controlled language, free text, or a combination of both.

Results are shown with the words relevant to the query highlighted. If the words cannot be found in translated documents (a lexicon is needed for this), the whole sentence is highlighted instead.
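
The highlighting rule with its sentence-level fallback might look roughly like the sketch below. The function name `highlight`, the `[[` and `]]` markers, and the shape of the `lexicon` argument (a dict from a query term to its translations) are all assumptions for illustration, not part of the agreed interface.

```python
import re


def highlight(sentence, query_terms, lexicon=None):
    """Mark query terms in a sentence; otherwise mark the whole sentence.

    `lexicon` is a hypothetical dict mapping a query term to a list of
    translations in the document language.
    """
    lexicon = lexicon or {}
    targets = set()
    for term in query_terms:
        targets.add(term.lower())
        targets.update(t.lower() for t in lexicon.get(term, []))

    found = False

    def mark(match):
        nonlocal found
        word = match.group(0)
        if word.lower() in targets:
            found = True
            return f"[[{word}]]"
        return word

    marked = re.sub(r"\w+", mark, sentence)
    # Fallback: no relevant word located, so highlight the entire sentence.
    return marked if found else f"[[{sentence}]]"
```

For example, `highlight("Die Verbindung hemmt das Wachstum.", ["compound"])` marks the whole sentence, while passing `lexicon={"compound": ["Verbindung"]}` marks only the translated term.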

GF and SMT tools have been shared and installed to start hybridisation studies.

To-do list

Generate synthetic corpora and alignments from GF grammars.
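
As a rough illustration of what a synthetic aligned corpus could contain, the sketch below enumerates every tree of a toy abstract rule and linearises each one into two languages, recording word alignments. It does not use GF itself: the dictionaries `NOUNS` and `ADJS`, the single noun-plus-adjective rule, and the English and French word forms are hypothetical stand-ins for a real grammar and lexicon.

```python
from itertools import product

# Toy lexicon standing in for the concrete syntaxes of a GF grammar.
NOUNS = {
    "acid": {"eng": "acid", "fre": "acide"},
    "solution": {"eng": "solution", "fre": "solution"},
}
ADJS = {
    "toxic": {"eng": "toxic", "fre": "toxique"},
    "stable": {"eng": "stable", "fre": "stable"},
}


def linearise(noun, adj):
    """Linearise one abstract tree (noun modified by adj) into both languages."""
    eng = [ADJS[adj]["eng"], NOUNS[noun]["eng"]]  # adjective before noun
    fre = [NOUNS[noun]["fre"], ADJS[adj]["fre"]]  # noun before adjective
    # Alignment pairs are (English word index, French word index).
    alignment = [(0, 1), (1, 0)]
    return " ".join(eng), " ".join(fre), alignment


def synthetic_corpus():
    """Enumerate every tree of the toy grammar and linearise it."""
    return [linearise(n, a) for n, a in product(NOUNS, ADJS)]
```

Because both sentences come from the same abstract tree, the alignment is known by construction rather than estimated, which is the property that makes grammar-generated data attractive as SMT training material.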

Provide a translation of the lexicon.

Write a simple abstract syntax representation and grammar for the results. Write templates for each topic.

Compare equivalent tools in the GF and SMT systems.