Resources

August 24, 2016

ruLearn: toolkit for the automatic inference of shallow-transfer rules for MT

Free/open-source toolkit for the automatic inference of rules for shallow-transfer MT from scarce parallel corpora and morphological dictionaries. ruLearn makes it possible to build machine translation systems for under-resourced language pairs because it avoids the need for human experts to handcraft transfer rules and requires only a few hundred parallel sentences. The inferred rules can be used for rule-based MT, and also together with a hybridisation strategy for integrating linguistic resources into phrase-based statistical machine translation (see Rule2Phrase).
Download : Read paper

Rule2Phrase: toolkit for integrating shallow-transfer rules into phrase-based SMT

Free/open-source toolkit to enrich a phrase-based SMT system (Moses) with phrase pairs generated from the linguistic resources of a shallow-transfer rule-based MT system (Apertium). A system built with this toolkit was not outperformed by any other participant in the shared translation task of the Sixth Workshop on Statistical Machine Translation (WMT 11) for the Spanish–English language pair.
Download : Read paper

Gamblr-CAT: word-level quality estimation in TM-based CAT

Free/open-source software to obtain binary word-level quality estimations (also called word-keeping recommendations) for the translation suggestions produced by a translation memory tool, using either statistical word alignments or external sources of bilingual information.
Download : Read paper

Gamblr-MT: word-level quality estimation in MT

Collection of free/open-source scripts to extract a set of features for word-level MT quality estimation using external sources of bilingual information.
Download : Read paper

DocTrans: document translation retrieval based on SMT techniques

Free/open-source piece of software implementing a method based on SMT techniques to retrieve documents that are a plausible translation of a given source text. Given a source document as input, the method provides the terms to use in a query that retrieves its translation. In combination with a text search engine such as Apache Lucene, it can be used for translation document alignment. It relies on the free/open-source SMT system Moses and was last tested with revision 2281.
Download : Read paper

Apertium-tagger-training-tools: target-language-driven POS tagger trainer

Free/open-source package for the unsupervised training of hidden-Markov-model-based POS taggers involved in MT. It uses information not only from the source language but also from the target language; to this end, the Apertium MT platform is used. After training, a file containing the hidden-Markov-model parameters is produced; this file can be used directly within the Apertium MT platform.
Download : Read paper

Apertium-morph: using morphological information with Apache Lucene

Free/open-source package providing a set of tools and Java classes that allow the Apache Lucene text search engine to use morphological information to index and search. To that end, the linguistic resources developed for the Apertium MT platform are used to extract morphological information while indexing.
Download : Read paper