Departamento de Lenguajes y Sistemas Informáticos / Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant / Universidad de Alicante

Software

DTDprune (project page on SourceForge.net)

Description

Simplifies an external DTD according to previously tagged text, as described by Bia, Carrasco and Forcada. Parameter entities are expanded in every model group where they occur, and each model group is then simplified independently. The behaviour with namespaces has not been tested.

Requirements

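Uses the STL and libxml2. Usually compiled as c++ `xml2-config --cflags` -o dtdprune dtdprune.C -lxml2 (the library is listed after the source file so that linkers which resolve symbols in command-line order can find it).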

Usage

dtdprune [-s] file.dtd file1.xml [file2.xml ...]

Option -s prints DTD statistics only; no simplification is performed.

parser.C

Description

Statistical parser based on the extension described by Chappelier and Rajman. The grammar cannot contain empty rules (that is, rules with an empty right-hand side). Chains of unit productions are followed for up to N steps (the default, N=4, can be overridden with a #define at compile time).

Requirements

Uses the STL. Compiled, for example with N=6, as c++ -D N=6 -o parser parser.C

Usage

parser grammar_file [-S initial_variable] < text

  • The text contains sentences (one per line; words separated by whitespace), such as

    Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .

  • The grammar file contains rules (one per line) such as

    1086 NP NNP NNP
    219 NP CD NNS
    11 ADJP NP JJ
    4 NP NP , ADJP ,

    Each line starts with the number of times the rule is used in the training set (or its probability), followed by the production. The first variable is the left-hand side of the production; the remaining symbols form its right-hand side.

Spanish Hyphenator

Description

Hyphenates Spanish words as described by J. Mañas.

Requirements

Java 1.5 (or higher).

Usage

java Hyphenator [-f] input1 input2 ...

The inputs are words or, with option -f, files.

