Software
Description
Performs external DTD simplification according to previously
tagged text, as described in Bia, Carrasco and Forcada.
Parameter entities are replaced at every model group and
simplified independently. The behaviour with namespaces has
not been checked.
Requirements
Uses STL and libxml2. Usually compiled as
c++ `xml2-config --cflags` -lxml2 -o dtdprune dtdprune.C
Usage
dtdprune [-s] file.dtd file1.xml [file2.xml ...]
Option -s prints DTD statistics only (no simplification is performed).
Description
Statistical parser based on the extension described by
Chappelier and Rajman.
The grammar cannot contain empty rules (that is, rules with an empty rhs).
Unit production chains are followed up to N steps
(the default, N=4, can be overridden with a #define at compile time).
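The bounded following of unit production chains can be pictured with the sketch below. It is only an illustration under stated assumptions, not the code in parser.C: the `UnitRules` map and the `unitClosure` function are hypothetical names, and unit rules A -> B are assumed to be stored indexed by their right-hand side. The `#ifndef` guard shows one common way to give N a default that `-D N=6` can override.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Default bound on the length of unit-production chains.
// Compiling with -D N=6 replaces this default.
#ifndef N
#define N 4
#endif

// Hypothetical storage: for a unit rule A -> B, parents[B] contains A.
using UnitRules = std::map<std::string, std::set<std::string>>;

// Returns `start` plus every variable that derives it through
// at most `maxSteps` unit productions.
std::set<std::string> unitClosure(const UnitRules& parents,
                                  const std::string& start,
                                  int maxSteps = N) {
    assert(maxSteps >= 0);
    std::set<std::string> reached{start};
    std::set<std::string> frontier{start};
    for (int step = 0; step < maxSteps && !frontier.empty(); ++step) {
        std::set<std::string> next;
        for (const auto& sym : frontier) {
            auto it = parents.find(sym);
            if (it == parents.end()) continue;
            for (const auto& parent : it->second) {
                // Only symbols seen for the first time are expanded again.
                if (reached.insert(parent).second) next.insert(parent);
            }
        }
        frontier.swap(next);
    }
    return reached;
}
```

With the default bound, a chain of six unit rules starting at one symbol is only followed for its first four steps, so longer chains are silently cut off; raising N trades parsing time for coverage.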
Requirements
Uses STL. Compiled, for example, as
c++ -D N=6 -o parser parser.C
Usage
parser grammar_file [-S initial_variable] < text
Text contains sentences (one per line; words separated by whitespace) such as
Pierre Vinken , 61 years old, will join the board as a nonexecutive director Nov. 29.
The grammar file contains rules (one per line) such as
1086 NP NNP NNP
219 NP CD NNS
11 ADJP NP JJ
4 NP NP , ADJP ,
Each line starts with the number of times the rule is used in the
training set (or with its probability). The first variable after it
is the lhs of the production; the remaining symbols are its rhs.
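As an illustration of this line format, the sketch below reads one rule per line and, as an optional step, turns counts into relative frequencies per lhs. It is a minimal sketch, not the parser's own reader: the `Rule` struct and the `parseRule` and `normalize` functions are hypothetical names, and whether parser.C normalizes counts in this way is an assumption.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one grammar line.
struct Rule {
    double weight = 0.0;           // count or probability, as read
    std::string lhs;               // left-hand side variable
    std::vector<std::string> rhs;  // right-hand side symbols
};

// Parses "weight LHS SYM1 SYM2 ..." (fields separated by whitespace).
// Returns false for malformed lines and for rules with an empty rhs,
// which the grammar is not allowed to contain.
bool parseRule(const std::string& line, Rule& rule) {
    std::istringstream in(line);
    Rule parsed;
    if (!(in >> parsed.weight >> parsed.lhs)) return false;
    std::string symbol;
    while (in >> symbol) parsed.rhs.push_back(symbol);
    if (parsed.rhs.empty()) return false;
    rule = parsed;
    return true;
}

// Assumed normalization: divides each weight by the total weight of
// all rules sharing its lhs, so those rules sum to 1.
void normalize(std::vector<Rule>& rules) {
    std::map<std::string, double> total;
    for (const auto& r : rules) total[r.lhs] += r.weight;
    for (auto& r : rules) {
        if (total[r.lhs] > 0.0) r.weight /= total[r.lhs];
    }
}
```

Reading the weight as a `double` lets the same code accept either integer counts or probabilities, which matches the two forms the grammar file allows.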
Description
Hyphenates Spanish words as described by J. Mañas.
Requirements
Java 1.5 (or higher).
Usage
java Hyphenator [-f] input1 input2 ...
Input is a list of words or, with option -f, a list of files.