Publications
2016
Calvo-Zaragoza, J., Valero-Mas, J. J., and Rico-Juan, J. R. (2016).
Prototype Generation on Structural Data using Dissimilarity
Space Representation.
Neural Computing and Applications, ?(?):?-?, (Impact
Factor: 1.569 - Q2 - JCR).
Abstract:
Data Reduction techniques play a key role in instance-based classification by lowering the amount of data to be processed. Among the existing approaches, Prototype Selection (PS) and Prototype Generation (PG) are the most representative. These two families differ in the way the reduced set is obtained from the initial one: while the former selects the most representative elements from the set, the latter creates new data out of it. Although PG is considered to delimit decision boundaries more efficiently, the operations required are not so well defined in scenarios involving structural data such as strings, trees or graphs. This work studies the possibility of using Dissimilarity Space (DS) methods as an intermediate process for mapping the initial structural representation to a statistical one, thereby allowing the use of PG methods. A comparative experiment over string data is carried out in which our proposal is compared with PS methods on the original space. Results show that the proposed strategy achieves results not significantly different from PS in the initial space, thus standing as a clear alternative to the classic approach, with some additional advantages derived from the DS representation.
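The dissimilarity-space mapping described above can be sketched in a few lines: each string becomes the vector of its edit distances to a fixed reference set, after which vector-based Prototype Generation operations (e.g. averaging) are well defined even though averaging the original strings is not. This is an illustrative sketch, not the paper's exact setup; the `levenshtein` helper and the choice of reference set are assumptions.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def to_dissimilarity_space(samples, references):
    """Map each structural sample to the real-valued vector of its
    distances to a fixed reference set (the dissimilarity space)."""
    return [[levenshtein(s, r) for r in references] for s in samples]
```

Once in this vector space, any statistical PG method can generate new prototypes, since vector arithmetic is available.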
Valero-Mas, J. J., Calvo-Zaragoza, J., and Rico-Juan, J. R. (2016a).
An Experimental Study on Rank Methods for Prototype Selection.
Soft Computing, ?(?):?-?, (Impact Factor: 1.271 - Q2 -
JCR).
Abstract:
Prototype Selection is one of the most popular approaches for addressing the low-efficiency issue typically found in the well-known k-Nearest Neighbour classification rule. These techniques select a representative subset from an original collection of prototypes with the premise of maintaining the same classification accuracy. More recently, rank methods have been proposed as an alternative for developing new selection strategies. Following a certain heuristic, these methods sort the elements of the initial collection according to their relevance and then select the best possible subset by means of a parameter representing the amount of data to maintain. Due to the relative novelty of these methods, their performance and competitiveness against other strategies is still unclear. This work performs an exhaustive experimental study of such methods for prototype selection. A representative collection of both classic and sophisticated algorithms is compared with the aforementioned techniques on a number of datasets, including different levels of induced noise. Results show the remarkable competitiveness of these rank methods as well as their excellent trade-off between prototype reduction and achieved accuracy.
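The rank-then-cut scheme these methods share can be sketched as follows. The relevance heuristic used here (does the prototype's nearest neighbour share its class?) is a deliberately simple stand-in, not one of the heuristics studied in the paper.

```python
def rank_and_select(X, y, keep_fraction, dist):
    """Rank prototypes by a relevance heuristic, then keep the top fraction."""
    n = len(X)
    scores = []
    for i in range(n):
        # Nearest neighbour among the remaining prototypes.
        j = min((k for k in range(n) if k != i), key=lambda k: dist(X[i], X[k]))
        # Illustrative relevance: 1 if the neighbour agrees with our label.
        scores.append(1.0 if y[j] == y[i] else 0.0)
    order = sorted(range(n), key=lambda i: -scores[i])
    m = max(1, int(round(keep_fraction * n)))
    kept = order[:m]
    return [X[i] for i in kept], [y[i] for i in kept]
```

A single external parameter (`keep_fraction`) then controls the size of the reduced set, which is the defining trait of the rank-method family.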
Valero-Mas, J. J., Calvo-Zaragoza, J., and Rico-Juan, J. R. (2016c).
Selecting promising classes from generated data for an efficient
multi-class Nearest Neighbor classification.
Soft Computing, ?(?):?-?, (Impact Factor: 1.271 - Q2 -
JCR).
Abstract:
The Nearest Neighbor rule is one of the most widely considered algorithms for supervised learning because of its simplicity and fair performance in most cases. However, this technique has a number of disadvantages, the most prominent being its low computational efficiency. This paper presents a strategy to overcome this obstacle in multi-class classification tasks. The strategy relies on Prototype Reduction algorithms, which are capable of generating a new training set from the original one that aims to gather the same information with fewer samples. Over this reduced set, the classes closest to the input sample are estimated; these are referred to as promising classes. Eventually, classification is performed with the Nearest Neighbor rule over the original training set, restricted to the promising classes. Our experiments with several datasets and significance tests show that a classification accuracy similar to that obtained with the original training set can be reached, with significantly higher efficiency.
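A minimal sketch of the promising-classes strategy described above, assuming plain numeric prototypes and a generic distance callable; the function name and the class-shortlist size are illustrative.

```python
def promising_class_nn(x, reduced, full, n_classes_kept, dist):
    """Two-stage NN: shortlist classes on the reduced set, then
    classify on the full set restricted to those classes."""
    # Stage 1: distance from x to the closest reduced prototype of each class.
    best = {}
    for p, label in reduced:
        d = dist(x, p)
        if label not in best or d < best[label]:
            best[label] = d
    promising = sorted(best, key=best.get)[:n_classes_kept]
    # Stage 2: plain NN over the full set, promising classes only.
    candidates = [(p, label) for p, label in full if label in promising]
    return min(candidates, key=lambda pl: dist(x, pl[0]))[1]
```

The efficiency gain comes from stage 2 touching only the prototypes of the shortlisted classes rather than the whole training set.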
Valero-Mas, J. J., Calvo-Zaragoza, J., and Rico-Juan, J. R. (2016b).
On the suitability of Prototype Selection methods for kNN
classification with distributed data.
Neurocomputing, 203:150-160, (Impact Factor: 2.392 - Q1 -
JCR).
Abstract:
In the current Information Age, data production and processing demands are ever increasing. This has motivated the appearance of large-scale distributed information. This phenomenon also applies to Pattern Recognition, so that classic and common algorithms, such as the k-Nearest Neighbour, cannot be applied directly. To improve the efficiency of this classifier, Prototype Selection (PS) strategies can be used. Nevertheless, current PS algorithms were not designed to deal with distributed data, and their performance is therefore unknown under these conditions. This work is devoted to carrying out an experimental study on a simulated framework in which PS strategies can be compared under classical conditions as well as those expected in distributed scenarios. Our results report a general behaviour that degrades as conditions approach more realistic scenarios. However, our experiments also show that some methods are able to achieve a performance fairly similar to that of the non-distributed scenario. Thus, although there is a clear need to develop specific PS methodologies and algorithms for tackling these situations, those that reported a higher robustness against such conditions may be good candidates from which to start.
2015
Rico-Juan, J. R. and Calvo-Zaragoza, J. (2015).
Improving classification using a Confidence Matrix based on weak
classifiers applied to OCR.
Neurocomputing, 151:1354-1361, (Impact Factor: 2.392 - Q1
- JCR).
Abstract:
This paper proposes a new feature representation method based on the construction of a Confidence Matrix (CM). This representation consists of posterior probability values provided by several weak classifiers, each one trained and used on a different set of features extracted from the original sample. The CM allows the final classifier to abstract itself from discovering underlying groups of features. In this work the CM is applied to isolated character image recognition, for which several sets of features can be extracted from each sample.
Experimentation has shown that the use of the CM permits a significant improvement in accuracy in most cases, while in the remaining cases accuracy stays the same. The results were obtained after experimenting with four well-known corpora, using evolved meta-classifiers with the k-Nearest Neighbor rule as the weak classifier and applying statistical significance tests.
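The Confidence Matrix idea can be sketched as follows: one weak classifier per feature group emits per-class posterior estimates, and their concatenation forms the new representation fed to the final classifier. The softmax-over-centroid-distances weak classifier below is a toy assumption standing in for the paper's k-Nearest-Neighbor-based weak classifiers.

```python
import math

def class_posteriors(x, centroids):
    """Toy weak classifier: softmax over negative distances to class centroids."""
    scores = {c: -abs(x - m) for c, m in centroids.items()}
    z = sum(math.exp(s) for s in scores.values())
    return [math.exp(scores[c]) / z for c in sorted(centroids)]

def confidence_matrix(sample_views, centroids_per_view):
    """Concatenate each view's posterior vector into one representation."""
    row = []
    for x, centroids in zip(sample_views, centroids_per_view):
        row.extend(class_posteriors(x, centroids))
    return row
```

Each block of the resulting vector already encodes one feature group's opinion, so the final classifier need not rediscover the grouping.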
Calvo-Zaragoza, J., Valero-Mas, J. J., and Rico-Juan, J. R. (2015).
Improving kNN multi-label classification in Prototype Selection
scenarios using class proposals.
Pattern Recognition, 48(5):1608-1622, (Impact Factor:
3.096 - Q1 - JCR).
Abstract:
Prototype Selection (PS) algorithms allow a faster Nearest Neighbor classification by keeping only the most profitable prototypes of the training set. In turn, these schemes typically lower the classification accuracy. In this work a new strategy for multi-label classification tasks is proposed to solve this accuracy drop without the need of using the whole training set. Given a new instance, the PS algorithm is used as a fast recommender system that retrieves the most likely classes. Then, the actual classification is performed considering only the prototypes from the initial training set belonging to the suggested classes. Results show that this strategy provides a large set of trade-off solutions that fills the gap between PS-based classification efficiency and conventional kNN accuracy. Furthermore, this scheme is not only able, at best, to reach the performance of conventional kNN with barely a third of the distances computed, but it also outperforms the latter in noisy scenarios, proving to be a much more robust approach.
2014
Rico-Juan, J. R. and Iñesta, J. M. (2014).
Adaptive training set reduction for nearest neighbor
classification.
Neurocomputing, 38(1):316-324, (Impact Factor: 2.083 - Q2
- JCR).
Abstract:
The research community related to the human-interaction framework is becoming increasingly interested in interactive pattern recognition, taking direct advantage of the feedback information provided by the user in each interaction step in order to improve raw performance. The application of this scheme requires learning techniques that are able to adaptively re-train the system and tune it to user behavior and the specific task considered. Traditional static editing methods filter the training set by applying certain rules in order to eliminate outliers or maintain those prototypes that can be beneficial in classification. This paper presents two new adaptive rank methods for selecting the best prototypes from a training set in order to establish its size according to an external parameter that controls the adaptation process, while maintaining the classification accuracy. These methods estimate the probability of each prototype correctly classifying a new sample. This probability is used to sort the training set by relevance in classification. The results show that the proposed methods are able to maintain the error rate while reducing the size of the training set, thus allowing new examples to be learned with a few extra computations.
Abreu, J. and Rico-Juan, J. R. (2014).
A New Iterative Algorithm for Computing a Quality Approximated
Median of Strings based on Edit Operations.
Pattern Recognition Letters, 36:74-80, (Impact Factor:
1.551 - Q2 - JCR).
Abstract:
This paper presents a new algorithm that can be used to compute an approximation to the median of a set of strings. The approximate median is obtained through the successive improvements of a partial solution. The edit distance from the partial solution to all the strings in the set is computed in each iteration, thus accounting for the frequency of each of the edit operations in all the positions of the approximate median. A goodness index for edit operations is later computed by multiplying their frequency by the cost. Each operation is tested, starting from that with the highest index, in order to verify whether applying it to the partial solution leads to an improvement. If successful, a new iteration begins from the new approximate median. The algorithm finishes when all the operations have been examined without a better solution being found. Comparative experiments involving Freeman chain codes encoding 2D shapes and the Copenhagen chromosome database show that the quality of the approximate median string is similar to benchmark approaches but achieves a much faster convergence.
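A hedged sketch of the general idea above: hill-climb from an initial solution, accepting any single-symbol edit that lowers the total edit distance to the set, and restarting after each improvement. For brevity this scans the whole single-edit neighbourhood instead of ordering operations by the paper's frequency-times-cost goodness index; `edit_dist` is a plain Levenshtein helper.

```python
def edit_dist(a, b):
    """Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def approx_median(strings):
    """Greedy approximate median string: start from a member of the set and
    apply single-symbol edits while the total distance to the set drops."""
    alphabet = sorted(set("".join(strings)))
    total = lambda s: sum(edit_dist(s, t) for t in strings)
    best, best_cost = strings[0], total(strings[0])
    improved = True
    while improved:
        improved = False
        candidates = []
        for i in range(len(best)):
            candidates.append(best[:i] + best[i + 1:])            # deletion
            for c in alphabet:
                candidates.append(best[:i] + c + best[i + 1:])    # substitution
        for i in range(len(best) + 1):
            for c in alphabet:
                candidates.append(best[:i] + c + best[i:])        # insertion
        for s in candidates:
            if total(s) < best_cost:
                best, best_cost = s, total(s)
                improved = True
                break  # restart the search from the improved solution
    return best
```

The paper's goodness ordering makes the same loop converge much faster by testing the most promising edits first.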
2013
Abreu, J. and Rico-Juan, J. R. (2013).
An improved fast edit approach for two-string approximated mean
computation applied to OCR.
Pattern Recognition Letters, 34(5):496-504, (Impact
Factor: 1.226 - Q2 - JCR).
Abstract:
This paper presents a new fast algorithm for computing an approximation to the mean of two strings of characters representing a 2D shape, and its application to a new Wilson-based editing procedure. The approximate mean is built up by including some symbols from the two original strings. In addition, a greedy approach to this algorithm is studied, which allows us to reduce the time required to compute an approximate mean. The new dataset editing scheme relaxes the criterion for deleting instances proposed by the Wilson editing procedure. In practice, not all instances misclassified by their near neighbors are pruned. Instead, an artificial instance is added to the dataset in the hope of successfully classifying the instance in the future. The new artificial instance is the approximated mean of the misclassified sample and its same-class nearest neighbor. Experiments carried out over three widely known databases of contours show that the proposed algorithm performs very well when computing the mean of two strings, and outperforms methods proposed by other authors. In particular, the low computational time required by the heuristic approach makes it very suitable when dealing with long strings. Results also show that the proposed preprocessing scheme can reduce the classification error in about 83% of trials. There is empirical evidence that using the greedy approximation to compute the approximated mean does not affect the performance of the editing procedure.
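The relaxed Wilson criterion can be sketched as: where classic Wilson editing would delete a sample misclassified by its nearest neighbour, add instead an artificial instance built from the sample and its same-class nearest neighbour. Numeric features and the midpoint stand in for strings and the approximate two-string mean; both are assumptions for illustration.

```python
def relaxed_wilson(X, y, dist, mean):
    """Relaxed Wilson editing: instead of deleting a sample misclassified by
    its nearest neighbour, add the 'mean' of the sample and its same-class
    nearest neighbour as an artificial instance."""
    extra_X, extra_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        nn = min(others, key=lambda j: dist(xi, X[j]))
        if y[nn] != yi:  # misclassified by its nearest neighbour
            same = [j for j in others if y[j] == yi]
            if same:
                ally = min(same, key=lambda j: dist(xi, X[j]))
                extra_X.append(mean(xi, X[ally]))
                extra_y.append(yi)
    return X + extra_X, y + extra_y
```

For string data the `mean` callable would be the paper's approximate two-string mean rather than the numeric midpoint used here.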
2012
Rico-Juan, J. R. and Iñesta, J. M. (2012a).
Confidence voting method ensemble applied to off-line signature
verification.
Pattern Analysis and Applications, 15(2):113-120, (Impact
Factor: 0.814 - Q3 - JCR).
Abstract:
In this paper, a new approach to off-line signature verification is proposed, based on two-class classifiers using an ensemble of expert decisions. Different methods to extract sets of local and global features from the target sample are detailed. A normalisation by confidence voting is also used in order to decrease the final equal error rate (EER). In one approach, each set of features is processed by a single expert; in the other approach proposed, the decisions of the individual classifiers are combined using weighted votes. Experimental results are given using a subcorpus of the large MCYT signature database for random and skilled forgeries. The results show that the weighted combination significantly outperforms the individual classifiers. The best EERs obtained were 6.3% in the case of skilled forgeries and 2.3% in the case of random forgeries.
Rico-Juan, J. R. and Iñesta, J. M. (2012b).
New rank methods for reducing the size of the training set
using the nearest neighbor rule.
Pattern Recognition Letters, 33(5):654-660, (Impact
Factor: 1.226 - Q2 - JCR).
Abstract:
Some new rank methods for selecting the best prototypes from a training set are proposed in this paper in order to establish its size according to an external parameter, while maintaining the classification accuracy. Traditional methods that filter the training set in a classification task, such as editing or condensing, apply certain rules to the set in order to remove outliers or keep those prototypes that help in classification. In our approach, new voting methods are proposed to estimate the probability that each prototype correctly classifies a new sample. This probability is the key to sorting the training set, so a relevance factor from 0 to 1 is used to select the best candidates of each class whose accumulated probabilities are less than that parameter. This approach makes it possible to select the number of prototypes necessary to maintain or even increase the classification accuracy. The results obtained on different high-dimensional databases show that these methods maintain the final error rate while reducing the size of the training set.
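The accumulated-probability selection can be sketched as: per class, sort prototypes by their estimated probability of classifying a new sample correctly, and keep the leading ones while the accumulated (class-normalised) probability stays below the relevance factor. The probability estimates are taken as given here; computing them via voting is the paper's contribution.

```python
def select_by_relevance(X, y, prob, factor):
    """Keep, per class, the top-ranked prototypes whose accumulated
    (class-normalised) probability stays below the relevance factor."""
    kept = []
    for c in sorted(set(y)):
        idx = [i for i in range(len(X)) if y[i] == c]
        idx.sort(key=lambda i: -prob[i])   # most relevant first
        total = sum(prob[i] for i in idx)
        acc = 0.0
        for i in idx:
            if total and acc / total >= factor:
                break
            kept.append(i)
            acc += prob[i]
    return [X[i] for i in kept], [y[i] for i in kept]
```

Sweeping `factor` from 0 to 1 traces the full trade-off between training-set size and accuracy.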
2011
Abreu, J. and Rico-Juan, J. R. (2011).
Characterization of contour regularities based on the
Levenshtein edit distance.
Pattern Recognition Letters, 32:1421-1427, (Impact
Factor: 1.034 - Q3 - JCR).
Abstract:
This paper describes a new method for quantifying the regularity of contours and comparing them (when encoded by Freeman chain codes) in terms of a similarity criterion that relies on information gathered from the Levenshtein edit distance computation. The criterion used allows subsequences to be found in the minimal cost edit sequence that specify an alignment of similar contour segments. Two external parameters adjust the similarity criterion. The information about each similar part is encoded by strings that represent an average contour region. An explanation of how to construct a prototype based on the identified regularities is also provided. The reliability of the prototypes is evaluated by replacing contour groups (samples) with new prototypes used as the training set in a classification task. This way, the size of the data set can be reduced without appreciably affecting its representational power for classification purposes. Experimental results show that this scheme achieves a reduction in the size of the training data set of about 80% while the classification error only increases by 0.45% in one of the three data sets studied.
2010
Rico-Juan, J. R. and Abreu, J. I. (2010).
A new editing scheme based on a fast two-string median
computation applied to OCR.
In Hancock, E. R., Wilson, R. C., Windeatt, T., Ulusoy, I., and Escolano, F., editors, Structural, Syntactic, and Statistical Pattern Recognition, number 6218 in Lecture Notes in Computer Science, pages 748-756. Springer.
Abstract:
This paper presents a new fast algorithm to compute an approximation to the median between two strings of characters representing a 2D shape, and its application to a new classification scheme to decrease its error rate. The median string results from the application of certain edit operations, taken from the minimum cost edit sequence, to one of the original strings. The new dataset editing scheme relaxes the criterion for deleting instances proposed by the Wilson editing procedure. In practice, not all instances misclassified by their near neighbors are pruned. Instead, an artificial instance is added to the dataset in the hope of successfully classifying the instance in the future. The new artificial instance is the median of the misclassified sample and its same-class nearest neighbor. The experiments over two widely used datasets of handwritten characters show that this preprocessing scheme can reduce the classification error in about 78% of trials.
2009
Rico-Juan, J. R. (2009).
Creating synchronised presentations for mobile devices using
open source tools.
In Proceedings of the 2009 International Conference on
e-Learning, e-Business, Enterprise Information Systems and e-Government,
pages 50-52, Las Vegas, Nevada, USA. CSREA Press.
Abstract:
In this paper, we describe a way to create synchronized presentations for mobile devices using only open-source tools. In the framework of higher education, it is important to provide the students with flexible and interactive resources when the time assigned to laboratory sessions or lectures is reduced. Nowadays, students often have one or many mobile devices such as mobile phones, smartphones, PDAs (Personal Digital Assistants), etc. This gives teachers the opportunity to create resources for this kind of device. On the other hand, open-source software offers an interesting alternative for creating educational resources, using either a single tool or a combination of them. The main idea here is to describe a procedure to create presentations combining PDF files as slides, audio files with detailed explanations, and flash video files (.swf) for showing demos. We describe in detail how to integrate these individual components to create a high-quality presentation, based on vectorial components, with small output files. It also allows these presentations to be played on mobile devices. In contrast to commercial tools, our approach does not use special interfaces or formats, and it allows one to export presentations to formats compatible with other future tools. Our proposal also allows one to work with conventional tools to create slides (such as PowerPoint, OpenOffice.org Impress or LaTeX), provided that the final slides are exported to PDF, and to use standard audio tools to create audio (WAV, OGG and MP3 are supported). Video can be included just by converting the original file to SWF (flash video) format. In order to make use of the educational resources, we just need a mobile device with a web browser and a flash plug-in installed; therefore, the result can be easily distributed through a web server or as a package stored locally on the device.
Abreu, J. I. and Rico-Juan, J. R. (2009).
Contour regularity extraction based on string edit distance.
In Pattern Recognition and Image Analysis. IbPRIA 2009,
Lecture Notes in Computer Science, pages 160-167, Póvoa de Varzim,
Portugal. Springer.
Abstract:
In this paper, we present a new method for constructing prototypes that represent a set of contours encoded by Freeman chain codes. Our method builds new prototypes taking into account similar segments shared between contour instances. The similarity criterion is based on the Levenshtein edit distance definition. We also outline how to apply our method to reduce a data set without appreciably affecting its representational power for classification purposes. Experimental results show that our scheme can achieve compressions of about 50% while the classification error increases only by 0.75%.
2007
Rico-Juan, J. R. and Iñesta, J. M. (2007).
Normalisation of Confidence Voting Methods Applied to a Fast
Handwritten OCR Classification.
In Kurzynski, M., Puchala, E., Wozniak, M., and Zolnierek, A.,
editors, Computer Recognition Systems 2, number 45 in Advances in
Soft Computing, pages 405-412, Wroclaw, Poland. Springer.
Abstract:
In this work, a normalisation of the weights used for combining classifier decisions based on Euclidean distance similarity is presented. This normalisation is used by the confidence voting methods to decrease the final error rate in an OCR task. Different features are extracted from the characters. Each set of features is processed by a single classifier and then the decisions of the individual classifiers are combined using weighted votes, following different techniques. The error rates obtained are as good as or slightly better than those obtained using Freeman chain codes as the contour representation and the string edit distance as the similarity measure, but the complexity and classification time decrease dramatically.
Rico-Juan, J. R. and Carrasco, R. C. (2007a).
How to create an efficient audiovisual slide presenter.
In IADAT-e2007. 4th IADAT International Conference on Education, volume 1, pages 40-43, Palma de Mallorca, (Spain).
International Association for the Development of Advances in Technology
(IADAT).
Abstract:
In this paper, we describe a flexible tool to create synchronized presentations using only open-source tools. In the framework of the new European Credit Transfer System (ECTS), it is even more important to provide the students with flexible and interactive resources as the time assigned to laboratory sessions or lectures is reduced. Open-source software offers an interesting alternative for creating educational resources, sometimes using a single tool but often using a combination of them. Here, we describe a procedure to create AV presentations combining PDF files (slides), audio files, flash video files (.swf) and flash video streaming (.flv). We describe in detail how to integrate these individual components to automatically create a high-quality presentation, that is, one based on vectorial components, with small output files. It also allows video or video streaming to be integrated into single slides. In contrast to commercial tools, this tool does not use special interfaces or formats, and it allows one to export presentations to formats compatible with other (future) presentation tools. Our tool also allows one to work with traditional tools to create slides (such as PowerPoint, OpenOffice Impress or LaTeX), provided that the final slides are exported to PDF, and to use standard audio tools to create audio (WAV, OGG and MP3 are supported). Video can be included just by converting the original file to SWF (flash video) format or FLV (flash video streaming). In order to make use of the educational resource, we just need a web browser with a flash plug-in installed; therefore, the result can be easily distributed through a web server, a CD or a DVD.
Rico-Juan, J. R. and Carrasco, R. C. (2007b).
How to do easy video presentations using open source tools.
In International Technology, Education and Development
Conference (INTED), volume 1, pages 30-31, Valencia, (Spain).
International Association of Technology, Education and Development (IATED).
Abstract:
In this paper, we describe a flexible approach to creating video presentations using open-source tools. In the new ECTS framework, the time that the student spends in a laboratory or in a classroom is reduced and, therefore, it is important to provide students with materials that are more flexible and interactive than classical electronic papers or books. Open-source programs are a good alternative for creating educational resources. Often it is not possible to produce the whole video presentation with a single tool, but it is possible to choose different tools for different parts. We implement a method to create video presentations from PDF files (slides), audio files and flash (.swf) video files. We describe in detail how to integrate these individual components to automatically create a high-quality video presentation with small output files. With commercial tools, we may only use the special interfaces and formats supported by the tool. As a consequence, we cannot export every presentation to files compatible with other presentation tools. So, if we cannot export our previous presentations, it is difficult to change to a new, better tool. The method described here solves these problems. It allows us to work with traditional tools to create slides (PowerPoint, OpenOffice Impress or LaTeX), provided that we export the final slides to PDF. We can use any audio tool to create audio for each slide in different formats (WAV, OGG and MP3 are supported). If we want to include a video, we need to convert it to SWF (Shockwave Flash) format. The result requires only a web browser with a Flash plug-in. So, we can distribute the result on standard media such as a web server, a CD or a DVD.
2006
Rico-Juan, J. R. and Iñesta, J. M. (2006a).
Edit Distance for Ordered Vector Sets: A Case of Study.
In Yeung, D., Kwok, J. T., Fred, A., Roli, F., and de Ridder, D.,
editors, Structural, Syntactic, and Statistical Pattern Recognition,
number 4109 in Lecture Notes in Computer Science, pages 200-207, Hong
Kong, China. Springer.
Abstract:
Digital contours in a binary image can be described as an ordered vector set. In this paper an extension of the string edit distance is defined for its computation between a pair of ordered sets of vectors. This way, the differences between shapes can be computed in terms of editing costs. In order to achieve efficiency, a dominant point detection algorithm should be applied, removing redundant data before coding shapes into vectors. This edit distance can be used in nearest neighbour classification tasks. The advantages of this method applied to isolated handwritten character classification are shown, compared to similar methods based on string or tree representations of the binary image.
Rico-Juan, J. R. and Iñesta, J. M. (2006b).
An edit distance for ordered vector sets with application to
character recognition, volume 1, chapter 4, pages 54-62.
Computer Vision Center.
Abstract:
In this paper a new algorithm to describe a binary image as an ordered vector set is presented. An extension of the string edit distance is defined for computing it between a pair of ordered sets of vectors. This edit distance can be used in nearest neighbor classification tasks. The advantages of this method applied to isolated handwritten character classification are shown, compared to similar methods based on string or tree representations of the binary image.
2005
Rico-Juan, J. R., Calera-Rubio, J., and Carrasco, R. C. (2005).
Smoothing and Compression with Stochastic k-testable Tree
Languages.
Pattern Recognition, 38(9):1420-1430, (Impact Factor:
2.607 - Q1 - JCR).
Abstract:
In this paper, we describe a generalization of k-gram models to stochastic tree languages. These models are based on the k-testable class, a subclass of the languages recognizable by ascending tree automata. One of the advantages of this approach is that the probabilistic model can be updated in an incremental fashion. Another feature is that backing-off schemes can be defined. As an illustration of their applicability, these models have been used to compress tree data files at a better rate than string-based methods.
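For intuition, the string analogue being generalised, a k-gram model estimated by counting, looks like this; the tree case replaces contexts of k−1 preceding symbols with k-testable tree contexts. The padding symbols `^` and `$` are illustrative conventions, not the paper's notation.

```python
from collections import Counter

def train_kgram(strings, k):
    """Estimate P(symbol | previous k-1 symbols) from counts -- the string
    k-gram model that the paper generalises to trees via k-testability."""
    ctx_counts, counts = Counter(), Counter()
    for s in strings:
        padded = "^" * (k - 1) + s + "$"   # start padding and end marker
        for i in range(k - 1, len(padded)):
            ctx = padded[i - k + 1:i]
            counts[(ctx, padded[i])] += 1
            ctx_counts[ctx] += 1
    return lambda ctx, sym: counts[(ctx, sym)] / ctx_counts[ctx] if ctx_counts[ctx] else 0.0
```

The incremental-update property mentioned in the abstract is visible here: adding a new sample only increments counters, with no retraining.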
2004
Rico-Juan, J. R. and Micó, L. (2004).
Finding significant points for a handwritten classification
task.
In Campilho, A. and Kamel, M., editors, International
Conference on Image Analysis and Recognition, number 3211 in Lecture Notes
in Computer Science, pages 440-446, Porto, Portugal. Springer.
Abstract:
When objects are represented by curves in a plane, highly useful information is conveyed by significant points. In this paper, we compare the use of different mobile windows to extract dominant points of handwritten characters. The error rate and classification time using an edit distance based nearest neighbour search algorithm are compared for two different cases: string and tree representation.
2003
Rico-Juan, J. R. and Micó, L. (2003b).
Some Results about the Use of Tree/String Edit Distances in a
Nearest Neighbour Classification Task.
In Goos, G., Hartmanis, J., and van Leeuwen, J., editors,
Pattern Recognition and Image Analysis, number 2652 in Lecture Notes in
Computer Science, pages 821-828, Puerto Andratx, Mallorca, Spain. Springer.
Abstract:
In pattern recognition there is a variety of applications where the patterns are classified using an edit distance. In this paper we present some results comparing the use of tree and string edit distances in a handwritten character recognition task. Experiments with different numbers of classes and classifiers are carried out.
Rico-Juan, J. R. and Micó, L. (2003a).
Comparison of AESA and LAESA search algorithms using string
and tree edit distances.
Pattern Recognition Letters, 24(9):1427-1436, (Impact
Factor: 0.809 - Q3 - JCR).
Abstract:
Although the success rate of handwritten character recognition using a nearest neighbour technique together with an edit distance is satisfactory, the exhaustive search is expensive. Some fast methods, such as AESA and LAESA, have been proposed to find nearest neighbours in metric spaces. The average number of distances computed by these algorithms is very low and does not depend on the number of prototypes in the training set. In this paper, we compare the behaviour of these algorithms when string and tree edit distances are used.
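The pruning mechanism shared by AESA and LAESA can be sketched as follows (LAESA-style, with a fixed set of base prototypes): precomputed prototype-to-base distances yield a triangle-inequality lower bound on each candidate's distance to the query, and candidates whose bound already exceeds the best distance found are discarded without computing an actual distance. Base selection here (caller-supplied indices) is an illustrative simplification.

```python
def laesa_nn(x, prototypes, bases_idx, dist):
    """LAESA-style NN search: prune candidates whose triangle-inequality
    lower bound (from precomputed base distances) exceeds the best so far."""
    # Preprocessing (normally done once, offline).
    base_d = {b: [dist(prototypes[b], p) for p in prototypes] for b in bases_idx}
    xq = {b: dist(x, prototypes[b]) for b in bases_idx}  # query-to-base distances
    computed = len(bases_idx)
    best, best_d = None, float("inf")
    for i, p in enumerate(prototypes):
        # Lower bound on dist(x, p): max over bases of |d(x,b) - d(p,b)|.
        lb = max(abs(xq[b] - base_d[b][i]) for b in bases_idx)
        if lb >= best_d:
            continue  # pruned without computing the real distance
        d = dist(x, p)
        computed += 1
        if d < best_d:
            best, best_d = i, d
    return best, computed
```

Because only the bound computation scales with the training-set size, the number of actual (expensive) edit-distance evaluations stays low, which is the property the paper measures for string and tree distances.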
Carrasco, R. C. and Rico-Juan, J. R. (2003).
A similarity between probabilistic tree languages: application
to XML document families.
Pattern Recognition, 36(9), (Impact Factor: 1.611 - Q1 -
JCR).
Abstract:
We describe a general approach to compute a similarity measure between distributions generated by probabilistic tree automata that may be used in a number of applications in the pattern recognition field. In particular, we show how this similarity can be computed for families of structured (XML) documents. In such a case, the use of regular expressions to specify the right-hand part of the expansion rules adds some complexity to the task.
2002
Rico-Juan, J. R., Calera-Rubio, J., and Carrasco, R. C. (2002).
Stochastic k-testable Tree Languages and Applications.
In Adriaans, P., Fernau, H., and van Zaanen, M., editors,
Grammatical Inference: Algorithms and Applications. ICGI 2002, number 2484
in Lecture Notes in Artificial Intelligence, pages 199-212, Amsterdam
(The Netherlands). Springer-Verlag.
Abstract:
In this paper, we present a natural generalization of k-gram models to stochastic tree languages based on the k-testable class. In this class of models, frequencies are estimated for a probabilistic regular tree grammar which is bottom-up deterministic. One of the advantages of this approach is that the model can be updated in an incremental fashion. This method is an alternative to costly learning algorithms (such as inside-outside-based methods) or algorithms that require larger samples (such as many state merging/splitting methods).
Rico-Juan, J. R. and Calera-Rubio, J. (2002).
Evaluation of handwritten character recognizers using
tree-edit-distance and fast nearest neighbour search.
In Iñesta, J. M. and Micó, L., editors, Pattern
Recognition in Information Systems, pages 326-335, Alicante (Spain). ICEIS
PRESS.
[ bib |
.pdf ]
(+Abstract-)
Although the rate of correctly classified prototypes using tree-edit-distance is satisfactory, exhaustive classification is expensive. Fast methods such as AESA and LAESA have been proposed to find nearest neighbours in metric spaces. The average number of distances computed by these algorithms does not depend on the number of prototypes. In this paper we apply these algorithms to the task of handwritten character recognition and obtain a low average error rate (2%) together with fast classification.
2001
Rico-Juan, J. R. (2001).
Inferencia estocástica y aplicaciones de los lenguajes de
árboles.
PhD thesis, Universidad de Alicante, Departamento de Lenguajes y
Sistemas Informáticos.
[ bib |
.pdf ]
(+Abstract-)
One of the original contributions of this thesis is the definition of a stochastic inference model for k-testable tree languages and its application to compression and classification. Other probabilistic models are also contributed for the tasks of 3D surface compression and off-line handwritten word recognition.
2000
Rico-Juan, J. R., Calera-Rubio, J., and Carrasco, R. C. (2000b).
Probabilistic k-testable tree-language.
In Oliveira, A. L., editor, Proceedings of 5th International
Colloquium, volume 1891 of Lecture Notes in Computer Science, pages
221-228, Lisboa (Portugal). Springer-Verlag.
[ bib |
.ps.gz |
.pdf ]
(+Abstract-)
In this paper, we present a natural generalization of k-gram models for stochastic tree languages based on the k-testable class. In this class of models, frequencies are estimated for a probabilistic regular tree grammar which is bottom-up deterministic. One of the advantages of this approach is that the model can be updated in an incremental fashion. This method is an alternative to costly learning algorithms (such as inside-outside-based methods) or to algorithms that require larger samples (such as many state merging/splitting methods).
Rico-Juan, J. R., Calera-Rubio, J., and Carrasco, R. C. (2000a).
Lossless compression of surfaces described as points.
In Ferri, F. J., Iñesta, J. M., Amin, A., and Pudil, P., editors,
Advances in Pattern Recognition, volume 1876 of Lecture Notes
in Computer Science, pages 457-461, Berlin. Springer-Verlag.
[ bib ]
(+Abstract-)
In many applications, objects are represented by a collection of unorganized points that scan the surface of the object. In such cases, an efficient way of storing this information is of interest. In this paper we present an arithmetic compression scheme that uses a tree representation of the data set and allows for better compression rates than general-purpose methods.
1999
Rico-Juan, J. R. (1999b).
Off-line cursive handwritten word recognition based on tree
extraction and an optimized classification distance.
In Torres, M. I. and Sanfeliu, A., editors, Pattern Recognition
and Image Analysis: Proceedings of the VII Symposium Nacional de
Reconocimiento de Formas y Análisis de Imágenes, volume 3, pages
15-16, Bilbao (Spain).
[ bib |
.ps.gz |
.pdf ]
(+Abstract-)
This paper describes a geometric approach to the difficult off-line cursive handwritten word recognition problem. The method extracts and classifies feature trees from isolated handwritten words, measuring the distance between two trees.
Rico-Juan, J. R. (1999a).
Esquemas Algorítmicos.
Publicaciones de la Universidad de Alicante.
[ bib ]
(+Abstract-)
This book describes two algorithmic schemes: dynamic programming and branch and bound. Each scheme is described from a general point of view, extracting its representative characteristics and deriving a template that is then applied to concrete cases. The book contains a wide variety of examples and solved exercises.