Tel: 965 90 34 00 or 965 90 37 72 (ext. 2738)
Academic data
Ph. D. Thesis
Supervised Theses
Research interests
- Ensemble Classifiers, Edit Distances, Dissimilarity Measures, Prototype Selection for Classification, 2D contour regularities
Publications
Abreu, J. and Rico-Juan, J. R. (2012).
An improved fast edit approach for two-string approximated mean
computation applied to OCR.
Pattern Recognition Letters, 34(5):496-504.
This paper presents a new fast algorithm for computing an approximation to the mean of two strings of characters representing a 2D shape and its application to a new Wilson-based editing procedure. The approximate mean is built up by including some symbols from the two original strings. In addition, a greedy approach to this algorithm is studied, which allows us to reduce the time required to compute an approximate mean. The new dataset editing scheme relaxes the criterion for deleting instances proposed by the Wilson editing procedure. In practice, not all instances misclassified by their near neighbors are pruned. Instead, an artificial instance is added to the dataset in the hope of successfully classifying the instance in the future. The new artificial instance is the approximated mean of the misclassified sample and its same-class nearest neighbor. Experiments carried out over three widely known databases of contours show that the proposed algorithm performs very well when computing the mean of two strings, and outperforms methods proposed by other authors. In particular, the low computational time required by the heuristic approach makes it very suitable when dealing with long length strings. Results also show that the proposed preprocessing scheme can reduce the classification error in about 83% of trials. There is empirical evidence that using the greedy approximation to compute the approximated mean does not affect the performance of the editing procedure.
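For context, the classic Wilson editing rule that the abstract above relaxes can be sketched as follows. This is only a minimal illustration of the baseline procedure (discard every instance that its k nearest neighbours misclassify), not the paper's variant, which adds an approximated mean string instead of always deleting. The function names, the `distance` callable and the default `k=3` are assumptions made for the example.

```python
from collections import Counter


def knn_label(index, samples, labels, distance, k):
    """Majority label among the k nearest neighbours of samples[index], excluding itself."""
    others = [j for j in range(len(samples)) if j != index]
    others.sort(key=lambda j: distance(samples[index], samples[j]))
    votes = Counter(labels[j] for j in others[:k])
    return votes.most_common(1)[0][0]


def wilson_edit(samples, labels, distance, k=3):
    """Classic Wilson editing (baseline, hypothetical helper names).

    Every instance is classified by its k nearest neighbours in the original
    set; only the instances whose own label agrees with that vote are kept.
    """
    keep = [i for i in range(len(samples))
            if knn_label(i, samples, labels, distance, k) == labels[i]]
    return [samples[i] for i in keep], [labels[i] for i in keep]
```

Any dissimilarity can be passed as `distance`; for contour strings, a string edit distance would be the natural choice.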
Rico-Juan, J. R. and Iñesta, J. M. (2012a).
Confidence voting method ensemble applied to off-line signature
verification.
Pattern Analysis and Applications, 15(2):113-120.
In this paper, a new approach to off-line signature verification is proposed based on two-class classifiers using an ensemble of expert decisions. Different methods to extract sets of local and global features from the target sample are detailed. A normalisation by a confidence voting method is also used in order to decrease the final equal error rate (EER). Each set of features is processed by a single expert and, in the other proposed approach, the decisions of the individual classifiers are combined using weighted votes. Experimental results are given using a subcorpus of the large MCYT signature database for random and skilled forgeries. The results show that the weighted combination outperforms the individual classifiers significantly. The best EERs obtained were 6.3% in the case of skilled forgeries and 2.3% in the case of random forgeries.
Rico-Juan, J. R. and Iñesta, J. M. (2012b).
New rank methods for reducing the size of the training set using the
nearest neighbor rule.
Pattern Recognition Letters, 33(5):654-660.
Some new rank methods to select the best prototypes from a training set are proposed in this paper in order to establish its size according to an external parameter, while maintaining the classification accuracy. Traditional methods that filter the training set in a classification task, such as editing or condensing, apply rules to the set in order to remove outliers or keep some prototypes that help in the classification. In our approach, new voting methods are proposed to compute the prototype probability and help to correctly classify a new sample. This probability is the key to sorting the training set, so a relevance factor from 0 to 1 is used to select the best candidates for each class whose accumulated probabilities are less than that parameter. This approach makes it possible to select the number of prototypes necessary to maintain or even increase the classification accuracy. The results obtained in different high dimensional databases show that these methods maintain the final error rate while reducing the size of the training set.
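The selection step described above (rank the prototypes of each class by probability and keep those whose accumulated probability stays below the relevance factor) might look roughly like the sketch below. The voting methods that produce the scores are not reproduced here: `votes` is assumed to be already computed, and the function name, argument names and the strict "less than" reading of the threshold are assumptions for illustration only.

```python
def select_by_relevance(votes, labels, relevance):
    """Return the indices of the prototypes kept for each class.

    votes     -- non-negative score per prototype (assumed given by some voting method)
    labels    -- class label per prototype
    relevance -- external parameter in [0, 1]
    """
    selected = []
    for cls in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cls]
        total = sum(votes[i] for i in members)
        if total == 0:
            continue  # no votes in this class, nothing to rank
        # Rank the class prototypes from most to least voted.
        members.sort(key=lambda i: votes[i], reverse=True)
        accumulated = 0.0
        for i in members:
            # Normalise votes within the class so they act as probabilities.
            accumulated += votes[i] / total
            if accumulated >= relevance:
                break
            selected.append(i)
    return selected
```

With this reading, a larger relevance factor keeps more prototypes per class, so the parameter directly controls the size of the reduced training set.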
Abreu, J. and Rico-Juan, J. R. (2011).
Characterization of contour regularities based on the Levenshtein
edit distance.
Pattern Recognition Letters, 32:1421-1427.
This paper describes a new method for quantifying the regularity of contours and comparing them (when encoded by Freeman chain codes) in terms of a similarity criterion which relies on information gathered from Levenshtein edit distance computation. The criterion used allows subsequences to be found from the minimal cost edit sequence that specifies an alignment of contour segments which are similar. Two external parameters adjust the similarity criterion. The information about each similar part is encoded by strings that represent an average contour region. An explanation of how to construct a prototype based on the identified regularities is also given. The reliability of the prototypes is evaluated by replacing contour groups (samples) by new prototypes used as the training set in a classification task. This way, the size of the data set can be reduced without appreciably affecting its representational power for classification purposes. Experimental results show that this scheme achieves a reduction in the size of the training data set of about 80% while the classification error only increases by 0.45% in one of the three data sets studied.
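As a reminder of the underlying measure, the Levenshtein edit distance between two Freeman chain codes can be computed with the standard dynamic programming recurrence. The sketch below assumes unit costs for insertion, deletion and substitution and represents each chain code as a string of direction digits (0 to 7); the costs and the regularity criterion actually used in the paper are not reproduced here.

```python
def levenshtein(a, b):
    """Levenshtein distance between sequences a and b with unit edit costs (assumed)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, symbol_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                           # delete symbol_a
                current[j - 1] + 1,                        # insert symbol_b
                previous[j - 1] + (symbol_a != symbol_b),  # substitute or match
            ))
        previous = current
    return previous[-1]
```

For example, `levenshtein("00224466", "0022446")` compares two hypothetical chain codes that differ by one trailing direction symbol.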
Rico-Juan, J. R. and Abreu, J. I. (2010).
A new editing scheme based on a fast two-string median computation
applied to OCR.
In Hancock, E. R., Wilson, R. C., Ilkay, T. W., and Escolano, F.,
editors, Structural, Syntactic, and Statistical Pattern Recognition,
number 6218 in Lecture Notes in Computer Science, pages 748-756. Springer.
This paper presents a new fast algorithm to compute an approximation to the median of two strings of characters representing a 2D shape and its application to a new classification scheme to decrease its error rate. The median string results from the application of certain edit operations from the minimum cost edit sequence to one of the original strings. The new dataset editing scheme relaxes the criterion to delete instances proposed by the Wilson Editing Procedure. In practice, not all instances misclassified by their near neighbors are pruned. Instead, an artificial instance is added to the dataset in the hope of successfully classifying the instance in the future. The new artificial instance is the median of the misclassified sample and its same-class nearest neighbor. The experiments over two widely used datasets of handwritten characters show that this preprocessing scheme can reduce the classification error in about 78% of trials.
Rico-Juan, J. R. (2009).
Creating synchronised presentations for mobile devices using open
source tools.
In Proceedings of the 2009 International Conference on
e-Learning, e-Business, Enterprise Information Systems and e-Government,
pages 50-52, Las Vegas, Nevada, USA. CSREA Press.
In this paper, we describe a way to create synchronized presentations for mobile devices using only open-source tools. In the framework of higher education, it is important to provide students with flexible and interactive resources when the time assigned to laboratory sessions or lectures is reduced. Nowadays, students often have one or more mobile devices such as mobile phones, smartphones, PDAs (Personal Digital Assistants), etc. This gives teachers the opportunity to create resources for this kind of device. On the other hand, open-source software offers an interesting alternative for creating educational resources, using either a single tool or a combination of them. The main idea here is to describe a procedure to create presentations combining PDF files as slides, audio files with detailed explanations and flash video files (.swf) to show demos. We describe in detail how to integrate these individual components to create a high quality presentation, based on vectorial components, with small result files. It also allows these presentations to be played on mobile devices. In contrast to commercial tools, our approach does not use special interfaces or formats and it allows one to export presentations to formats compatible with other tools in the future. Our proposal also allows one to work with conventional tools to create slides (such as PowerPoint, OpenOffice.org Impress or LaTeX), since the final slides are exported to PDF, and to use standard audio tools to create audio (WAV, OGG and MP3 are supported). Video can be included just by converting the original file to SWF (flash video) format. In order to make use of the educational resources, we just need a mobile device with a web browser and a flash plug-in installed and, therefore, the result can be easily distributed through a web server or as a package that can be stored locally on the device.
Abreu, J. I. and Rico-Juan, J. R. (2009).
Contour regularity extraction based on string edit distance.
In Pattern Recognition and Image Analysis. IbPRIA 2009,
Lecture Notes in Computer Science, pages 160-167, Póvoa de Varzim,
Portugal. Springer.
In this paper, we present a new method for constructing prototypes representing a set of contours encoded by Freeman Chain Codes. Our method builds new prototypes taking into account similar segments shared between contour instances. The similarity criterion is based on the Levenshtein Edit Distance definition. We also outline how to apply our method to reduce a data set without appreciably affecting its representational power for classification purposes. Experimental results show that our scheme can achieve a compression of about 50% while the classification error increases only by 0.75%.
Rico-Juan, J. R. and Iñesta, J. M. (2007).
Normalisation of Confidence Voting Methods Applied to a Fast
Handwritten OCR Classification.
In Kurzynski, M., Puchala, E., Wozniak, M., and Zolnierek, A.,
editors, Computer Recognition Systems 2, number 45 in Advances in
Soft Computing, pages 405-412, Wroclaw, Poland. Springer.
In this work, a normalisation of the weights used for combining classifiers' decisions based on Euclidean distance similarity is presented. This normalisation is used by the confidence voting methods to decrease the final error rate in an OCR task. Different features are extracted from the characters. Each set of features is processed by a single classifier and then the decisions of the individual classifiers are combined using weighted votes, using different techniques. The error rates obtained are as good as or slightly better than those obtained using Freeman chain codes as contour representation and the string edit distance as similarity measure, but the complexity and classification time decrease dramatically.
Rico-Juan, J. R. and Carrasco, R. C. (2007a).
How to create an efficient audiovisual slide presenter.
In IADAT-e2007. 4th. IADAT International Conference on
Education, volume 1, pages 40-43, Palma de Mallorca, (Spain).
International Association for the Development of Advances in Technology
(IADAT).
In this paper, we describe a flexible tool to create synchronized presentations using only open-source tools. In the framework of the new European Credit Transfer System (ECTS), it is even more important to provide students with flexible and interactive resources as the time assigned to laboratory sessions or lectures is reduced. Open-source software offers an interesting alternative for creating educational resources, sometimes using a single tool but often using a combination of them. Here, we describe a procedure to create AV presentations combining PDF files (slides), audio files, flash video files (.swf) and flash video streaming (.flv). We describe in detail how to integrate these individual components to automatically create a high quality presentation, that is, based on vectorial components, with small result files. It also allows video or video streaming to be integrated into single slides. In contrast to commercial tools, this tool does not use special interfaces or formats and it allows one to export presentations to formats compatible with other (future) presentation tools. Our tool also allows one to work with traditional tools to create slides (such as PowerPoint, OpenOffice Impress or LaTeX) provided that the final slides are exported to PDF and also to use standard audio tools to create audio (WAV, OGG and MP3 are supported). Video can be included just by converting the original file to SWF (flash video) format or FLV (flash video streaming). In order to make use of the educational resource, we need just a web browser with a flash plug-in installed and, therefore, the result can be easily distributed through a web server, a CD or a DVD.
Rico-Juan, J. R. and Carrasco, R. C. (2007b).
How to do easy video presentations using open source tools.
In International Technology, Education and Development
Conference (INTED), volume 1, pages 30-31, Valencia, (Spain).
International Association of Technology, Education and Development (IATED).
In this paper, we describe a flexible approach to create video presentations using open source tools. In the new ECTS framework, the time that the student spends in a laboratory or in a classroom is reduced and, therefore, it is important to assist students with materials that are more flexible and interactive than classical electronic papers or books. Open source programs are a good alternative for creating educational resources. Often, it is not possible to produce the whole video presentation with a single tool, but it is possible to choose different tools to do so. We implement a method to create video presentations from PDF files (slides), audio files and flash (.swf) video files. We describe in detail how to integrate these individual components to automatically create a high quality video presentation with small output files. With commercial tools, we may only use the special interfaces and formats supported by the tool. As a consequence, we cannot export every presentation to files compatible with other presentation tools, and if we cannot export our previous presentations it is difficult to change to a new, better tool. The method described here solves these problems. It allows us to work with traditional tools to create slides (PowerPoint, Open Office Impress or LaTeX), provided that we export the final slides to PDF. We can use any audio tool to create audio for each slide in different formats (WAV, OGG and MP3 are supported). If we want to include a video, we need to convert it to SWF (Shockwave Flash™) format. The result requires only a web browser with a Flash plug-in. So, we can distribute the result in standard media such as a web server, a CD or a DVD.
Rico-Juan, J. R. and Iñesta, J. M. (2006b).
Edit Distance for Ordered Vector Sets: A Case of Study.
In Yeung, D., Kwok, J. T., Fred, A., Roli, F., and de Ridder, D.,
editors, Structural, Syntactic, and Statistical Pattern Recognition,
number 4109 in Lecture Notes in Computer Science, pages 200-207, Hong
Kong, China. Springer.
Digital contours in a binary image can be described as an ordered vector set. In this paper, an extension of the string edit distance is defined to compute the distance between a pair of ordered sets of vectors. This way, the differences between shapes can be computed in terms of editing costs. In order to achieve efficiency, a dominant point detection algorithm should be applied, removing redundant data before coding shapes into vectors. This edit distance can be used in nearest neighbour classification tasks. The advantages of this method applied to isolated handwritten character classification are shown, compared to similar methods based on string or tree representations of the binary image.
Rico-Juan, J. R. and Iñesta, J. M. (2006a).
An edit distance for ordered vector sets with application to
character recognition, volume 1, chapter 4, pages 54-62.
Computer Vision Center.
In this paper, a new algorithm to describe a binary image as an ordered vector set is presented. An extension of the string edit distance is defined to compute the distance between a pair of ordered sets of vectors. This edit distance can be used in nearest neighbor classification tasks. The advantages of this method applied to isolated handwritten character classification are shown, compared to similar methods based on string or tree representations of the binary image.
Rico-Juan, J. R., Calera-Rubio, J., and Carrasco, R. C. (2005).
Smoothing and Compression with Stochastic k-testable Tree
Languages.
Pattern Recognition, 38(9):1420-1430.
In this paper, we describe a generalization of k-gram models for stochastic tree languages. These models are based on the k-testable class, a subclass of the languages recognizable by ascending tree automata. One of the advantages of this approach is that the probabilistic model can be updated in an incremental fashion. Another feature is that backing-off schemes can be defined. As an illustration of their applicability, they have been used to compress tree data files at a better rate than string-based methods.
Rico-Juan, J. R. and Micó, L. (2004).
Finding significant points for a handwritten classification task.
In Campilho, A. and Kamel, M., editors, International
Conference on Image Analysis and Recognition, number 3211 in Lecture Notes
in Computer Science, pages 440-446, Porto, Portugal. Springer.
When objects are represented by curves in a plane, highly useful information is conveyed by significant points. In this paper, we compare the use of different mobile windows to extract dominant points of handwritten characters. The error rate and classification time using an edit distance based nearest neighbour search algorithm are compared for two different cases: string and tree representation.
Rico-Juan, J. R. and Micó, L. (2003b).
Some Results about the Use of Tree/String Edit Distances in a
Nearest Neighbour Classification Task.
In Goos, G., Hartmanis, J., and van Leeuwen, J., editors,
Pattern Recognition and Image Analysis, number 2652 in Lecture Notes in
Computer Science, pages 821-828, Puerto Andratx, Mallorca, Spain. Springer.
In pattern recognition, there is a variety of applications where the patterns are classified using edit distance. In this paper, we present some results comparing the use of tree and string edit distances in a handwritten character recognition task. Experiments with different numbers of classes and classifiers are reported.
Rico-Juan, J. R. and Micó, L. (2003a).
Comparison of AESA and LAESA search algorithms using string and
tree edit distances.
Pattern Recognition Letters, 24(9):1427-1436.
Although the success rate of handwritten character recognition using a nearest neighbour technique together with edit distance is satisfactory, the exhaustive search is expensive. Some fast methods such as AESA and LAESA have been proposed to find nearest neighbours in metric spaces. The average number of distances computed by these algorithms is very low and does not depend on the number of prototypes in the training set. In this paper, we compare the behaviour of these algorithms when string and tree edit distances are used.
Carrasco, R. C. and Rico-Juan, J. R. (2003).
A similarity between probabilistic tree languages: application to
XML document families.
Pattern Recognition, 36(9).
We describe a general approach to compute a similarity measure between distributions generated by probabilistic tree automata that may be used in a number of applications in the pattern recognition field. In particular, we show how this similarity can be computed for families of structured (XML) documents. In such a case, the use of regular expressions to specify the right part of the expansion rules adds some complexity to the task.
Rico-Juan, J. R., Calera-Rubio, J., and Carrasco, R. C. (2002).
Stochastic k-testable Tree Languages and Applications.
In Adriaans, P., Fernau, H., and van Zaanen, M., editors,
Grammatical Inference: Algorithms and Applications. ICGI 2002, number 2484
in Lecture Notes in Artificial Intelligence, pages 199-212, Amsterdam
(The Netherlands). Springer-Verlag.
In this paper, we present a natural generalization of k-gram models for stochastic tree languages based on the k-testable class. In this class of models, frequencies are estimated for a probabilistic regular tree grammar which is bottom-up deterministic. One of the advantages of this approach is that the model can be updated in an incremental fashion. This method is an alternative to costly learning algorithms (such as inside-outside-based methods) or algorithms that require larger samples (such as many state merging/splitting methods).
Rico-Juan, J. R. and Calera-Rubio, J. (2002).
Evaluation of handwritten character recognizers using
tree-edit-distance and fast nearest neighbour search.
In Iñesta, J. M. and Micó, L., editors, Pattern
Recognition in Information Systems, pages 326-335, Alicante (Spain). ICEIS
PRESS.
Although the rate of correctly classified prototypes using tree-edit-distance is satisfactory, the exhaustive classification is expensive. Some fast methods such as AESA and LAESA have been proposed to find nearest neighbours in metric spaces. The average number of distances computed by these algorithms does not depend on the number of prototypes. In this paper, we apply these classification algorithms to the task of handwritten character recognition and obtain a low average error rate (2%) and fast classification.
Rico-Juan, J. R. (2001).
Inferencia estocástica y aplicaciones de los lenguajes de
árboles.
PhD thesis, Universidad de Alicante, Departamento de Lenguajes y
Sistemas Informáticos.
One of the original contributions of this thesis is the definition of a stochastic inference model for k-testable tree languages and its application to compression and classification. Other probabilistic models are also contributed for the tasks of 3D surface compression and off-line handwritten word recognition.
Rico-Juan, J. R., Calera-Rubio, J., and Carrasco, R. C. (2000b).
Probabilistic k-testable tree-language.
In Oliveira, A. L., editor, Proceedings of 5th International
Colloquium, volume 1891 of Lecture Notes in Computer Science, pages
221-228, Lisboa (Portugal). Springer-Verlag.
In this paper, we present a natural generalization of k-gram models for stochastic tree languages based on the k-testable class. In this class of models, frequencies are estimated for a probabilistic regular tree grammar which is bottom-up deterministic. One of the advantages of this approach is that the model can be updated in an incremental fashion. This method is an alternative to costly learning algorithms (such as inside-outside-based methods) or algorithms that require larger samples (such as many state merging/splitting methods).
Rico-Juan, J. R., Calera-Rubio, J., and Carrasco, R. C. (2000a).
Lossless compression of surfaces described as points.
In Ferri, F. J., Iñesta, J. M., Amin, A., and Pudil, P., editors,
Advances in Pattern Recognition, volume 1876 of Lecture Notes
in Computer Science, pages 457-461, Berlin. Springer-Verlag.
In many applications, objects are represented by a collection of unorganized points that scan the surface of the object. In such cases, an efficient way of storing this information is of interest. In this paper, we present an arithmetic compression scheme that uses a tree representation of the data set and allows for better compression rates than general-purpose methods.
Rico-Juan, J. R. (1999b).
Off-line cursive handwritten word recognition based on tree
extraction and an optimized classification distance.
In Torres, M. I. and Sanfeliu, A., editors, Pattern Recognition
and Image Analysis: Proceedings of the VII Symposium Nacional de
Reconocimiento de Formas y Análisis de Imágenes, volume 3, pages
15-16, Bilbao (Spain).
This paper describes a geometric approach to the difficult off-line cursive handwritten word recognition problem. The method extracts and classifies feature trees from isolated handwritten words, measuring the distance between two trees.
Rico-Juan, J. R. (1999a).
Esquemas Algorítmicos.
Publicaciones de la Universidad de Alicante.
This book describes two programming schemes: dynamic programming and branch and bound. The description is given from a general point of view, extracting representative characteristics and deriving a scheme that is then applied to specific cases. It contains a wide variety of examples and solved exercises.