Das and Mozer (1998)

(http://www.dlsi.ua.es/~mlf/nnafmc/papers/sreerupa98dynamic.pdf) encourage a second-order DTRNN (similar to the ones used by Giles et al. (1992)) to adopt a finite-state-like behavior by means of clustering methods, which may be unsupervised or supervised. In the unsupervised case, the points of state space visited by the network are clustered after a certain number of training epochs, and a new next-state function is constructed as follows: first, a next-state candidate ${\bf x}'[t]$ is computed from ${\bf x}[t-1]$ using eq. 3.8; then, it is assigned to the corresponding cluster; finally, it is linearly combined with the corresponding centroid ${\bf c}[t]$ to obtain the next state ${\bf x}[t]=(1-\alpha){\bf x}'[t]+\alpha{\bf c}[t]$, with $\alpha\in[0,1]$ estimated from the current error.

In the supervised case, states are assumed to be ideally fixed points that are in practice corrupted by Gaussian noise; the mean and variance of each state's Gaussian are estimated simultaneously with the weights of the DTRNN. The method assumes that the number of states is known and uses a temperature parameter to gradually shrink the Gaussians as the error improves. In the experiments, both the supervised and the unsupervised approach improve on the results obtained without any clustering, and the supervised clustering method performs much better than the unsupervised one. The idea of using clustering to improve FSM learning by DTRNN had been reported earlier by Das and Das (1991) and Das and Mozer (1994).
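The unsupervised state-quantization step can be summarized in a short sketch. The Python fragment below is purely illustrative: a sigmoid second-order transition stands in for eq. 3.8, k-means stands in for the unspecified clustering method, a fixed $\alpha$ replaces the error-derived estimate, and all names and sizes (candidate_state, clustered_state, n_x, n_u) are invented for the example.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    n_x, n_u, n_clusters = 4, 3, 5  # hypothetical sizes: state units, input symbols, clusters

    def candidate_state(W, x_prev, u):
        """Next-state candidate x'[t] of a second-order DTRNN (cf. eq. 3.8):
        x'_i[t] = g(sum_{j,k} W_ijk x_j[t-1] u_k[t]), with g a sigmoid."""
        return 1.0 / (1.0 + np.exp(-np.einsum('ijk,j,k->i', W, x_prev, u)))

    def clustered_state(W, x_prev, u, centroids, alpha):
        """Assign the candidate to the nearest centroid c[t] and pull it
        toward it: x[t] = (1 - alpha) x'[t] + alpha c[t]."""
        x_cand = candidate_state(W, x_prev, u)
        c = centroids[np.argmin(np.linalg.norm(centroids - x_cand, axis=1))]
        return (1.0 - alpha) * x_cand + alpha * c

    # States visited during the first training epochs (random stand-ins here).
    visited = rng.random((200, n_x))
    centroids = KMeans(n_clusters=n_clusters, n_init=10).fit(visited).cluster_centers_

    W = rng.standard_normal((n_x, n_x, n_u))
    x = rng.random(n_x)
    u = np.eye(n_u)[0]  # one-hot encoding of an input symbol
    x = clustered_state(W, x, u, centroids, alpha=0.5)  # alpha is error-derived in the paper

Note that as $\alpha$ approaches 1 the dynamics collapse onto the finite set of centroids, which is precisely what pushes the network toward finite-state behavior.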

