

Neural Mealy machines

Omlin and Giles (1996a, 1996b) have used a second-order recurrent neural network (similar to those used by Giles et al. (1992), Pollack (1991), Forcada and Carrasco (1995), Watrous and Kuhn (1992), and Zeng et al. (1993)), which may be formulated as a Mealy NSM described by a next-state function whose $i$-th coordinate ($i=1,\ldots,n_X$) is

\begin{displaymath}
f_i({\bf x}[t-1],{\bf u}[t])=
g\left( \sum_{j=1}^{n_X} \sum_{k=1}^{n_U} W^{xxu}_{ijk} x_j[t-1] u_k[t] +
W^x_i \right),
\end{displaymath} (4.8)

where $g:\mathbb{R}\rightarrow [S_0,S_1]$ (usually $S_0=0$ or $-1$ and $S_1=1$) is the activation function of the neurons, and an output function whose $i$-th coordinate ($i=1,\ldots,n_Y$) is
\begin{displaymath}
h_i({\bf x}[t-1],{\bf u}[t])=
g\left( \sum_{j=1}^{n_X} \sum_{k=1}^{n_U} W^{yxu}_{ijk} x_j[t-1] u_k[t] +
W^y_i \right).
\end{displaymath} (4.9)

Throughout this document, a homogeneous notation will be used for weights. Superscripts indicate the computation in which the weight is involved: the $xxu$ in $W_{ijk}^{xxu}$ indicates that the weight is used to compute a state ($x$) from a state and an input ($xu$); the $y$ in $W^y_i$ (a bias) indicates that it is used to compute an output. Subscripts designate, as usual, the particular units involved and run parallel to superscripts.
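As a concrete illustration only, the following sketch (in Python with NumPy; the array names W_xxu, W_x, W_yxu and W_y are ours and simply mirror the weight notation above) computes one time step of the next-state and output functions (4.8) and (4.9):

\begin{verbatim}
import numpy as np

def logistic(a):
    # Logistic activation g(a) = 1/(1 + exp(-a)), bounded by 0 and 1.
    return 1.0 / (1.0 + np.exp(-a))

def second_order_step(x_prev, u, W_xxu, W_x, W_yxu, W_y, g=logistic):
    # One time step of the second-order Mealy NSM, eqs. (4.8)-(4.9).
    #   x_prev : previous state x[t-1], shape (n_X,)
    #   u      : current input u[t],    shape (n_U,)
    #   W_xxu  : next-state weights W^{xxu}, shape (n_X, n_X, n_U)
    #   W_x    : next-state biases  W^x,     shape (n_X,)
    #   W_yxu  : output weights     W^{yxu}, shape (n_Y, n_X, n_U)
    #   W_y    : output biases      W^y,     shape (n_Y,)
    # x_i[t] = g( sum_{j,k} W^{xxu}_{ijk} x_j[t-1] u_k[t] + W^x_i )   (4.8)
    x = g(np.einsum('ijk,j,k->i', W_xxu, x_prev, u) + W_x)
    # y_i[t] = g( sum_{j,k} W^{yxu}_{ijk} x_j[t-1] u_k[t] + W^y_i )   (4.9)
    y = g(np.einsum('ijk,j,k->i', W_yxu, x_prev, u) + W_y)
    return x, y
\end{verbatim}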

Activation functions $g(x)$ are usually required to be real-valued, monotonically increasing, continuous (very often also differentiable), and bounded; they are usually nonlinear. Two commonly used examples of differentiable activation functions are the logistic function $g_L(x)=1/(1+\exp(-x))$, which is bounded by $0$ and $1$, and the hyperbolic tangent $g_T(x)={\rm tanh}(x)=(1-\exp(-2x))/(1+\exp(-2x))$, which is bounded by $-1$ and $1$. Differentiability is usually required because it allows the use of gradient-based learning algorithms. A number of architectures do not use sigmoid-like activation functions but instead use radial basis functions (Haykin, 1998, ch. 5; Hertz et al., 1991, p. 248), which are not monotonic but Gaussian-like, reaching their maximum at a particular value of their input. DTRNN architectures using radial basis functions have been used by Frasconi et al. (1996) and Cid-Sueiro et al. (1994).
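For reference, a minimal sketch of the two differentiable activation functions just mentioned, together with a Gaussian-like radial basis unit (the center and width parameters are illustrative assumptions, not values taken from the works cited):

\begin{verbatim}
import numpy as np

def g_logistic(x):
    # g_L(x) = 1/(1 + exp(-x)), bounded by 0 and 1.
    return 1.0 / (1.0 + np.exp(-x))

def g_tanh(x):
    # g_T(x) = tanh(x) = (1 - exp(-2x))/(1 + exp(-2x)), bounded by -1 and 1.
    return np.tanh(x)

def g_rbf(x, center=0.0, width=1.0):
    # Gaussian-like radial basis unit: non-monotonic, maximal at x == center.
    return np.exp(-((x - center) ** 2) / (2.0 * width ** 2))
\end{verbatim}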

Another Mealy NSM is that defined by Robinson and Fallside (1991) under the name of recurrent error propagation network, a first-order DTRNN which has a next-state function whose $i$-th coordinate ( $i=1,\ldots,n_X$) is given by

\begin{displaymath}
f_i({\bf x}[t-1],{\bf u}[t])=g\left(\sum_{j=1}^{n_X} W_{ij}^{xx} x_j[t-1] +
\sum_{j=1}^{n_U} W_{ij}^{xu} u_j[t] +
W^x_i\right),
\end{displaymath} (4.10)

and an output function ${\bf h}({\bf x}[t-1],{\bf u}[t])$ whose $i$-th component ( $i=1,\ldots,n_Y$) is given by
\begin{displaymath}
h_i({\bf x}[t-1],{\bf u}[t])=g\left(\sum_{j=1}^{n_X} W_{ij}^{yx} x_j[t-1] +
\sum_{j=1}^{n_U} W_{ij}^{yu} u_j[t] +
W^y_i\right).
\end{displaymath} (4.11)
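A sketch of one time step of this first-order network, under the same naming convention as before (the function and argument names are ours, not Robinson and Fallside's):

\begin{verbatim}
import numpy as np

def first_order_step(x_prev, u, W_xx, W_xu, W_x, W_yx, W_yu, W_y, g=np.tanh):
    # One time step of the first-order Mealy NSM, eqs. (4.10)-(4.11).
    #   W_xx : (n_X, n_X) state-to-state weights,  W_x : (n_X,) state biases
    #   W_xu : (n_X, n_U) input-to-state weights,  W_y : (n_Y,) output biases
    #   W_yx : (n_Y, n_X) state-to-output weights
    #   W_yu : (n_Y, n_U) input-to-output weights
    x = g(W_xx @ x_prev + W_xu @ u + W_x)   # eq. (4.10): next state
    y = g(W_yx @ x_prev + W_yu @ u + W_y)   # eq. (4.11): output
    return x, y
\end{verbatim}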

Jordan (1986) nets may also be formulated as Mealy NSMs. Both the next-state function and the output function use an auxiliary function ${\bf z}({\bf x}[t-1],{\bf u}[t])$ whose $i$-th coordinate is
\begin{displaymath}
z_i({\bf x}[t-1],{\bf u}[t]) = g\left( \sum_{j=1}^{n_X} W_{ij}^{zx} x_j[t-1]
+ \sum_{j=1}^{n_U} W_{ij}^{zu} u_j[t]
+ W_i^z\right),
\end{displaymath} (4.12)

with $i=1,\ldots,n_Z$. The $i$-th coordinate of the next-state function is
\begin{displaymath}
f_i({\bf x}[t-1],{\bf u}[t]) = \alpha x_i[t-1]
+ g\left( \sum_{j=1}^{n_Z} W_{ij}^{xz} z_j({\bf x}[t-1],{\bf u}[t])
+ W_i^x\right)
\end{displaymath} (4.13)

(with $\alpha\in[0,1]$ a constant) and the $i$-th coordinate of the output function is
\begin{displaymath}
h_i({\bf x}[t-1],{\bf u}[t]) = g\left( \sum_{j=1}^{n_Z} W_{ij}^{xz}
z_j({\bf x}[t-1],{\bf u}[t])
+ W_i^x\right).
\end{displaymath} (4.14)
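The following sketch steps a Jordan net once. It reads (4.13) and (4.14) as written above, i.e. sharing the weights $W^{xz}$ and biases $W^x$ (so that $n_Y=n_X$ and the state is a decayed copy of past outputs); the function and argument names are ours:

\begin{verbatim}
import numpy as np

def jordan_step(x_prev, u, W_zx, W_zu, W_z, W_xz, W_x, alpha, g=np.tanh):
    # One time step of a Jordan net viewed as a Mealy NSM, eqs. (4.12)-(4.14).
    #   W_zx : (n_Z, n_X) state-to-hidden weights,  W_z : (n_Z,) hidden biases
    #   W_zu : (n_Z, n_U) input-to-hidden weights,  W_x : (n_X,) biases
    #   W_xz : (n_X, n_Z) hidden-to-state/output weights
    #   alpha: state decay constant in [0, 1]
    z = g(W_zx @ x_prev + W_zu @ u + W_z)   # eq. (4.12): auxiliary (hidden) units
    out = g(W_xz @ z + W_x)                 # common term of eqs. (4.13) and (4.14)
    x = alpha * x_prev + out                # eq. (4.13): next state
    y = out                                 # eq. (4.14): output
    return x, y
\end{verbatim}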

