next up previous contents index
Next: Some DTRNN architectures cannot Up: DTRNN behaving as finite-state Previous: #goudreau94a###,   Contents   Index


DTRNN based on sigmoid units

Much of the work performed by researchers in the contact area between neural networks and formal language and computation theory concerns the training of sigmoid-based DTRNN to identify or approximate finite-state machines, and, in particular, regular language acceptors such as deterministic finite-state automata (DFA). This is the subject of chapter 5. Most of this work starts by assuming that a given DTRNN architecture is capable of performing the same computation as a finite-state machine (FSM). This section addresses the following question: ``When does a DTRNN behave as a FSM?''

In a recent paper, Casey (1996) has shown that a DTRNN performing a robust DFA-like computation (a special case of FSM-like computation) must organize its state space in mutually disjoint, closed sets with nonempty interiors corresponding to the states of the DFA. These states can be taken to be all of the points such that if the DTRNN is initialized with any of them, the DTRNN will produce the same output as the DFA initialized in the corresponding state. This formulation, which is closely connected to what Pollack (1991) called a dynamical recognizer, has inspired the following definition. A DTRNN $N=(X,U,Y,{\bf f},{\bf h},{\bf x}_0)$ behaves as a FSM $M=(Q,\Sigma,\Gamma,\delta,\lambda,q_I)$ when the following conditions are held:

Partition of the state space:
Each state $q_i \in Q$ is assigned a nonempty region $X_i\subseteq X$ such that the DTRNN $N$ is said to be in state $q_i$ at time $t$ when ${\bf x}[t]\in X_i$. Accordingly, these regions must be disjoint: $X_i\cap X_j=\emptyset$ if $q_i\neq q_j$. Note that there may be regions of $X$ that are not assigned to any state.

Representation of input symbols:
Each possible input symbol $\sigma_k \in\Sigma$ is assigned a different vector ${\bf u}_k\in U$ (it would also be possible to assign a region $U_k\subseteq U$ to each symbol).
Interpretation of output:
Each possible output symbol $\gamma_m \in \Gamma$ is assigned a nonempty region $Y_m\subseteq Y$ such that the DTRNN $N$ is said to output symbol $\gamma_m$ at time $t$ when ${\bf y}[t]\in Y_m$. Analogously, these regions must be disjoint: $Y_m\cap Y_n=\emptyset$ if $\gamma_m\neq\gamma_n$. Note that there may be regions of $Y$ that are not assigned to any output symbol.
Correctness of the initial state:
The initial state of the DTRNN $N$, belongs to the region assigned to the initial state $q_I$, that is, ${\bf x}_0\in X_I$.
Correctness of the next-state function:
For any state $q_j$ and symbol $\sigma_k$ of $M$, the transitions performed by the DTRNN $N$ from any point in the region of state space $X_j$ assigned to state $q_j$ when symbol $\sigma_k$ is presented to the network must lead to points that belong to the region $X_i$ assigned to $q_i=\delta(q_j,\sigma_k)$; formally, this may be expressed as
\begin{displaymath}
{\bf f}_k(X_j)\subseteq X_i \;\;\;\forall q_j\in Q,\sigma_k\in\Sigma : \delta(q_j,\sigma_k)=q_i
\end{displaymath} (5.1)

where the shorthand notation
\begin{displaymath}
{\bf f}_k(A)=\{{\bf f}({\bf x}, {\bf u}_k) : {\bf x}\in A \}
\end{displaymath} (5.2)

has been used.
Correctness of output:
In the case of Mealy NSM, for any state $q_j$ and symbol $\sigma_k$ of $M$, the output produced by the DTRNN $N$ from any point in the region of state space $X_j$ assigned to state $q_j$ when symbol $\sigma_k$ is presented to the network belongs to the region $Y_m$ assigned to $\gamma_m=\lambda(q_j,\sigma_k)$; formally, this may be expressed as
\begin{displaymath}
{\bf h}_k(X_j)\subseteq Y_m \;\;\;\forall q_j\in Q,\sigma_k\in\Sigma :
\lambda(q_j,\sigma_k)=\gamma_m
\end{displaymath} (5.3)

where the shorthand notation
\begin{displaymath}
{\bf h}_k(A)=\{{\bf h}({\bf x},{\bf u}_k) : {\bf x}\in A \}
\end{displaymath} (5.4)

has been used. In the case of Moore NSM, for any state $q_j$, the output produced by the DTRNN $N$ from any point in the region the condition for the correctness of the output may be expressed as:
\begin{displaymath}
{\bf h}(X_j)\subseteq Y_m \;\;\;\forall q_j\in Q :
\lambda(q_j)=\gamma_m,
\end{displaymath} (5.5)

with ${\bf h}(A)=\{{\bf h}({\bf x}) : {\bf x}\in A\}$.

Note that the regions $X_i\subseteq X$, $i=1,\ldots,\vert Q\vert$ and $Y_m\subseteq Y$, $m=1,\ldots,\vert\Gamma\vert$ may have a nondenumerable 5.5 number of points, because of being subsets of $\mathbb{R}^n$ for some $n$. However, for a finite input alphabet $\Sigma$, only a denumerable number of points in the state ($X$) and output ($Y$) spaces are actually visited by the net for the set of all possible finite-length input strings over $\Sigma$, denoted $\Sigma^*$, which is also denumerable.

Note also that DFA represent a special case: as said in section 2.3.3, deterministic finite-state automata may be seen as Moore or Mealy machines having an output alphabet $\Gamma=\{{\cal Y},{\cal N}\}$ whose output is only examined after the last symbol of the input string is presented, and it is such that, for a Moore machine,

\begin{displaymath}
\lambda(q_i) =\begin{array}{l} {\cal Y} \mbox{ if } q_i\in F \\
{\cal N} \mbox{ otherwise,}
\end{array}\end{displaymath} (5.6)

and for a Mealy machine,
\begin{displaymath}
\lambda(q_i,\sigma_k) =\begin{array}{l}
{\cal Y} \mbox{ if...
...q_i,\sigma_k)\in F \\
{\cal N} \mbox{ otherwise.}
\end{array}\end{displaymath} (5.7)

A region in output space $Y$ would be assigned to each one of these two symbols: $Y_{\cal Y}$, $Y_{\cal N}$, such that $Y_{\cal Y}\cap Y_{\cal N}=\emptyset$, and the output of the NSM would only be examined after the whole string has been processed.

Recently, Šíma (1997) has shown that the behavior of any DTRNN using threshold activation functions may be stably emulated by another DTRNN using activation functions in a very general class which includes the sigmoid functions considered in this paper, and describes the conditions under which this emulation is correct for inputs of any length. This means that any of the constructions described in section 4.2.1 to encode FSM in DTRNN may be converted, using the method by Šíma (1997) into an equally-capable analog DTRNN.



Subsections
next up previous contents index
Next: Some DTRNN architectures cannot Up: DTRNN behaving as finite-state Previous: #goudreau94a###,   Contents   Index
Debian User 2002-01-21