When neural networks are used for sequence processing, the most general architecture is a recurrent neural network, that is, a network in which the outputs of some units are fed back as inputs to others. For generality, unit outputs are allowed to take any real value in a given interval, rather than just two characteristic values as in threshold linear units. Because sequences are discrete in nature (they are made of data indexed by integers), processing occurs in discrete steps, as if the network were driven by an external clock, and each neuron is assumed to compute its output instantaneously; hence the name discrete-time recurrent neural networks. There is another wide class of recurrent neural networks in which inputs and outputs are functions of a continuous time variable and neurons have a temporal response (relating state to inputs) that is described by a differential equation in time (Pineda, 1987). These networks are aptly called continuous-time recurrent neural networks (for an excellent review, see Pearlmutter (1995)).
Discrete-time recurrent neural networks (DTRNN) are adaptive, state-based sequence processors that may be applied to any of the four broad classes of sequence processing tasks mentioned in section 3.1: in sequence classification, the output of the DTRNN is examined only at the end of the sequence; in synchronous sequence transduction, the DTRNN produces a temporal sequence of outputs, one for each input it processes; in sequence continuation or prediction, the output of the DTRNN after it has seen an input sequence may be interpreted as a continuation of that sequence; finally, in sequence generation, a constant input, or no input at all, is applied at each cycle to generate a sequence of outputs.
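The four modes of use just listed can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from this document: the class name `TinyDTRNN`, the helper names, the logistic transfer function, the random untrained weights, and the choice of a first-order network whose output is computed from the newly computed state. The point is only that the same clocked state update serves all four tasks, which differ in which outputs are read and in what is fed as input.

```python
import math
import random


def logistic(z):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


class TinyDTRNN:
    """Hypothetical first-order DTRNN with random, untrained weights."""

    def __init__(self, n_in, n_state, n_out, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-1.0, 1.0) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_xx = mat(n_state, n_state)  # state -> state (the feedback)
        self.w_xu = mat(n_state, n_in)     # input -> state
        self.w_yx = mat(n_out, n_state)    # state -> output
        self.x0 = [0.5] * n_state          # initial state (arbitrary choice)

    def step(self, x_prev, u):
        """One clock tick: new state from old state and input, then output."""
        x = [logistic(sum(w * v for w, v in zip(self.w_xx[i], x_prev))
                      + sum(w * v for w, v in zip(self.w_xu[i], u)))
             for i in range(len(self.w_xx))]
        y = [logistic(sum(w * v for w, v in zip(self.w_yx[k], x)))
             for k in range(len(self.w_yx))]
        return x, y

    def run(self, inputs):
        """Process a whole input sequence; return one output per input."""
        x, outputs = self.x0, []
        for u in inputs:
            x, y = self.step(x, u)
            outputs.append(y)
        return outputs


def transduce(net, inputs):
    """Synchronous transduction: keep every output."""
    return net.run(inputs)


def classify(net, inputs):
    """Classification: examine only the output after the last input."""
    return net.run(inputs)[-1]


def predict_next(net, inputs):
    """Prediction: read the last output as the next sequence element.
    Same computation as classify; only the interpretation differs."""
    return net.run(inputs)[-1]


def generate(net, n_steps, constant_input):
    """Generation: apply the same constant input at every cycle."""
    return net.run([constant_input] * n_steps)


net = TinyDTRNN(n_in=2, n_state=3, n_out=2)
sequence = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
outputs = transduce(net, sequence)
```

With untrained weights the numerical outputs carry no meaning; in practice the weights would be adapted by a learning algorithm so that the outputs read in each mode match the targets of the task.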
In this document, it is convenient to view discrete-time
recurrent neural networks (DTRNN) (see
Haykin (1998), ch. 15; Hertz et al. (1991), ch. 7;
Hush and Horne (1993); Tsoi and Back (1997)) as neural state machines
(NSM), and to define them in a way that parallels the
definitions of Mealy and Moore machines given in
section 2.3. This parallelism is inspired by the
relationship established by
Pollack (1991) between deterministic finite-state automata (DFA)
and a class of second-order DTRNN,^{4.3} under the name of dynamical
recognizers. A neural state
machine is a six-tuple
[Equations (4.5), (4.6), and (4.7)]