
Neural networks and formal models of language and computation

The last decade has witnessed a surge of interest in the relationships between the behavior of artificial neural networks and the models used in formal language theory and the theory of computation, such as automata or Turing machines. As a result, an increasing number of researchers are focusing their attention on questions such as: what kinds of computation can neural networks perform, and what kinds of computation can neural networks learn to perform from examples?

Distinguishing these two apparently equivalent questions is crucial in the field of artificial neural networks. It is often the case that a neural network architecture which can be shown to be capable of a certain kind of computation cannot easily be trained from examples to perform it. Unfortunately, one may find examples in the literature where researchers attribute an observed problem to the architecture used when it may just as well be due to the training (learning) algorithm used.

These questions, which will be called the main questions in this Introduction, tie together the cores of two broad fields of basic knowledge: computer science and neuroscience. The convergence of their daughter disciplines (such as pattern recognition, artificial intelligence, or neurocognition) in the interdisciplinary arena of artificial neural networks may reasonably be expected to have a great technological impact and to yield a wealth of applications.

Indeed, on the one hand, in computer science the formal theories of language and computation are so intimately related that they may be considered to form a single body of knowledge. One need only glance at the titles of highly cited and well-known books on the subject; as an example, take the ``Introduction to Automata Theory, Languages, and Computation'' by Hopcroft and Ullman (1979). There is a well-established relationship between the levels in Chomsky's hierarchy of language complexity (regular, context-free, context-sensitive and unrestricted, in order of increasing complexity) and the classes of idealized machines (correspondingly, finite-state automata, pushdown or stack automata, linearly bounded automata and Turing machines, in order of increasing computational power). This equivalence allows us to view computational devices, that is, automata, as language acceptors or language recognizers, and to measure their computational power in terms of the complexity of the language class they accept or recognize.
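To make the view of automata as language acceptors concrete, the following sketch (an illustrative Python example of my own; the language chosen and the identifiers are not taken from the books cited above) implements a two-state finite-state automaton that accepts the regular language of binary strings containing an even number of 1s.

# Illustrative sketch: a deterministic finite-state automaton viewed as a
# language acceptor. The example language (binary strings with an even
# number of 1s) and the identifiers are illustrative choices.

def accepts(string, transitions, start, accepting):
    """Return True if the automaton accepts the string."""
    state = start
    for symbol in string:
        state = transitions[(state, symbol)]
    return state in accepting

# Two-state automaton over the alphabet {0, 1}: the state records whether
# an even or an odd number of 1s has been read so far.
transitions = {
    ('even', '0'): 'even',
    ('even', '1'): 'odd',
    ('odd', '0'): 'odd',
    ('odd', '1'): 'even',
}

print(accepts("1100", transitions, 'even', {'even'}))  # True: two 1s
print(accepts("111", transitions, 'even', {'even'}))   # False: three 1s

The string is accepted or rejected solely on the basis of the state reached after reading it, which is exactly the sense in which the complexity of the accepted language measures the power of the device.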

On the other hand, advances in neuroscience have inspired idealized models of the brain and the nervous system. Extremely simplified models of the behavior of an isolated neuron and of the interaction between neurons have been used to construct the concept of artificial neural networks: systems that can act as universal function approximators (Hornik et al., 1989), that can be trained from examples without the need for a thorough understanding of the task at hand, and that show surprising generalization and predictive power, thus mimicking some interesting cognitive properties of evolved natural neural systems such as the brains of mammals.
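As a minimal illustration of what being trained from examples means in practice, the following sketch (hypothetical layer sizes, learning rate and target function, chosen only for this example) fits a one-hidden-layer network to noisy samples of sin(x) by gradient descent; networks of this general form are the ones shown to be universal approximators by Hornik et al. (1989).

# Minimal sketch, with illustrative sizes and an arbitrary target function:
# a one-hidden-layer network trained from examples by gradient descent.

import numpy as np

rng = np.random.default_rng(0)

# Training examples: noisy samples of the function sin(x).
X = rng.uniform(-np.pi, np.pi, size=(200, 1))
y = np.sin(X) + rng.normal(scale=0.05, size=(200, 1))

n_hidden, lr = 20, 0.05
W1 = rng.normal(scale=0.5, size=(1, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1)); b2 = np.zeros(1)

for epoch in range(3000):
    h = np.tanh(X @ W1 + b1)            # hidden-layer activations
    pred = h @ W2 + b2                  # network output
    err = pred - y                      # prediction error
    # Gradients of the mean squared error, by backpropagation.
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)
    gh = (err @ W2.T) * (1.0 - h ** 2)
    gW1 = X.T @ gh / len(X); gb1 = gh.mean(axis=0)
    # Gradient-descent parameter update.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(float(np.mean(err ** 2)))         # training error after the last epoch

The network is never told what sin(x) is; it only sees example input-output pairs, which is the sense in which no thorough understanding of the task at hand is required.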

In the light of the equivalence between language classes and automata classes, and of the wide variety of neural architectures (more specifically, discrete-time recurrent neural networks) that can be used to emulate the behavior of sequential machines, the basic questions stated at the beginning of this Introduction may be given a more concrete formulation: on the one hand, which classes of languages can discrete-time recurrent neural networks accept or recognize, and, on the other hand, which classes of languages can they learn to accept or recognize from examples?

Viewing computing devices as language recognizers or language acceptors allows researchers to view examples as strings that do or do not belong to the language (that is, to the computation) to be learned by the neural net. This brings research in this field very close to another important area of computer science: grammatical inference, that is, the learning of grammars from examples. Grammatical inference is relevant because a wide variety of phenomena may be represented and treated as languages once the input sequences have been discretized into strings of symbols over an appropriate alphabet.
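As an illustration of how a discrete-time recurrent neural network can play the role of a sequential machine that accepts or rejects strings, here is a minimal sketch; the first-order architecture with a single sigmoid output is only one common choice, and the weights are random placeholders rather than values taken from any paper cited here.

# Minimal sketch of a discrete-time recurrent neural network acting as a
# string acceptor: the hidden state plays the role of the state of a
# sequential machine. The weights below are random placeholders.

import numpy as np

rng = np.random.default_rng(0)

n_symbols, n_hidden = 2, 3                       # binary alphabet, 3 state units
W_in = rng.normal(size=(n_hidden, n_symbols))    # input-to-state weights
W_rec = rng.normal(size=(n_hidden, n_hidden))    # state-to-state weights
w_out = rng.normal(size=n_hidden)                # state-to-output weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def accepts(string):
    """Process a binary string symbol by symbol; accept if the output exceeds 0.5."""
    h = np.zeros(n_hidden)                       # initial state
    for symbol in string:
        x = np.eye(n_symbols)[int(symbol)]       # one-hot encoding of '0'/'1'
        h = sigmoid(W_in @ x + W_rec @ h)        # next-state function
    return sigmoid(w_out @ h) > 0.5              # output (decision) function

print(accepts("0110"))   # the decision depends on the (here random) weights

In the work surveyed in this document, such weights would instead be obtained by training on strings labeled as belonging or not belonging to the target language, which is precisely where grammatical inference enters the picture.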

One of the first contacts between these two broad streams of research may be traced back to 1943, when Warren S. McCulloch and Walter Pitts published their paper ``A logical calculus of the ideas immanent in nervous activity'' (McCulloch and Pitts, 1943). This paper is considered seminal both to the field of artificial neural networks and to that of automata theory. As Perrin (1990) put it, McCulloch and Pitts presented ``a logical model for the behaviour of nervous systems that turned out to be the model of a finite-state machine''. This contact was followed a decade later by the work of Kleene (1956) and Minsky (1956), who gradually moved away from the neural formulation toward the logical and mathematical formalism that would come to be called ``finite automata''. It was Minsky (1967) who made the often-quoted statement that ``every finite-state machine is equivalent to, and can be simulated by, some neural net''.
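Minsky's statement can be illustrated with one standard construction (not quoted from the sources above, and written here with identifiers of my own choosing): a layer of threshold units, one per transition of the machine, feeds a layer of threshold units, one per state. The sketch below applies it to the even-parity automaton used earlier.

# Illustrative sketch: simulating a finite-state machine with threshold
# (McCulloch-Pitts-style) units, using one unit per transition and one
# unit per state. Applied here to the even-parity automaton.

def step(x):
    """Threshold unit: fires (1) when the summed input reaches the threshold."""
    return 1 if x >= 0 else 0

def fsm_step(state, symbol):
    """One synchronous update of the network.

    state  : one-hot activations of the state units
    symbol : one-hot activations of the input lines
    """
    # Layer 1: one "transition" unit per (state, symbol) pair;
    # each computes a logical AND (threshold 2).
    t = {
        ('even', '0'): step(state['even'] + symbol['0'] - 2),
        ('even', '1'): step(state['even'] + symbol['1'] - 2),
        ('odd', '0'):  step(state['odd'] + symbol['0'] - 2),
        ('odd', '1'):  step(state['odd'] + symbol['1'] - 2),
    }
    # Layer 2: one unit per state; each computes a logical OR (threshold 1)
    # over the transitions that lead into that state.
    return {
        'even': step(t[('even', '0')] + t[('odd', '1')] - 1),
        'odd':  step(t[('even', '1')] + t[('odd', '0')] - 1),
    }

state = {'even': 1, 'odd': 0}                 # start state
for s in "1100":
    state = fsm_step(state, {'0': int(s == '0'), '1': int(s == '1')})
print(state['even'] == 1)                     # True: "1100" has an even number of 1s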

A long hiatus of more than three decades followed that early junction. The field of artificial neural networks grew falteringly at first and then bloomed in the seventies and eighties with the work of Werbos (1974), Kohonen (1974), Hopfield (1982), and McClelland et al. (1986). The formal theory of language and computation, meanwhile, matured steadily. It was not until the late eighties that the two fields made contact again, this time for a long and fertile relationship. This document is a selection of what I believe to be some of the best representatives of the work done in the field that connects artificial neural networks to automata and other formal models of language and computation.

