 
 
 
 
 
 
 
 
 
 
When training a DTRNN to perform a given sequence processing task, the first thing to check is whether the chosen DTRNN architecture can actually represent or approximate the task to be learned. However, this is seldom possible, either because of our incomplete knowledge of the computational nature of the sequence processing task itself or because of our lack of knowledge about the tasks that a given DTRNN architecture can actually perform. In most of what follows, it will be assumed that the DTRNN architecture (including the representation used for inputs, the interpretation assigned to outputs, and the number of neurons in each layer) has already been chosen and that further learning may only occur through the adjustment of weights, biases, and similar parameters. We will review some of the problems that may occur during the adjustment of these parameters.
Some of these problems may appear regardless of the kind of learning algorithm used, whereas others are related specifically to gradient-based algorithms.
 
 
 
 
 
 
 
 
