Next: 9.9.1 Deficiencies of Steepest Up: 9 Loosely Synchronous Problems Previous: 9.8.3 The Concurrent Algorithm

Optimization Methods for Neural Nets:Automatic Parameter Tuning and FasterConvergence

Computers and standard programming languages can be used efficiently for high-level, clearly formulated problems such as computing balance sheets and income statements, solving partial differential equations, or managing operations in a car factory. It is much more difficult to write efficient and fault-tolerant programs for ``simple'' primitive tasks like hearing, seeing, touching, manipulating parts, recognizing faces, avoiding obstacles, and so on. Usually, the existing artificial systems for the above tasks are within a narrowly limited domain of application, very sensitive to hardware and software failures, and difficult to modify and adapt to new environments.

Neural nets represent a new approach to bridging the gap between cheap computational power and solutions for some of the above-cited tasks. We as human beings like to consider ourselves good examples of the power of the neuronal approach to problem solving.

To avoid naive optimism and over inflated expectations about ``self-programming'' computers, it is safer to see this development as the creation of another level of tools insulating generic users looking for fast solutions from the details of sophisticated learning mechanisms. Today, generic users do not care about writing operating systems; in the near future some users will not care about programming and debugging. They will have to choose appropriate off-the-shelf subsystems (both hardware and software) and an appropriate set of examples and high-level specifications; neural nets will do the rest. Neural networks have already been useful in areas like pattern classification, robotics, system modeling, and forecasting over time ([Borsellino:61a], [Broomhead:88a], [Gorman:88a], [Sejnowski:87a], [Rumelhart:86b], [Lapedes:87a]).

The focus of this work is on ``supervised learning'', that is, learning an association between input and output patterns from a set of examples. The mapping is executed by a feed-forward network with different layers of units, such as the one shown in Figure 9.22.

Figure 9.22: Multilayer Perceptron and Transfer Function

Each unit that is not in the input layer receives an input given by a weighted sum of the outputs of the previous layer and produces an output using a ``sigmoidal'' transfer function, with a linear range and saturation for large positive and negative inputs. This particular architecture has been considered because it has been used extensively in neural network research, but the learning method presented can be used for different network designs ([Broomhead:88a]).

Next: 9.9.1 Deficiencies of Steepest Up: 9 Loosely Synchronous Problems Previous: 9.8.3 The Concurrent Algorithm

Guy Robinson
Wed Mar 1 10:19:35 EST 1995