Radial Complexity Estimation for Improved Generalization in Artificial Neural Networks

Abstract

When training an artificial neural network (ANN) for classification using backpropagation of error, the weights are usually updated by minimizing the sum-squared error on the training set. As training proceeds, overtraining may be observed as the network begins to memorize the training data. This occurs because, as the magnitude of the weight vector, w, grows, the decision boundaries become overly complex, in much the same way that a polynomial approximation of too high an order can overfit a data set in a regression problem. Since the magnitude of w grows during standard backpropagation, the weights should be initialized with that magnitude in mind. To this end, the expected value of the magnitude of the initial weight vector is here derived for two separate cases: each weight drawn from a normal distribution, and each weight drawn from a uniform distribution. The derivation is broadly applicable, since the magnitude of the weight vector plays such an important role in the formation of the classification boundaries. When the network overtrains on the training data, it will not exhibit consistently low error on subsequent test data. One way to overcome this overtraining problem is to stop the training early, which keeps the magnitude of the weight vector below what it would reach if training were allowed to continue until a near-global minimum of the training error were found. The question then is when to stop the training. Here, the relationship between training data set size and the weight vector magnitude that provides good generalization is empirically established using cross-validation on small subsets of the training data. These results are then used to estimate the weight vector magnitude at which training should be stopped when using the full data set.
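
The abstract does not reproduce the derivation itself, so the following is only an illustrative sketch of the quantity it describes, using standard probability results rather than the report's own formulas. It assumes n independent, zero-mean weights. For weights drawn from N(0, sigma^2), the magnitude follows a scaled chi distribution, whose mean is known in closed form. For weights drawn from U(-a, a), the sketch uses the root-mean-square magnitude a*sqrt(n/3), which by Jensen's inequality is an upper bound on the expected magnitude and is close to it for large n. The function names, the parameters n, sigma, and a, and the Monte Carlo check are assumptions introduced for illustration.

```python
import math
import random


def expected_norm_normal(n, sigma):
    """Exact E[||w||] for n i.i.d. N(0, sigma^2) weights.

    ||w|| / sigma follows a chi distribution with n degrees of freedom,
    whose mean is sqrt(2) * Gamma((n + 1) / 2) / Gamma(n / 2).
    lgamma is used so the ratio does not overflow for large n.
    """
    log_ratio = math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0)
    return sigma * math.sqrt(2.0) * math.exp(log_ratio)


def rms_norm_uniform(n, a):
    """sqrt(E[||w||^2]) for n i.i.d. U(-a, a) weights.

    Each weight has variance a^2 / 3, so E[||w||^2] = n * a^2 / 3.
    This is an upper bound on E[||w||], not the exact expectation.
    """
    return a * math.sqrt(n / 3.0)


def monte_carlo_norm(n, draw, trials=20000, seed=0):
    """Estimate E[||w||] by sampling; draw(rng) returns a single weight."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += math.sqrt(sum(draw(rng) ** 2 for _ in range(n)))
    return total / trials


if __name__ == "__main__":
    n, sigma, a = 50, 0.5, 0.5  # hypothetical network size and spreads
    print("normal, closed form :", expected_norm_normal(n, sigma))
    print("normal, Monte Carlo :", monte_carlo_norm(n, lambda r: r.gauss(0.0, sigma)))
    print("uniform, RMS bound  :", rms_norm_uniform(n, a))
    print("uniform, Monte Carlo:", monte_carlo_norm(n, lambda r: r.uniform(-a, a)))
```

Under these assumptions, the expected initial magnitude grows roughly as the square root of the number of weights, so a fixed per-weight spread yields a larger starting magnitude in a larger network.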

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 1998
Accession Number
ADA353812

Entities

People

  • Lemuel R. Myers Jr

Organizations

  • Air Force Institute of Technology

Tags

Communities of Interest

  • C4I

DTIC Thesaurus Topics

  • Air Force
  • Algorithms
  • Bayesian Networks
  • Boundaries
  • Cartesian Coordinates
  • Classification
  • Coordinate Systems
  • Data Sets
  • Genetic Algorithms
  • Infrared Images
  • Neural Networks
  • Normal Distribution
  • Probability
  • Probability Density Functions
  • Probability Distributions
  • Random Variables
  • Validation

Readers

  • Approximation Theory
  • Neural Network Machine Learning
  • Regression Analysis

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks