Issues in Scaling Up Machine Learning
Abstract
This grant investigates issues in improving the accuracy of machine learning systems. The classic machine learning paradigm for prediction has been to learn a set of decision structures or models from a training set and select one for prediction on unseen test data. Rather than select a single model from the set, the focus of this project's research has been to combine the predictions of the learned models to form an improved estimate. The two fronts of this research are regression and classification. In the realm of regression, the task is to predict a single continuous value for an example. The majority of research in this area has focused on simple linear combinations of the learned models, in which each model's prediction is multiplied by a weight. These weights may range from highly regularized to completely unconstrained. A set of weights is considered highly regularized if they are all positive, they sum to one, or they are uniform. Completely unconstrained weights have no restrictions and may be derived by methods like ordinary least squares regression. The degree of regularization required depends on the particular regression problem. The project has developed a technique called PCRY, which automatically estimates the appropriate degree of regularization for a given data set. The basic idea is to use the eigenstructure of the model predictions on the training data to derive a continuum of possible weight sets ranging from highly regularized to completely unconstrained. Cross-validation is used to estimate which weight set is most appropriate.
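The weight continuum described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation. It assumes that the "eigenstructure" is obtained from a singular value decomposition of the matrix of model predictions, and that the continuum is indexed by the number k of leading components retained. The function names `pcr_weights` and `choose_k_by_cv` are hypothetical, as is the choice of squared error as the cross-validation criterion.

```python
import numpy as np

def pcr_weights(F, y, k):
    """Combination weights from regressing y on the first k components of F.

    F is an (n_examples, n_models) matrix of model predictions and y holds
    the targets. Small k gives a heavily constrained weight set; k equal to
    the number of models reproduces ordinary least squares (no intercept).
    """
    # Assumption: the decomposition is taken on the raw (uncentered) predictions.
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    # Least-squares coefficient of y on each retained component.
    gamma = (U[:, :k].T @ y) / s[:k]
    # Map component coefficients back to one weight per model.
    return Vt[:k].T @ gamma

def choose_k_by_cv(F, y, n_folds=5, seed=0):
    """Pick the number of components with the lowest cross-validated error."""
    n, m = F.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    sq_err = np.zeros(m)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        for k in range(1, m + 1):
            w = pcr_weights(F[train], y[train], k)
            # Accumulate squared error of the combined prediction on the held-out fold.
            sq_err[k - 1] += np.sum((F[held_out] @ w - y[held_out]) ** 2)
    best_k = int(np.argmin(sq_err)) + 1
    return best_k, sq_err / n
```

Under these assumptions, `choose_k_by_cv(F, y)` returns the selected k together with the mean squared error for each candidate, and `pcr_weights(F, y, best_k)` gives the weights applied to the model predictions.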
Document Details
- Document Type: Technical Report
- Publication Date: Mar 13, 1997
- Accession Number: ADA337740
Entities
People
- Michael Pazzani
Organizations
- University of California, Irvine