A Semi-supervised Heat Kernel Pagerank MBO Algorithm for Data Classification
Abstract
We present a very efficient semi-supervised graph-based algorithm for classification of high-dimensional data that is motivated by the MBO method of Garcia-Cardona (2014) and derived using the similarity graph. Our procedure is an elegant combination of heat kernel page rank and the MBO method applied to study semi-supervised problems. The timing of our algorithm is highly dependent on how quickly the page rank can be computed; we use two different yet very efficient approaches to calculate the page rank, one of which proceeds by simulating random walks of bounded length. Overall, our method is advantageous for very big, sparse data, in which the graph has few edges, and it produces good accuracy even if the number of labeled instances is very small. In fact, the accuracy of the procedure is comparable with or better than that of state-of-the-art methods and is demonstrated on benchmark data sets. In addition to experimental results, we include a thorough comparison of our algorithm to that of Garcia-Cardona (2014) and describe the advantages of both methods.
Document Details
- Document Type
- Technical Report
- Publication Date
- Jul 01, 2016
- Accession Number
- AD1018376
Entities
People
- Andrea Bertozzi
- Ekatherina Merkurjev
- Fan Chung
Organizations
- University of California