A Large-scale Distributed Indexed Learning Framework for Data that Cannot Fit into Memory
Abstract
This project addresses three major problems in distributed learning for big data. 1) Learning a classifier when the data contain many samples that do not improve model quality yet cost considerable I/O and memory to process. Block Coordinate Descent combined with Approximate Nearest Neighbor (ANN) search, used to select active samples in dual mode, was shown to outperform the state of the art. 2) Complex query search, in which sending the query to every local machine is very costly. Decomposing the reference patterns into multiple resolutions solved distributed kNN/kFN pattern matching very efficiently. 3) Distributed learning from an unlimited stream of unlabeled data that many clients would otherwise need to send to a server to learn a classifier. Integrating three learning techniques (online, semi-supervised, and active learning) with selective sampling that minimizes communication between the server and the clients solved this problem.
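The selective-sampling idea in problem 3 can be illustrated with a minimal sketch. It is not the report's algorithm: it assumes a linear model, a margin-based uncertainty test on the client, and a plain perceptron update on the server, and it omits the semi-supervised component. The names `select_uncertain`, `perceptron_update`, and the `margin` parameter are hypothetical, chosen only for this illustration.

```python
def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def select_uncertain(w, samples, margin):
    """Client side (assumed rule): keep only samples whose score |w.x|
    is below the margin, so confident samples are never transmitted."""
    return [x for x in samples if abs(dot(w, x)) < margin]


def perceptron_update(w, x, y, lr=1.0):
    """Server side (assumed learner): one online perceptron step on a
    labeled sample, with y in {-1, +1}. Returns a new weight list."""
    if y * dot(w, x) <= 0:
        return [wi + lr * y * xi for wi, xi in zip(w, x)]
    return list(w)


# Usage: the client filters its local stream against the current model,
# and only the uncertain samples would be sent to the server for labeling.
w = [1.0, 0.0]
stream = [[0.1, 5.0], [2.0, 0.0], [-0.3, 1.0]]
to_send = select_uncertain(w, stream, margin=0.5)
```

Under these assumptions, the client transmits two of the three samples in the stream; the sample the current model already scores confidently stays local, which is the source of the communication saving.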
Document Details
- Document Type: Technical Report
- Publication Date: Mar 27, 2015
- Accession Number: ADA616935
Entities
People
- Shou-De Lin
Organizations
- National Taiwan University