A Large-scale Distributed Indexed Learning Framework for Data that Cannot Fit into Memory

Abstract

This project addresses three major problems in distributed learning for big data. 1) Learning a classifier when the data contain many samples that do not help improve model quality yet still cost substantial I/O and memory to process. Block Coordinate Descent combined with Approximate Nearest Neighbor (ANN) search to select active samples in dual mode was shown to outperform the state of the art. 2) Complex query search, in which sending the query to all the local machines is very costly. Decomposing the reference patterns into multiple resolutions solved distributed kNN/kFN pattern matching very efficiently. 3) Distributed learning from an unlimited stream of unlabeled data that many clients need to send to a server to learn a classifier. Integrating three learning techniques (online, semi-supervised, and active learning) with selective sampling solved this problem while keeping communication between the server and the clients to a minimum.

Document Details

Document Type: Technical Report
Publication Date: Mar 27, 2015
Accession Number: ADA616935

Entities

People

Shou-De Lin

Organizations

National Taiwan University
