Scalable Learning for Geostatistics and Speaker Recognition

Abstract

With improved data acquisition methods, the amount of data being collected has increased severalfold. One of the objectives of data collection is to learn useful underlying patterns. To work with data at this scale, methods must not only be effective on the underlying data but also scale to larger data collections. This thesis focuses on developing scalable and effective methods for different domains, geostatistics and speaker recognition in particular. Initially, we focus on kernel-based learning methods and develop a GPU-based parallel framework for this class of problems. We then propose an improved numerical algorithm that uses GPU parallelization to further enhance the computational performance of kernel regression. These methods are then demonstrated on problems arising in geostatistics and speaker recognition. In geostatistics, data is often collected at scattered locations, and factors such as instrument malfunction lead to missing observations. Applications often require the ability to interpolate this scattered spatiotemporal data onto a regular grid continuously over time. This problem can be formulated as a regression problem, and kriging, one of the most popular geostatistical interpolation techniques, is analogous to a standard kernel method: Gaussian process regression. Kriging is computationally expensive and needs major modifications and accelerations to be practical. The GPU framework developed for kernel methods is extended to kriging, and the GPU's texture memory is used more effectively for enhanced computational performance. Speaker recognition deals with the task of verifying a person's identity based on samples of their speech. This thesis focuses on the text-independent framework, and three new recognition frameworks were developed for this problem. We propose a kernelized Rényi distance-based similarity scoring for speaker recognition.
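The analogy between kriging and Gaussian process regression mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the thesis's GPU implementation: the Gaussian covariance, the length scale, the small noise term, the station coordinates, the synthetic readings, and all function names are assumptions chosen purely for illustration.

```python
import math


def gaussian_kernel(p, q, length_scale):
    """Squared-exponential covariance between two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_interpolate(train_x, train_y, query_x, length_scale=0.3, noise=1e-6):
    """Posterior mean of a zero-mean Gaussian process at the query points."""
    n = len(train_x)
    # Covariance matrix of the observations, with a small diagonal noise term.
    cov = [
        [
            gaussian_kernel(train_x[i], train_x[j], length_scale)
            + (noise if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    weights = solve(cov, train_y)
    # Each prediction is a kernel-weighted sum over the observations.
    return [
        sum(w * gaussian_kernel(q, x, length_scale) for w, x in zip(weights, train_x))
        for q in query_x
    ]


# Hypothetical scattered measurement locations and synthetic readings.
stations = [(0.1, 0.2), (0.9, 0.3), (0.4, 0.8), (0.7, 0.7), (0.2, 0.6)]
readings = [math.sin(3 * x) + math.cos(2 * y) for x, y in stations]

# Regular 5 x 5 grid over the unit square.
grid = [(i / 4, j / 4) for i in range(5) for j in range(5)]
grid_values = gp_interpolate(stations, readings, grid)
```

The linear solve costs O(n^3) in the number of observations, which is one reason kriging becomes expensive at scale. In practice a numerical library or GPU code would replace the hand-written solver used here.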

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2011
Accession Number: ADA559917

Entities

People

  • Balaji V. Srinivasan

Organizations

  • University of Maryland

Tags

Communities of Interest

  • Biomedical
  • Energy and Power Technologies
  • Materials and Manufacturing Processes
  • Space

DTIC Thesaurus Topics

  • Computational Fluid Dynamics
  • Computational Science
  • Computer Vision
  • Data Mining
  • Data Science
  • Databases
  • Dimensionality Reduction
  • Factor Analysis
  • Information Processing
  • Information Retrieval
  • Information Science
  • Machine Learning
  • Network Science
  • Statistical Algorithms
  • Supervised Machine Learning
  • Surveys
  • Warning Systems

Fields of Study

  • Computer science

Readers

  • Computational Fluid Dynamics (CFD)
  • Computer Vision
  • Statistical inference

Technology Areas

  • AI & ML