Machine Learning in Heterogeneous Data from Multiple Sources
Abstract
Most existing machine learning algorithms assume that all data features are numeric and represent data instances as points in a multidimensional geometric space. They use the distances between points in that space as a measure of similarity and rely on those distances to extract patterns from data. In practice, however, this assumption may not hold because of (i) the sensitivity of the data representation (i.e., the relative positions of data points in the space depend on the units and scales used to measure and represent data features), and (ii) the heterogeneity of real-world data, which can come from a variety of sources in different forms (e.g., numeric and categorical). In this project, we aim to develop a robust (not sensitive to data representation) and flexible (able to handle domains with numeric, categorical, or mixed data features) framework to learn patterns from heterogeneous data. More specifically, the project will develop algorithms to solve problems such as classification, clustering, anomaly detection, dimensionality reduction, and data visualisation in uncertain or noisy, multi-source, and multimodal data for real-world applications.
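The sensitivity described in point (i) can be illustrated with a minimal sketch. The example below is not part of the project; the feature names, the three data points, and the helper functions `euclidean`, `nearest`, and `to_cm` are all hypothetical and chosen only to show that re-expressing one feature in different units can change which point counts as the nearest neighbour under Euclidean distance.

```python
import math

def euclidean(p, q):
    """Plain Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest(query, candidates):
    """Return the name of the candidate closest to the query."""
    return min(candidates, key=lambda name: euclidean(query, candidates[name]))

def to_cm(point):
    """Convert the first feature (height) from metres to centimetres."""
    return (point[0] * 100.0, point[1])

# Hypothetical instances described by (height in metres, weight in kg).
query_m = (1.70, 70.0)
people_m = {"a": (1.90, 71.0), "b": (1.71, 75.0)}

# The same instances with height re-expressed in centimetres.
query_cm = to_cm(query_m)
people_cm = {name: to_cm(p) for name, p in people_m.items()}

nearest_in_metres = nearest(query_m, people_m)
nearest_in_cm = nearest(query_cm, people_cm)

print("nearest neighbour (height in m): ", nearest_in_metres)
print("nearest neighbour (height in cm):", nearest_in_cm)
```

With height in metres the weight difference dominates the distance, whereas with height in centimetres the height difference dominates, so the two runs pick different neighbours even though the underlying data are identical.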
Document Details
- Document Type: DoD Grant Award
- Publication Date: Feb 16, 2024
- Source ID: FA23862314003
Entities
People
- Sunil Aryal
Organizations
- Air Force Office of Scientific Research
- Deakin University
- United States Air Force