Collaborative proposal: Geometric methods for optimal matching and feature identification in data se
Abstract
This proposal describes a series of interconnected projects organized around the theme of optimization in geometric data analysis. Geometric data analysis takes abstract data and uses combinatorial methods to produce a geometric object that can then be studied by geometric means. This project explores how this methodology can be applied to seek optimal matchings between data collected from different sources or different measurement modalities for the same data. The geometric features of the geometric object constructed from abstract data often have concrete meaning in terms of the original data. The project describes techniques to identify features in data by searching for embeddings of optimal representatives of geometric structures.

Optimal matching problems. A core problem in data analysis is the following: Given two finite metric spaces (X, d_X) and (Y, d_Y), find a matching for the points of X and Y that minimizes some loss function reflecting metric distortion (e.g., the Gromov-Wasserstein distance). The overhead of direct methods makes application to moderately-sized data sets infeasible, even under aggressive relaxations. In prior work, the PIs (and collaborators) introduced a new method for approximating such matchings on very large data sets. The proposed work builds on this foundation in a number of different directions: subset matchings, dimensionality reduction via matching, and metric inference via markers. Each of these thrusts has broad applicability to problems arising in real data.

Optimal loop identification. Data sets from time series or with a time component often display periodic or quasi-periodic behavior. Minimal loops that arise in these data sets are the basic building blocks of qualitative descriptions of the dynamical behavior. A key problem is to determine the size and location of these loops from finite samples. In prior work, PIs Blumberg and Mandell introduced theoretical foundations for identifying loops (and more complicated geometric motifs) in data sets using ideas from quantitative algebraic topology. That work is based on extending the classical idea of the fundamental group of a space X, which provides a way of studying loops in X: it develops a quantitative version of the fundamental group that records the length of the loops. The proposed research will develop efficient algorithms for applying these ideas in the context of optimal loop identification.

The proposed research will develop robust algorithms for optimal matching problems and optimal loop identification with wide applicability to signals from many sources. If successful, this research will introduce a new toolkit for solving optimization problems in geometric data analysis and using this to describe qualitative features of data.
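To make the matching problem stated in the abstract concrete, the following is a minimal sketch of the direct (exhaustive) approach on two tiny finite metric spaces. It is not the PIs' method. The function names `distortion` and `best_matching` are hypothetical, and the loss used here is the largest change in any pairwise distance under a bijection. That loss is a simpler stand-in for the Gromov-Wasserstein distance, which relaxes bijections to couplings of probability measures. The factorial cost of the search illustrates why direct methods do not scale.

```python
from itertools import permutations

import numpy as np


def distortion(dx, dy, perm):
    """Largest change in a pairwise distance when point i of X is sent to point perm[i] of Y."""
    perm = np.asarray(perm)
    return float(np.max(np.abs(dx - dy[np.ix_(perm, perm)])))


def best_matching(dx, dy):
    """Search every bijection X -> Y and return the one with the least distortion.

    The search visits n! bijections, so it is only usable for a handful of points.
    """
    n = dx.shape[0]
    if dx.shape != (n, n) or dy.shape != (n, n):
        raise ValueError("expected two square distance matrices of equal size")
    best = min(permutations(range(n)), key=lambda p: distortion(dx, dy, p))
    return best, distortion(dx, dy, best)


# Toy data (assumed for illustration): Y is X with its points relabelled,
# so a matching with zero distortion exists.
points = np.array([0.0, 1.0, 3.0, 7.0])
dx = np.abs(points[:, None] - points[None, :])
relabel = [2, 0, 3, 1]  # point j of Y is point relabel[j] of X
dy = dx[np.ix_(relabel, relabel)]

match, cost = best_matching(dx, dy)
print(match, cost)
```

On this toy input the recovered matching undoes the relabelling. For real data the two spaces generally differ in size and no exact isometry exists, which is where relaxed losses and approximation schemes become necessary.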
Document Details
- Document Type: DoD Grant Award
- Publication Date: Mar 05, 2022
- Source ID: N000142212126
Entities
People
- Soledad Villar
Organizations
- Johns Hopkins University
- Office of Naval Research
- United States Navy