NICOP - Toward Computational Information Topology
Abstract
A common theme in data analysis is the treatment of data as point clouds. Examples include text documents, images, speech patterns, and genome sequences. While a typical point has a large number of coordinates and thus lives in a high-dimensional space, it often makes sense to adopt a geometric/topological way of thinking. Our goal is to enhance the existing infrastructure needed to explore and meaningfully reason about point clouds in high dimensions. This proposal brings together two orthogonal topics: information theoretically inspired dissimilarities and topological methods.
The prototypical information theory inspired dissimilarity is the Kullback–Leibler divergence, also known as relative entropy. It measures the expected loss in compression efficiency if we encode samples from a first probability distribution with a code optimized for a second distribution. While this notion of dissimilarity is not a metric, it has proved its utility in numerous applications in the sciences and engineering. It approximates the square of a metric for nearby points, and if we integrate the square root of the relative entropy along shortest paths we get the similarly popular Fisher information metric. The above two dissimilarities are but two examples in a rich family we intend to study. The topological methods for analyzing data go beyond networks and add higher-dimensional cells (tuplets) to the structure, which allow us to harvest additional information about the data. We favor the multi-scale topological approach that unifies the views at different resolutions into a single framework that leads to quantifications of shape features and hole structures. Evidence for the utility of this method can be found in the recent literature in which persistent homology is applied to numerous questions in science and engineering. Incorporating information theoretic dissimilarities, we can extend the applications to new domains.
The proposed work is of fundamental nature but not preliminary. There is good reason to believe that there is a kingdom called computational information topology, and it is time to discover what it has to say about some of the scientific challenges we face today. Primary candidates for applications are text and neuroscience data, both of which derive from information processing activities.
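As a small illustration of the relative entropy described in the abstract, the following sketch computes it for two discrete distributions and checks the coding interpretation: the divergence equals the expected code length under the mismatched code (cross-entropy) minus the optimal expected code length (entropy). The function names and the example distributions `p` and `q` are assumptions chosen for illustration and do not come from the proposal.

```python
import math


def entropy(p):
    """Shannon entropy of p in bits: expected length of an optimal code for p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0.0)


def cross_entropy(p, q):
    """Expected code length in bits when samples from p use a code optimized for q."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # outcomes that never occur contribute nothing
        if qi == 0.0:
            return math.inf  # q assigns no code word to a possible outcome
        total -= pi * math.log2(qi)
    return total


def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of outcomes")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total


# Hypothetical example distributions over two outcomes.
p = [0.5, 0.5]
q = [0.25, 0.75]

print(kl_divergence(p, q))               # about 0.208 bits
print(kl_divergence(q, p))               # about 0.189 bits, so not symmetric
print(cross_entropy(p, q) - entropy(p))  # same value as kl_divergence(p, q)
```

The two directions give different values, which is one way to see that this dissimilarity is not a metric.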
Document Details
- Document Type: DoD Grant Award
- Publication Date: Jan 23, 2018
- Source ID: N629091812038
Entities
People
- Herbert Edelsbrunner
Organizations
- Institute of Science
- Office of Naval Research
- United States Navy