Answering High-Level Questions on Low-Level Data
Abstract
The motivation of this proposal is building interactive systems that can help end users deeply understand a complex situation. For example, suppose an exhibition, rally, concert, or game is happening in a city, and many streams of raw information are available: videos of key landmarks such busy intersections or public squares, measurement of traffic (both road and Internet), the Twitter stream, etc. Understanding these complex situations would enable many applications, from optimizing the transportation system to monitoring terrorism to providing situation understanding for a coalition at the tactical edge of an operation. A natural way for an end user to gain understanding is to simply ask high-level, complex questions in natural language such as ÒDuring which hour did most people pass through the square?Ó The goal is to develop methods that leverage the available low-level multi-modal data to answer the userÕs high-level questions. From the language side, executable semantic parsing offers a promising paradigm for the above goal. In executable semantic parsing, a structured knowledge base contains information about a situation. A userÕs question (in natural language) is mapped (via a semantic parser) to a logical form (e.g., a database query), which can be executed on the knowledge base to provide the answer. By allowing nature language input, semantic parsing provides utility for general users. By parsing into a logical form, semantic parsing decomposes a complex analytic goal into primitive computations, which provides a substantial powerful However, curating a knowledge base is expensive, time-consuming, and not always reliable. Furthermore, words such as ÒnearÓ denote concepts which are context-sensitive and gradable, and are not easily representable in a standard logical knowledge base. The central difficulty is that words in language refers to concepts that represent abstractions of the raw data. For example, Òpass throughÓ refers to a certain (soft) class of trajectories. Spatial relations such as ÒnearÓ are often highly context-sensitive. What constitutes a ÒgroupÓ of people? For time series data, a whole new host of temporal words and concepts emerge: ÒlagÓ, ÒdeflectedÓ, ÒspikeÓ, ÒdipÓ, etc. This proposal will explore two ways of representing these concepts based on generative modelsÑprogram-based and recurrent neural networks. The underlying principle is a generative model of high dimensional time series data that will generalize across modalities. Indeed, Òsuddenly stopÓ should apply equally well to traffic as it does to a soccer player. Having defined models of abstract concepts, the second major challenge is learning our abstract concepts. Unsupervised learning askes for explaining the data well by succinct means: if this compression is successful, then it gives credence to the concepts learned. It is also important to leverage the multi-modality of our data. Specifically, the hypothesis is that important concepts manifest themselves across modes: when a soccer player scores, there is motion on the field, there are sounds from the audience, and a score increases; this should provide valuable supervision. After learning the abstract concepts and anchoring them to words, raw data streams will be mapped into knowledge bases. Then a natural language interface can be learned into this knowledge base. The training signal of the semantic parser can be used to provide top-down guidance, backpropagating through the concept representations themselves. The end result will be a system that can take a stream of low-level data in a new domain and support high-level question answering about general trends.
Document Details
- Document Type
- DoD Grant Award
- Publication Date
- Feb 14, 2019
- Source ID
- W911NF1910028
Entities
People
- Percy Liang
Organizations
- Army Contracting Command
- Stanford University
- United States Army