Next Generation Natural Language Interfaces for Data Extraction, Manipulation, and Visualization

Abstract

The information deluge has made it much harder for users to access and manipulate data. The proliferation of data processing frameworks does little to help, because users now need to master both each framework's programming interface and the ways to express their data processing tasks with it. Interfaces that generate code from natural language (NL) utterances have recently become attractive because they let users specify their computation without writing any code. Unfortunately, because NL is ambiguous and current approaches scale poorly, today's deep learning-based code generators can understand only small fragments of NL utterances. They are also often limited in their application domains and in the complexity of the code they generate. In this project, we will drastically improve the scalability and usability of NL-to-code interfaces by leveraging recent advances in deep learning, program synthesis, and mixed-initiative user interfaces. Our focus will be on users who have data interaction needs and can perform basic programming tasks, but are otherwise non-experts in coding. Specifically, we will study the following aspects of the problem:

- Current NL-to-code generators produce code in isolation from the rest of the development environment. However, because code is rarely written entirely from scratch, this approach ignores the large amount of code context that is available, such as the partial code fragments users have written, the program variables in scope, and existing code comments. To address this, we will develop new deep learning models and algorithms that combine such contextual information with NL utterances during code generation, both for disambiguation and for improving training efficiency.
- We will explore different modes of interaction to incorporate into NL-to-code interfaces during development. Examples include building mixed-initiative interfaces that solicit input-output examples and demonstrations from users in addition to utterances, and integrating semantic parsing into code execution rather than running it as a separate offline process. The latter will augment the code generator with runtime information (e.g., the values of variables), which raises new challenges in building efficient inference and memoization mechanisms to reduce code generation overhead during program execution.
- We will investigate ways to provide feedback to users. Examples include designing algorithms that effectively identify "adversarial" inputs with which to query the user and prune the program search space, and constructing dialog systems that interact with the user to iteratively refine the generated code, for instance to optimize the performance of the generated code over time.

To evaluate our work, we will deploy our techniques and prototypes on classical code generation tasks, such as generating SQL queries from NL. We will also construct new tasks that showcase the power of our approach, such as data transformation (e.g., changing the shape of input data in order to call the training methods provided by a machine learning framework) and generating interactive data visualizations from large data corpora. The PI will leverage his expertise in both formal methods and natural language processing in this project. He has a track record of publishing in both fields, along with multiple early career awards and best paper citations. The Knowledge Systems Program (Research Area (c)(i)(3)), managed by Dr. Purush Iyer, is best suited to review this ECP proposal.
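The idea of querying the user with "adversarial" (distinguishing) inputs to prune the program search space can be illustrated with a minimal Python sketch. It assumes the candidate programs have already been generated and are plain Python functions over small tuples of integers. The names `best_query`, `refine`, and `ask_user`, the candidate set, and the input pool are all hypothetical and do not come from the project. The sketch uses a common greedy heuristic from interactive program synthesis, not necessarily the algorithm the project will develop. Each round picks the input that splits the remaining candidates most evenly, asks the user for the intended output, and keeps only the programs that agree with that output.

```python
from collections import Counter
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical representation: a candidate program maps a tuple of ints to any value.
Program = Callable[[Tuple[int, ...]], object]


def best_query(candidates: Dict[str, Program],
               pool: Sequence[Tuple[int, ...]]) -> Optional[Tuple[int, ...]]:
    """Return the input whose outputs split the candidates most evenly.

    An input's score is the size of the largest group of candidates that agree
    on it, which is the worst case for how many survive the user's answer.
    Inputs on which every candidate agrees are never chosen. None means no
    input in the pool distinguishes the remaining candidates.
    """
    best, best_worst = None, len(candidates)
    for x in pool:
        # repr() lets unhashable outputs (e.g. lists) be grouped.
        groups = Counter(repr(p(x)) for p in candidates.values())
        worst = max(groups.values())
        if worst < best_worst:
            best, best_worst = x, worst
    return best


def refine(candidates: Dict[str, Program],
           pool: Sequence[Tuple[int, ...]],
           ask_user: Callable[[Tuple[int, ...]], object]) -> Dict[str, Program]:
    """Query the user until one candidate remains or none can be told apart."""
    remaining = dict(candidates)
    while len(remaining) > 1:
        x = best_query(remaining, pool)
        if x is None:
            break  # survivors behave identically on every input in the pool
        expected = ask_user(x)  # the user supplies the intended output for x
        remaining = {name: p for name, p in remaining.items() if p(x) == expected}
    return remaining


# Toy example: four candidate aggregate programs that one utterance might map to.
CANDIDATES: Dict[str, Program] = {
    "sum": lambda xs: sum(xs),
    "max": lambda xs: max(xs),
    "first": lambda xs: xs[0],
    "last": lambda xs: xs[-1],
}
POOL = [(3,), (1, 2), (2, 1), (1, 5, 2)]

if __name__ == "__main__":
    # Simulate a user who actually wanted the maximum.
    survivors = refine(CANDIDATES, POOL, ask_user=lambda xs: max(xs))
    print(sorted(survivors))  # ['max']
```

Here the input `(1, 5, 2)` gives a different output for each of the four candidates, so a single answer from the user settles the choice. Minimizing the largest surviving group bounds how many candidates can remain after each answer. Because it is greedy, it does not guarantee the fewest queries overall.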

Document Details

Document Type
DoD Grant Award
Publication Date
Oct 07, 2021
Source ID
W911NF2110339

Entities

People

  • Alvin Cheung

Organizations

  • Army Contracting Command
  • United States Army
  • University of California, Berkeley

Tags

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Database Systems and Applications
  • Distributed Systems and Data Platform Development

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks
  • Space