Learning with domain knowledge: an implicit probabilistic models approach
Abstract
Applications of supervised learning are critically limited by the expense of labeling large datasets. Labeling is a major bottleneck that severely limits the rapid and inexpensive deployment of ML techniques to new tasks. Nonetheless, we observe that humans are often able to learn with very few labeled examples or with only high-level instructions for how a task should be performed. In this proposal, we ask whether a similar principle can be applied to teaching machines: can we supervise ML systems with an effort that does not scale linearly in the size of the training data? Constraint learning attempts to reduce the labeling burden by having users describe properties that hold over the output space. Unlike labels, these properties hold across the entire dataset, rather than for only a single sample. Thus, they provide an opportunity for more cost-effective supervision, but constraint learning is itself critically limited by the availability and cost of providing these properties. Describing high-level invariants of a dataset to constrain learning is a nontrivial effort. We propose new approaches to combine learning and domain knowledge and overcome this challenge.
Our objectives are to:
Objective 1: Introduce new approaches for providing supervision by encoding prior domain knowledge into learning frameworks, e.g., capturing physics laws or common-sense knowledge.
Objective 2: Develop new inference and learning algorithms to bridge the gap between data-driven and physics-based modeling.
Objective 3: Evaluate the performance gains obtained by incorporating domain knowledge in terms of accuracy, scalability, and robustness in structured prediction and generative modeling tasks.
The proposed approaches are enabled by recent advances in implicit probabilistic models (such as Generative Adversarial Networks), which allow us to train complex models without having to explicitly evaluate likelihoods.
The proposed techniques have the potential to 1) significantly reduce the time and cost associated with developing machine-learning-based solutions for new tasks. They will also 2) broaden the applicability of modern machine learning approaches based on representation learning, especially in domains where labels are scarce and very costly to obtain. Because they will mitigate the need for labeled data and rely more on domain knowledge, they have the potential to provide 3) more interpretable and more robust approaches that are more resistant to tampering and adversarial inputs. As such, this project is closely aligned with the ONR Code 31 program Machine Learning, Reasoning and Intelligence and with the U.S. Navy's vision of autonomous, adaptive systems that can safely operate in uncertain and unstructured environments.
The total budget is $510,000 over 3 years, including support for a graduate student working under the supervision of the PI, faculty salary, and materials and supplies.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Apr 24, 2019
- Source ID: N000141912145
Entities
People
- Stefano Ermon
Organizations
- Office of Naval Research
- Stanford University
- United States Navy