Factual and Value-aligned Language Models

Abstract

Large language models face major trustworthiness challenges: in tasks with an objective notion of correctness, LMs confidently make errors and hallucinate facts, while in more subjective tasks such as creative writing, LMs show biases in the values and opinions they produce. This proposal aims to address challenges across both objective and subjective settings by building better mechanisms for controlling and aligning language models, with a focus on more powerful and precise inference-time control. For objective tasks, we will develop control algorithms that leverage conformal prediction techniques to provide high-probability guarantees of correctness. For opinion and value alignment, we will leverage in-context learning and consistency-regularization-style approaches to align LLMs to human preferences and opinions as measured by survey data. Together, the two aims will advance the state of the art in precisely controlling language models and open the way to more reliable, trustworthy language models.

Approved for Public Release

Document Details

Document Type
DoD Grant Award
Publication Date
Nov 09, 2024
Source ID
N000142412609

Entities

People

  • Tatsunori Hashimoto

Organizations

  • Office of Naval Research
  • Stanford University
  • United States Navy

Tags

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Computational Linguistics
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Machine Translation