Factual and Value-aligned Language Models
Abstract
Large language models (LMs) face major trustworthiness challenges: in tasks with an objective notion of correctness, LMs confidently make errors and hallucinate facts, while in more subjective tasks, such as creative writing, LMs show biases in the values and opinions they produce. This proposal aims to address challenges across both objective and subjective settings by building more powerful and precise inference-time mechanisms for controlling and aligning language models. For objective tasks, we will develop control algorithms that leverage conformal prediction techniques to provide high-probability guarantees of correctness, while for opinion and value alignment we will leverage in-context learning and consistency-regularization-style approaches to align LMs to human preferences and opinions as measured by survey data. The two aims will advance the state of the art in precisely controlling language models and open the way to more reliable, trustworthy language models.
Approved for Public Release
Document Details
- Document Type: DoD Grant Award
- Publication Date: Nov 09, 2024
- Source ID: N000142412609
Entities
People
- Tatsunori Hashimoto
Organizations
- Office of Naval Research
- Stanford University
- United States Navy