Factual and Value-aligned Language Models

Abstract

Large language models (LMs) face major trustworthiness challenges. In tasks with an objective notion of correctness, LMs confidently make errors and hallucinate facts; in more subjective tasks such as creative writing, LMs show biases in the values and opinions they produce. This proposal addresses both the objective and subjective settings by building more powerful and precise inference-time mechanisms for controlling and aligning language models. For objective tasks, we will develop control algorithms that leverage conformal prediction techniques to provide high-probability guarantees of correctness. For opinion and value alignment, we will leverage in-context learning and consistency-regularization-style approaches to align LMs to human preferences and opinions as measured by survey data. Together, the two aims will advance the state of the art in precisely controlling language models and open the way to more reliable, trustworthy language models.

Approved for Public Release

Document Details

Document Type: DoD Grant Award
Publication Date: Nov 09, 2024
Source ID: N000142412609

Entities

People

Tatsunori Hashimoto

Organizations

Office of Naval Research
Stanford University
United States Navy
