Foundations of Generative AI Models: Capability, Interpretability, and Alignment

Abstract

Generative AI models have demonstrated remarkable capabilities across diverse domains and tasks, transforming numerous disciplines and industries. However, their immense potential is accompanied by substantial challenges and risks stemming from limited theoretical foundations, poor understanding of underlying mechanisms, and deficient alignment algorithms. These challenges and risks will likely be exacerbated in sensitive defense applications such as autonomous vehicles. The primary objective of this research is to establish a robust foundation for generative AI models, including language models and diffusion models. We will investigate the capability, interpretability, and alignment aspects of these foundation models, aiming to address the critical issues that could hinder their safe and effective deployment.
The research project encompasses three pivotal thrusts to advance the foundation of generative AI models. The first thrust explores the in-context learning capability of transformers. We leverage mean-field theory to analyze the training dynamics and employ causal mediation analysis to interpret the algorithms implemented by transformers. The second thrust aims to develop better unlearning methods for language models. We propose adopting the reinforcement learning from human feedback (RLHF) framework, motivating a novel loss function for machine unlearning. Adversarial attacks represent a key challenge that we aim to address via novel defense strategies. In the third thrust, we delve into diffusion models and study the interplay between hierarchical Bayesian models and the U-Net for denoising. We will also interpret the mechanisms of the denoising networks and design better alignment algorithms for diffusion models.
Through rigorous theoretical analysis and extensive experimentation, this project will provide theoretical insights into the exceptional performance of modern generative AI systems, develop novel algorithms to interpret these systems, and align them with human values and ethical principles. The results of this project will facilitate the interpretable, reliable, and aligned incorporation of generative AI models in Naval applications. The research findings could be applied to various defense-related applications, such as autonomous systems, robot perception, video surveillance, environmental monitoring, robust data acquisition in complex systems, real-time decision-making, and uncertainty assessment.
This abstract is approved for public release.

Document Details

Document Type: DoD Grant Award
Publication Date: Nov 09, 2024
Source ID: N000142412639

Entities

People

Song Mei

Organizations

Office of Naval Research
United States Navy
University of California Regents
