Towards Large Language Models Robust to Adversarial Attacks

Abstract

This proposal aims to address the critical vulnerability of large language models (LLMs) to adversarial attacks, which can compromise the reliability and security of these models, particularly when deployed in safety-critical systems. Despite their widespread application across various domains, including naval operations, LLMs remain susceptible to crafted input sequences that can manipulate their behavior. Traditional methods for enhancing robustness, such as adversarial training and certified robustness, are impractical for LLMs due to their computational inefficiency and scalability issues. This project proposes the development of a new class of adversarially robust LLM training and inference methods tailored to overcome these challenges. By exploring techniques like slow adversarial training and representation control, the project seeks to significantly advance the current state of robustness in LLMs. Success in this endeavor could have far-reaching implications, securing LLMs against malicious inputs and ensuring their safe and effective deployment in mission-critical environments, including naval operations where reliability and security are paramount.

APPROVED FOR PUBLIC RELEASE

Document Details

Document Type
DoD Grant Award
Publication Date
Nov 09, 2024
Source ID
N000142412693

Entities

People

  • J. Zico Kolter

Organizations

  • Carnegie Mellon University
  • Office of Naval Research
  • United States Navy

Tags

Fields of Study

  • Computer science

Readers

  • Cybersecurity
  • Distributed Systems and Data Platform Development
  • Military Engineering

Technology Areas

  • AI & ML
  • AI & ML - DoD AI Strategy
  • AI & ML - Neural Networks