Mathematical Understanding of Large Language Models

Abstract

Language models are rapidly turning into general-purpose AI models, and this calls for new conceptual frameworks, new evaluations, and new training methods. The proposal sketches a framework for thinking about the emergence of complex capabilities in language models via scaling laws, and proposes to extend it to handle multiple modalities and multi-turn interactions. It proposes that testing of AI models should focus on compositionality as a key meta-skill that allows testing of low-probability scenarios. Finally, it sketches research directions involving approximate gradients that may help speed up training and fine-tuning and make them more feasible in academia.

Document Details

Document Type
DoD Grant Award
Publication Date
Nov 09, 2024
Source ID
N000142412643

Entities

People

  • Sanjeev Arora

Organizations

  • Office of Naval Research
  • Trustees of Princeton University
  • United States Navy

Tags

Fields of Study

  • Computer science
