Mathematical Understanding of Large Language Models

Abstract

Language models are rapidly turning into general-purpose AI models, and this calls for new conceptual frameworks, new evaluations, and new training methods. The proposal sketches a framework for thinking about the emergence of complex capabilities in language models via scaling laws, and proposes to extend it to handle multiple modalities and multi-turn interactions. It proposes that testing of AI models should focus on compositionality as a key meta-skill that allows testing of low-probability scenarios. Finally, it sketches research directions involving approximate gradients that may help speed up training and fine-tuning and make them more feasible in academia.

Document Details

Document Type: DoD Grant Award
Publication Date: Nov 09, 2024
Source ID: N000142412643

Entities

People

Sanjeev Arora

Organizations

Office of Naval Research
Trustees of Princeton University
United States Navy
