Mathematical Understanding of Large Language Models
Abstract
Language models are rapidly turning into general-purpose AI models, and this calls for new conceptual frameworks, new evaluations, and new training methods. The proposal sketches a framework for thinking about the emergence of complex capabilities in language models via scaling laws, and proposes to extend it to handle multiple modalities and multi-turn interactions. It proposes that testing of AI models should focus on compositionality as a key meta-skill that allows testing of low-probability scenarios. Finally, it sketches research directions involving approximate gradients that may help speed up training and fine-tuning and make them more feasible in academia.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Nov 09, 2024
- Source ID: N000142412643
Entities
People
- Sanjeev Arora
Organizations
- Office of Naval Research
- Trustees of Princeton University
- United States Navy