Multi-Use Text Analytics
Abstract
How does one effectively explore and extract insights from large collections of highly heterogeneous documents across a diverse set of use cases? The Institute for Defense Analyses (IDA), a nonprofit corporation operating three federally funded research and development centers, helps government sponsors answer challenging questions that lie at the intersection of national security, science, technology, and policy. These questions often involve making sense of large numbers of documents and files. Moreover, factors important to one question may not be important to others. Most existing solutions for document analysis and review employ a one-size-fits-all approach that cannot be easily adapted to different use cases. IDA has developed in-house text analytics capabilities to facilitate search, exploration, and analysis of large collections of files in ways that enable greater flexibility in answering research questions. An extensible set of machine learning and natural language processing methods can be targeted and applied in real time to subsets of documents of interest through a no-code, point-and-click, web-based user interface, which allows for more focused insight extraction. In addition, programmatic APIs, accessible both locally and remotely, allow users to customize document processing and analyses for a wide range of use cases without requiring source code modifications. For instance, researchers have used the API to ingest documents from diverse data sources and to train custom machine learning models that auto-label documents, making them findable or filterable. This presentation provides a high-level overview of the work with illustrative examples.
Document Details
- Document Type: Technical Report
- Publication Date: Nov 09, 2023
- Accession Number: AD1226614
Entities
People
- Arun S. Maiya
- Dale Visser
- Margaret E. Zientek
Organizations
- Institute for Defense Analyses