Multi-Use Text Analytics

Abstract

How does one effectively explore and extract insights from large collections of highly heterogeneous documents across a diverse set of use cases? The Institute for Defense Analyses (IDA), a nonprofit corporation operating three federally funded research and development centers, helps government sponsors answer challenging questions that lie at the intersection of national security, science, technology, and policy. These questions often involve making sense of large numbers of documents and files. Moreover, factors important to one question may not be important to others. Most existing solutions for document analysis and review employ a one-size-fits-all approach that cannot be easily adapted to different use cases. IDA has developed in-house text analytics capabilities to facilitate search, exploration, and analysis of large collections of files in ways that enable greater flexibility in answering research questions. An extensible set of machine learning and natural language processing methods can be targeted and applied in real time to subsets of documents of interest through a no-code, point-and-click, web-based user interface, which allows for more focused insight extraction. In addition, programmatic APIs are accessible both locally and remotely, allowing users to customize document processing and analyses for a wide range of use cases without requiring source code modifications. For instance, researchers have used the API to ingest documents from diverse data sources and to train custom machine learning models that auto-label documents, making them findable or filterable. This presentation provides a high-level overview of the work with illustrative examples.

Document Details

Document Type: Technical Report
Publication Date: Nov 09, 2023
Accession Number: AD1226614

Entities

People

Arun S. Maiya
Dale Visser
Margaret E. Zientek

Organizations

Institute for Defense Analyses
