A Distributed Laboratory for Automated Document Generation Using Large-Scale Computational Methods

Abstract

Under the ONR-funded BAA project entitled "Generating Documents that are Consistent with a Knowledge Base," George Mason University and Dartmouth College are jointly developing a system to generate fake documents that are consistent with background knowledge bases. We are developing a suite of algorithms to automatically extract knowledge bases from real documents and synthesize realistic fake documents to deter potential cyber-adversaries. Because technical documents are multimodal, this requires an end-to-end solution spanning many fields, such as natural language processing, computer vision, logic, and optimization. In addition, corporations have huge numbers of documents. To support these large-scale computational methods, our system requires substantial computing resources in terms of CPUs, memory, and GPUs. We have used existing resources at George Mason and Dartmouth; however, our experiments have been severely limited: a single neural-network-based method, such as a Generative Adversarial Network, may take weeks to run, which slows the research considerably, especially as multiple runs are needed to explore different parameters. We propose the development of a distributed laboratory (at GMU and Dartmouth) for large-scale document generation consistent with a knowledge base. The proposal addresses only the equipment cost of the machines and racks that will constitute our distributed computing system. The laboratory will enable us to design and test various methods to achieve this goal. Additionally, it will offer our students the opportunity to gain valuable hands-on experience.

Document Details

Document Type
DoD Grant Award
Publication Date
May 08, 2020
Source ID
N000142012407

Entities

People

  • Sushil Jajodia

Organizations

  • George Mason University
  • Office of Naval Research
  • United States Navy

Tags

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Parallel and Distributed Computing
  • Research Science/Academic Research

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Neural Networks
  • Cyber