Generating Documents that are Consistent with a Knowledge Base
Abstract
PROJECT SUMMARY

Research Problem. There is a growing need for the US government and for US companies to understand the content of the documents within their own enterprises so that they can better protect the military secrets, intellectual property, personal information, financial information, and business plans contained therein. Such data is scattered across reports, presentations, spreadsheets, and more. The goal of this proposal is to develop the computational and mathematical foundations required to automatically understand technical documents and to use an algebra of document manipulation operators to generate a collection of new documents that are consistent with the document history but are intended to mislead and inflict costs on the attacker.

Technical Approaches and Anticipated Outcomes. Technical documents are fundamentally different from many other types of documents in that they contain specialized jargon, charts, graphs, formulas, images, and diagrams. We propose a single, novel theoretical framework called Probabilistic Logic Graphs (PLGs), which is capable of representing the knowledge contained in these diverse entities within technical documents. Our first task is to define PLGs in a rigorous, formal manner and to show that PLGs can express these kinds of complex entities. A second challenge we will address is that of automatically extracting these PLG structures from a document, creating a structured knowledge representation of unstructured data with diverse elements. These two challenges deal with analyzing document content. Our next challenge deals with how documents evolve over time: it is common for the same document to have many timestamped versions in a given enterprise network. Our fourth challenge deals with the problem of quantifying the consistency of a given document with a set of existing timestamped versions of a document. We propose a formal model of consistency score based on mixed integer programming. A related challenge looks at the problem of generating a new document that is consistent with a given history of document versions, while at the same time satisfying some goals articulated by the system security manager and taking background knowledge of the domain into account (while preserving inconsistencies in a document version history). A fifth challenge looks at understanding the psychological signals (e.g., EEG, fMRI) that link believability and anomaly formation in the minds of humans. Finally, using the results of all of these challenges as input, we propose a killer app: enterprises automatically generate fake versions of documents so that a malicious hacker who penetrates the network has difficulty separating fact from fiction.

DoD Impact. Exfiltration of data is a serious threat to DoD. Deploying our methods and technology will impose significantly higher costs on the attacker.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Jul 27, 2018
- Source ID: N000141812670
Entities
People
- Sushil Jajodia
Organizations
- George Mason University
- Office of Naval Research
- United States Navy