A Holistic Automatic Deep Understanding and Protection of Technical Documents

Abstract

Our research effort is based on the successful completion of several novel tasks toward the holistic deep understanding of technical documents. The tasks to be completed in this proposed effort are as follows.

TASK-1: MULTI-LEVEL INTEGRATION OF DIAGRAMS & NL-TEXT SENTENCES

In previous research we successfully integrated System-Diagrams and NL-Text Sentences to produce Enriched System-Diagrams. In this proposed task we will extend that integration between System-Diagrams and NL-Text Sentences to multiple levels of hierarchical description of the System-Diagrams. In particular, a System-Diagram can be described in more detail when some of its blocks (boxes) can be further analyzed as new sub-System-Diagrams. We will therefore study a variety of System-Diagrams with different levels of sub-System-Diagrams; three levels of description, for instance, are sufficient for this study.

TASK-2: IMAGE MODALITIES AND THEIR SPN GRAPH-MODELS

To synthesize and integrate the different technical-document (TD) modalities, we must express them in a common form. As expected, different processes will be applied to the components of each of the two major parts, or modalities, of a document: NL-text and images. The image modalities will be converted into two different media, NL template text and Stochastic Petri-Net (SPN) models, which establish a common ground between the two parts (NL-text and images) and create useful knowledge for the successful completion of the project. In particular, each image modality will be expressed as an SPN graph that represents its structural information and the actions, or functionality, it contains. To produce the SPN graph of a modality, we first extract a graph whose attributes describe the structural features of that particular image modality; the SPN graph is then obtained from this attributed graph.
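The two-step conversion described above (attributed graph first, SPN graph second) can be illustrated with a short Python sketch. It is not the project's method: it assumes, purely for illustration, that each diagram block with a functionality becomes an SPN transition with an exponential firing rate, and that each connection between two blocks becomes a place holding the tokens (for example, data) passed from one block to the next. The classes `AttributedGraph` and `SPN`, the function `graph_to_spn`, and the `function`/`rate` attributes are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedGraph:
    """Hypothetical attributed graph extracted from a system diagram."""
    # block name -> structural/functional attributes, e.g. {"function": "add", "rate": 2.0}
    nodes: dict
    # directed connections between blocks as (source block, target block)
    edges: list


@dataclass
class SPN:
    """Minimal stochastic Petri net: places plus timed transitions."""
    places: list = field(default_factory=list)
    rates: dict = field(default_factory=dict)    # transition -> exponential firing rate
    inputs: dict = field(default_factory=dict)   # transition -> input places
    outputs: dict = field(default_factory=dict)  # transition -> output places


def graph_to_spn(g: AttributedGraph, default_rate: float = 1.0) -> SPN:
    """Assumed mapping: each block becomes a transition, each connection a place."""
    spn = SPN()
    for block, attrs in g.nodes.items():
        # Blocks without an explicit rate attribute get the default rate.
        spn.rates[block] = attrs.get("rate", default_rate)
        spn.inputs[block] = []
        spn.outputs[block] = []
    for src, dst in g.edges:
        place = f"{src}->{dst}"         # one place per connection
        spn.places.append(place)
        spn.outputs[src].append(place)  # the source block deposits a token here
        spn.inputs[dst].append(place)   # the target block consumes it
    return spn


if __name__ == "__main__":
    # Hypothetical two-block diagram; "ALU" echoes the example named in the abstract.
    diagram = AttributedGraph(
        nodes={"Register": {"function": "store"}, "ALU": {"function": "add", "rate": 2.0}},
        edges=[("Register", "ALU"), ("ALU", "Register")],
    )
    spn = graph_to_spn(diagram)
    print(spn.places)         # ['Register->ALU', 'ALU->Register']
    print(spn.inputs["ALU"])  # ['Register->ALU']
```

Mapping blocks to transitions keeps the "actions or functionality" of a modality on the timed elements of the net, while the places capture the structural connections; other mappings are equally plausible.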
From the SPN graph forms of these image modalities we can then obtain the corresponding NL-text template sentences. These sentences will be passed to the NL-text part to enrich the original NL-text and thereby improve understanding of the technical document. In return, the NL-text part will provide enriched NL-text SPN forms, which will be used to associate and synthesize enriched SPN graphs for the creation of the SPN System-Diagram Simulator. Note that the NL-text automatically extracted from images will be expressed as a set of NL-text template sentences, both to represent the information contained in those images and to facilitate efficient and effective interaction between the two main parts (text and images) of a technical document.

We will also extract the items of a technical document's text and categorize them hierarchically into titles, subtitles, sections, paragraphs, captions, sentences, and words. Then, working bottom-up, we will attempt to understand and associate words and sentences using the classical (agent, action, patient) AVP model. Some text words carry greater weight, and we consider them critical: these are the words that refer to, or are associated with, diagrams, tables, formulas, graphs, charts, and pictures.

TASK-3: THE ENRICHMENT OF THE SYSTEM-DIAGRAM SIMULATOR

The SPN System-Diagram Simulator is another unique feature of this research effort. Our current research effort (ONR-BAA-2015-2018) will produce the first, limited version of this simulator. The simulator is the software product of all the SPN forms/modalities of the technical document's components, and it is based on the general System-Diagram of the main idea presented in the document. All the other SPN modalities will therefore contribute to enriching the functionality of the System-Diagram. The SPN diagram simulator is important for understanding how a system functions.
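To make the idea of an SPN simulator more concrete, the following Python sketch runs the standard race semantics of a stochastic Petri net with exponentially distributed firing delays: among the enabled transitions, the time to the next firing is exponential with the total enabled rate, and the transition that fires is chosen with probability proportional to its own rate. This is a generic textbook SPN simulation offered under stated assumptions (arc weights of 1, no immediate transitions), not the project's simulator; the function name `simulate_spn` and the two-transition `fetch`/`execute` net in the usage section are hypothetical.

```python
import random


def simulate_spn(marking, transitions, t_end, seed=0):
    """Simulate an SPN with exponential firing delays up to time t_end.

    marking: dict place -> token count (copied, not modified)
    transitions: dict name -> (rate, input places, output places);
        every arc is assumed to have weight 1
    Returns the final marking and the list of (time, transition) firings.
    """
    rng = random.Random(seed)
    m = dict(marking)
    t = 0.0
    trace = []
    while True:
        # A transition is enabled when every input place holds at least one token.
        enabled = [name for name, (_, ins, _) in transitions.items()
                   if all(m.get(p, 0) > 0 for p in ins)]
        if not enabled:
            break  # dead marking: nothing can fire
        weights = [transitions[n][0] for n in enabled]
        # The minimum of independent exponential delays is exponential in the summed rate.
        t += rng.expovariate(sum(weights))
        if t > t_end:
            break
        # The winner of the race is chosen with probability rate / total rate.
        name = rng.choices(enabled, weights=weights)[0]
        _, ins, outs = transitions[name]
        for p in ins:
            m[p] -= 1                 # consume one token from each input place
        for p in outs:
            m[p] = m.get(p, 0) + 1    # produce one token in each output place
        trace.append((t, name))
    return m, trace


if __name__ == "__main__":
    # Hypothetical net: a single token cycles between "ready" and "fetched".
    net = {
        "fetch": (3.0, ["ready"], ["fetched"]),
        "execute": (1.5, ["fetched"], ["ready"]),
    }
    final, trace = simulate_spn({"ready": 1}, net, t_end=10.0)
    print(final, len(trace))
```

Because exponential delays are memoryless, resampling the race after every firing is equivalent to letting all enabled transitions keep their original timers, which keeps the loop short.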
In addition, all the diagram components with a recognized functionality (such as the Arithmetic Logic Unit (ALU)) …

Document Details

Document Type
DoD Grant Award
Publication Date
Feb 20, 2018
Source ID
N000141812144

Entities

People

  • Nikolaos Bourbakis

Organizations

  • Office of Naval Research
  • United States Navy
  • Wright State University

Tags

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Data Mining and Knowledge Discovery
  • Instructional Design and Training Evaluation