Research and Application of Representational Learning for Semantic Text Analysis

Abstract

Evidence of weapons of mass destruction (WMD) events is often hidden in plain sight. Finding this evidence within the tremendous amount of available textual information requires machine processing, but current algorithms have unacceptably low accuracy. Manually created lexical resources can inform the process, but they cannot cover the “long tail” of language terms that will never be annotated. This work will produce a new method of detecting events in text by combining advances in representational learning with a proof-of-concept text processing system. To do this, we will use deep learning to induce latent semantic representations of words, entities, and frames, drawing on semantic structures such as FrameNet and Abstract Meaning Representation (AMR). This effort will require solving four major technical problems: (1) inventing a representation and algorithm for propagating meaning through semantic structures, and for parsing unlabeled text into the optimal semantic structure; (2) inventing new machine learning techniques for estimating the parameters of these very high-dimensional models without overfitting; (3) moving beyond the word level to leverage word-internal morphological structure, thereby improving the processing of rare words whose distributional statistics may be insufficient; and (4) inventing an algorithm to semi-automatically expand the frame ontology by interactively clustering potential predicates. The resulting algorithms will be incorporated into a simplified user interface capable of ingesting raw text and producing readable output tailored to the needs of the national security community.

Document Details

Document Type: DoD Grant Award
Publication Date: May 26, 2016
Source ID: HDTRA11510019

Entities

People

Bryan Lee

Organizations

Defense Threat Reduction Agency
Middlebury College
