Boosting and Extending Event Extraction using Interaction Structures from Texts

Abstract

Event Extraction (EE) is an important and challenging task of information extraction (IE) in natural language processing (NLP) that aims to extract events and their arguments from text. Recent advances feature deep learning models that use representation learning to achieve state-of-the-art performance on different EE tasks, an approach called neural EE (NEE). Despite deep learning's steady progress on EE, the performance of current EE methods is not yet satisfactory, limiting their applications in different domains and languages. For instance, in the well-studied domain of English news articles, the state-of-the-art deep learning models for event argument extraction (a subtask of EE) have barely exceeded an F1 score of 60% on the widely used ACE 2005 dataset. Moreover, current NEE models are often trained in closed settings with fixed type schemas, domains, and languages, which hinders their ability to extract new types of information (e.g., event types) and to perform well on new domains and languages. This project proposes to leverage text structures (e.g., sentence or document structures) to address these limitations of current NEE models. Here, text structures refer to graphs that capture dependency relations between objects of interest in text (i.e., words, entities, events). Text structures can improve EE performance by facilitating the integration of various sources of information (i.e., syntax, semantics, background knowledge) to identify the context in a text that matters for the prediction tasks. Further, as text structures tend to be more invariant across information types, domains, and languages, text structure-based models for EE can effectively transfer knowledge and achieve better performance in such new settings. The project is the first to investigate text structures for NEE.
The PI will focus on the following research tasks in this project: (1) Inducing effective sentence structures for NEE. To realize the sentence structures for NEE, this project explores different information sources to aid structure generation, including syntax, semantics, discourse, background knowledge, and their combinations. It also introduces novel structural regularization methods to exploit the generated structures for NEE; (2) Considering text structures beyond sentence boundaries. This project proposes to examine document structures for NEE, where the graph structures encode the dependencies between objects (i.e., words, entities) in different sentences of the documents. In particular, this work will introduce and evaluate novel document structures for different EE tasks, considering within-document, cross-document, and cross-lingual document structures. The proposed document structures will enhance the ability to model long text sequences and directly encode important interactions between objects of interest in documents; and (3) Using text structures to extend NEE systems. To further demonstrate the advantages of text structures for NEE, this work explores their application to novel EE settings to improve adaptability. The novelty comes from employing text structures to enable NEE models to extend effectively to new information types, domains, and languages, covering few-shot learning for new type extension and cross-domain and cross-lingual transfer learning.
This proposal has the potential to impact nearly all applications that benefit from EE/IE technologies in different domains and languages, especially settings and environments with low resources for EE. In particular, the proposal introduces a new generation of NEE models, based on text structures, to significantly boost performance and portability. Among others, the proposed methods will be evaluated on two understudied domains (i.e., cybersecurity texts and historic newspapers) and on cross-lingual settings with different languages to provide state-of-the-art EE technologies in those scenarios.
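The abstract describes text structures as graphs over objects of interest in text. The following is a minimal illustrative sketch of that idea, not the project's actual method: the class name TextStructure, the example sentence, and the dependency labels are all hypothetical and chosen only to show how a sentence structure could expose the context around an event trigger.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TextStructure:
    """Hypothetical graph over objects of interest (here: the words of a sentence)."""

    nodes: List[str]
    # Maps (head index, dependent index) to a relation label.
    edges: Dict[Tuple[int, int], str] = field(default_factory=dict)

    def add_edge(self, head: int, dependent: int, label: str) -> None:
        self.edges[(head, dependent)] = label

    def adjacency(self) -> List[List[int]]:
        """Return a symmetric 0/1 adjacency matrix (edge direction is ignored)."""
        n = len(self.nodes)
        matrix = [[0] * n for _ in range(n)]
        for head, dependent in self.edges:
            matrix[head][dependent] = 1
            matrix[dependent][head] = 1
        return matrix

    def neighbors(self, index: int) -> List[str]:
        """Return the nodes directly connected to the node at `index`."""
        row = self.adjacency()[index]
        return [self.nodes[j] for j, connected in enumerate(row) if connected]


# Assumed example sentence with hand-written, dependency-style edges.
structure = TextStructure(nodes=["Troops", "attacked", "the", "village"])
structure.add_edge(1, 0, "nsubj")  # attacked -> Troops
structure.add_edge(1, 3, "obj")    # attacked -> village
structure.add_edge(3, 2, "det")    # village -> the

# Words directly linked to the event trigger "attacked" are argument candidates.
print(structure.neighbors(1))  # ['Troops', 'village']
```

In this sketch the adjacency matrix is the kind of object a graph-based neural model could consume; in the setting the abstract describes, such structures would be induced from syntax, semantics, or background knowledge rather than written by hand.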

Document Details

Document Type
DoD Grant Award
Publication Date
Jun 25, 2021
Source ID
W911NF2110112

Entities

People

  • Thien Nguyen

Organizations

  • Army Contracting Command
  • United States Army
  • University of Oregon

Tags

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Housing Policy Studies in Military Families with Privatization and Telomerase Allowance Units, Multi-Family Housing, and Telomere Lengths.
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks
  • Cyber