Automatic Requirements Specification Extraction from Natural Language (ARSENAL)

Abstract

Natural language (supplemented with diagrams and some mathematical notations) is convenient for succinct communication of technical descriptions between the various stakeholders (e.g., customers, designers, implementers) involved in the design of software systems. However, natural language descriptions can be informal, incomplete, imprecise and ambiguous, and cannot be processed easily by design and analysis tools. Formal languages, on the other hand, formulate design requirements in a precise and unambiguous mathematical notation, but are more difficult to master and use. We propose a methodology for connecting semi-formal requirements with formal descriptions through an intermediate representation. We have implemented this methodology in a research prototype called Automatic Requirements Specification Extraction from Natural Language (ARSENAL). The main novelty of ARSENAL lies in its ability to generate a fully specified, complete formal model automatically from natural language requirements. ARSENAL extracts relations from text using semantic parsing and progressively refines them over multiple stages to create a final composite model. Currently, ARSENAL generates formal models in linear-time temporal logic (LTL), but the approach can be adapted for other models, e.g., probabilistic relational models like Markov Logic Networks (MLN). The formal models of the requirements can be used to check important design and system properties, e.g., consistency, satisfiability, realizability. ARSENAL has a modular and flexible architecture that facilitates porting it from one domain to another. We evaluated ARSENAL on complex requirements from two real-world case studies: the Time-Triggered Ethernet (TTEthernet) communication platform used in space, and FAA-Isolette infant incubators used in neonatal intensive care units (NICUs). We systematically evaluated various aspects of ARSENAL: the accuracy of the natural language processing stage, the degree of automation, and robustness to noise.

Document Details

Document Type
Technical Report
Publication Date
Oct 01, 2014
Accession Number
ADA611691

Entities

People

  • Daniel Elenius
  • Natarajan Shankar
  • Patrick Lincoln
  • Shalini Ghosh
  • Wenchao Li
  • Wilfried Steiner

Organizations

  • SRI International

Tags

Communities of Interest

  • Cyber
  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Accuracy
  • Air Force
  • Case Studies
  • Computer Programming
  • Computer Science
  • Computers
  • Debugging
  • Formal Languages
  • Grammars
  • Language
  • Logic Gates
  • Machine Learning
  • Natural Language Processing
  • Notation
  • Software Development
  • Standards
  • Test And Evaluation

Fields of Study

  • Computer science
  • Engineering

Readers

  • Artificial Intelligence
  • Mathematical Modeling and Probability Theory
  • Software Engineering

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation
  • Space