Discovering Models of Software Processes from Event-Based Data

Abstract

Many software process methods and tools presuppose the existence of a formal model of a process. Unfortunately, developing a formal model for an ongoing, complex process can be difficult, costly, and error prone. This presents a practical barrier to the adoption of process technologies, a barrier that automated assistance in creating formal models would lower. To this end, the authors have developed a data analysis technique that they term "process discovery." Under this technique, data describing process events are first captured from an ongoing process and then used to generate a formal model of the behavior of that process.

In this paper, the authors describe a Markov method that they developed specifically for process discovery. They also describe two additional methods that they adopted from other domains and augmented for their purposes. The three methods range from the purely algorithmic to the purely statistical. The approach underlying the methods is to view the process discovery problem as one of grammar inference. In other words, the data describing the behavior of a process are viewed as sentences in some language; the grammar of that language is then the formal model of the process.

Following an introduction, Section 2 of the paper discusses the framework in which the authors define and analyze event data. Section 3 gives a more complete statement of the discovery problem and outlines their grammar inference approach. Section 4 provides needed background on grammar inference. The discovery methods themselves are described in Section 5. Section 6 presents a comparative evaluation of the methods. Section 7 describes DaGama, the tool implementing the discovery methods. The application of the methods in an industrial case study is reviewed in Section 8. In Section 9, the authors present a summary of their results, an overview of related work, and a discussion of future work.
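
To make the grammar-inference view concrete, here is a minimal Python sketch of one simple probabilistic approach, assuming a first-order Markov model: each event trace is treated as a sentence, transitions between consecutive event types are counted, and transitions whose estimated probability falls below a cutoff are dropped, leaving a finite-state model of the process. The function name `discover_markov_fsm`, the `cutoff` parameter, the `START`/`END` sentinels, and the sample event names are hypothetical. This illustrates the general idea only; it is not the Markov method described in Section 5 of the report.

```python
from collections import Counter, defaultdict


def discover_markov_fsm(traces, cutoff=0.2):
    """Infer a state machine from event traces using first-order transition probabilities.

    Each trace is a list of event types (one "sentence" of the process language).
    Returns a dict mapping each event type to the set of event types allowed to
    follow it, keeping only transitions whose estimated probability is >= cutoff.
    START and END are hypothetical sentinel states marking trace boundaries.
    """
    counts = defaultdict(Counter)
    for trace in traces:
        events = ["START", *trace, "END"]
        # Count every observed transition between consecutive events.
        for prev, nxt in zip(events, events[1:]):
            counts[prev][nxt] += 1

    fsm = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        # Keep a transition only if it is frequent enough relative to its source state.
        fsm[prev] = {nxt for nxt, n in followers.items() if n / total >= cutoff}
    return fsm


# Hypothetical event data captured from three executions of a process.
traces = [
    ["edit", "compile", "test"],
    ["edit", "compile", "edit", "compile", "test"],
    ["edit", "compile", "test"],
]

model = discover_markov_fsm(traces, cutoff=0.3)
for state, successors in model.items():
    print(state, sorted(successors))
# START ['edit']
# edit ['compile']
# compile ['test']
# test ['END']
```

With `cutoff=0.3`, the single compile-to-edit transition (one of the four transitions observed out of `compile`) is pruned as probable noise; lowering the cutoff to 0.2 keeps it. The cutoff trades tolerance of noisy event data against fidelity to rare but legitimate behavior.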


Document Details

Document Type
Technical Report
Publication Date
Nov 01, 1996
Accession Number
ADA446147

Entities

People

  • Alexander L. Wolf
  • Jonathan E. Cook

Organizations

  • University of Colorado Boulder

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Case Studies
  • Computer Science
  • Computers
  • Data Analysis
  • Data Mining
  • Engineering
  • Information Science
  • Language
  • Markov Models
  • Neural Networks
  • Operating Systems
  • Probability
  • Reverse Engineering
  • Software Development
  • Standards

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Distributed Systems and Data Platform Development
  • Theoretical Analysis

Technology Areas

  • AI & ML