A Conditional Random Field for Discriminatively-Trained Finite-State String Edit Distance

Abstract

The need to measure sequence similarity arises in information extraction, object identity, data mining, biological sequence analysis, and other domains. This paper presents discriminative string-edit CRFs, a finite-state conditional random field model for edit sequences between strings. Conditional random fields have advantages over generative approaches to this problem, such as pair HMMs or the work of Ristad and Yianilos, because, as conditionally trained methods, they enable the use of complex, arbitrary actions and features of the input strings. As in generative models, the training data does not have to specify the edit sequences between the given string pairs. Unlike generative models, however, our model is trained on both positive and negative instances of string pairs. We present positive experimental results on several data sets.
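
To make the idea in the abstract concrete, the following is a minimal sketch of scoring a string pair by summing over all edit sequences, once under "match" weights and once under "non-match" weights, and normalizing the two totals into a conditional probability. It is an illustration under stated assumptions, not the report's actual model: the single state per class, the four operation names, the weight values, and the function names (`log_alignment_score`, `match_probability`) are all hypothetical, and no training is shown.

```python
import math

# Hypothetical, hand-set log-weights per edit operation (assumptions for
# illustration only; a trained model would learn these from labeled pairs).
W_MATCH = {"copy": 1.0, "substitute": -2.0, "insert": -1.0, "delete": -1.0}
W_NONMATCH = {"copy": -0.5, "substitute": 0.5, "insert": 0.0, "delete": 0.0}


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_alignment_score(x, y, w):
    """Log of the total exp-score of every edit sequence turning x into y.

    The edit sequence is never observed; the dynamic program below sums
    over all of them, in the style of a forward algorithm on an edit lattice.
    """
    n, m = len(x), len(y)
    alpha = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            terms = []
            if i > 0:  # consume one character of x only
                terms.append(alpha[i - 1][j] + w["delete"])
            if j > 0:  # consume one character of y only
                terms.append(alpha[i][j - 1] + w["insert"])
            if i > 0 and j > 0:  # consume one character of each string
                op = "copy" if x[i - 1] == y[j - 1] else "substitute"
                terms.append(alpha[i - 1][j - 1] + w[op])
            alpha[i][j] = logsumexp(terms)
    return alpha[n][m]


def match_probability(x, y, w_match=W_MATCH, w_nonmatch=W_NONMATCH):
    """p(match | x, y): the match total divided by the sum of both totals."""
    d = log_alignment_score(x, y, w_match) - log_alignment_score(x, y, w_nonmatch)
    # Stable logistic function of the log-score difference.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)
```

Calling `match_probability("a", "a")` and `match_probability("a", "b")` with these illustrative weights gives a higher probability to the identical pair. Because the normalization runs over the match and non-match totals for the same input pair, the operation weights could in principle depend on arbitrary features of both strings, which is the flexibility the abstract attributes to conditional training.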

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2005
Accession Number
ADA440386

Entities

People

  • Andrew McCallum
  • Kedar Bellare
  • Fernando Lobo Pereira

Organizations

  • University of Massachusetts Amherst

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Computational Science
  • Computer Science
  • Computer Vision
  • Data Mining
  • Data Sets
  • Generative Models
  • Information Processing
  • Information Science
  • Information Systems
  • Language
  • Machine Learning
  • Markov Models
  • Probability
  • Recognition

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Computer Programming and Software Development

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks