Efficient Training Methods for Conditional Random Fields

Abstract

Many applications require predicting not just a single variable, but multiple variables that depend on each other. Recent attention has therefore focused on structured prediction methods. Especially popular have been conditional random fields (CRFs), which are graphical models of the conditional distribution over outputs given a set of observed features. Unfortunately, parameter estimation in CRFs requires repeated inference. Complex graphical structures are increasingly desired in practical applications, but training time often becomes prohibitive. In this thesis, I investigate efficient training methods for conditional random fields with complex graphical structure, focusing on local methods which avoid propagating information globally along the graph. First, I investigate piecewise training, which trains each of a model's factors separately. I present three views of piecewise training: as maximizing the likelihood in a so-called "node-split graph", as maximizing the Bethe likelihood with uniform messages, and as generalizing the pseudo-moment matching estimator of Wainwright, Jaakkola, and Willsky. Second, I propose piecewise pseudolikelihood, a hybrid procedure which "pseudolikelihood-izes" the piecewise likelihood, and is therefore more efficient if the variables have large cardinality. Finally, I explore training methods using beliefs arising from stopping BP before convergence. I propose a new schedule for message propagation and present suggestive results applying dynamic schedules to the system of equations that combine inference and learning. I also present two novel families of loopy CRFs, which appear as test cases throughout. First is the dynamic CRF, which combines the factorized state representation of dynamic Bayesian networks with the modeling flexibility of conditional models. The second of these is the skip-chain CRF, which models the fact that identical words are likely to have the same label, even if they occur far apart.
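
As a rough illustration of why piecewise training is cheaper than maximum likelihood, the two objectives can be sketched side by side. The notation here is an assumption made for this sketch and is not taken from the record: a indexes the model's factors, \psi_a is the local function for factor a, \mathbf{y}_a is the subset of outputs that factor touches, and \theta collects the parameters.

```latex
% CRF as a normalized product of factors (notation assumed for illustration)
p(\mathbf{y} \mid \mathbf{x}; \theta)
  = \frac{1}{Z(\mathbf{x}; \theta)} \prod_{a} \psi_a(\mathbf{y}_a, \mathbf{x}; \theta),
\qquad
Z(\mathbf{x}; \theta)
  = \sum_{\mathbf{y}'} \prod_{a} \psi_a(\mathbf{y}'_a, \mathbf{x}; \theta)

% Piecewise objective: each factor is normalized locally and trained separately
\ell_{\mathrm{PW}}(\theta)
  = \sum_{a} \Big[ \log \psi_a(\mathbf{y}_a, \mathbf{x}; \theta)
    - \log \sum_{\mathbf{y}'_a} \psi_a(\mathbf{y}'_a, \mathbf{x}; \theta) \Big]
```

The global normalizer Z sums over every joint assignment of the outputs, so each likelihood gradient needs inference over the whole graph. The piecewise objective sums only over the assignments of one factor's variables at a time, which is the sense in which it avoids propagating information globally.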

Document Details

Document Type
Technical Report
Publication Date
Feb 01, 2008
Accession Number
ADA477513

Entities

People

  • Charles A. Sutton

Organizations

  • University of Massachusetts Amherst

Tags

Communities of Interest

  • C4I
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Artificial Intelligence Software
  • Bayesian Networks
  • Computational Science
  • Computer Languages
  • Data Mining
  • Hidden Markov Models
  • Information Science
  • Machine Learning
  • Markov Models
  • Natural Language Processing
  • Network Science
  • Neural Networks
  • Probabilistic Models
  • Probability
  • Probability Distributions
  • Random Variables
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Operations Research
  • Statistical inference

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks