Machine Learning for Information Extraction in Informal Domains

Abstract

Information extraction, the problem of generating structured summaries of human-oriented text documents, has been studied for over a decade now, but the primary emphasis has been on document collections characterized by well-formed prose (e.g., newswire articles). Solutions have often involved the hand-tuning of general natural language processing systems to a particular domain. However, such solutions may be difficult to apply to "informal" domains, domains based on genres characterized by syntactically unparsable text and frequent out-of-lexicon terms. With the growth of the Internet, such genres, which include email messages, newsgroup posts, and Web pages, are particularly abundant, and there is no lack of potential information extraction applications. Examples include a program to extract names from personal home pages, or a system that monitors newsgroups where computers are offered for sale in search of one that matches a user's specifications. This thesis asks whether it is possible to design general-purpose machine learning algorithms for such domains.

Document Details

Document Type: Technical Report
Publication Date: Nov 01, 1998
Accession Number: ADA360816

Entities

People

  • Dayne Freitag

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies
  • Human Systems
  • Space

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automata Theory
  • Computational Science
  • Computer Languages
  • Computer Science
  • Data Mining
  • Electronic Mail
  • Information Science
  • Linguistics
  • Machine Learning
  • Markov Models
  • Named Entity Recognition
  • Natural Language Processing
  • Network Science
  • Pattern Recognition
  • Psychology

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Economics

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks