Automatic Text Categorization Applied to E-Mail

Abstract

The author developed an automatic text categorization approach and investigated its application to categorizing emails. The approach is derived from an instance-based learning method that explores the conditional probabilities of particular words. Its effectiveness was then evaluated on collections drawn from a set of emails and scored numerically on precision and recall: precision was 65%, while recall was 17%. The author's experiments indicated that automatic categorization of incoming email at the client level is feasible, but is difficult without a standardized corpus. Word frequency is valuable, but should be combined with other methods, such as phrase extraction, for a higher level of performance.
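For readers unfamiliar with the two scores quoted in the abstract, the following is a minimal sketch of how precision and recall are conventionally computed for a single category. The function name `precision_recall` and the message counts are hypothetical. The counts are toy values chosen only so that the ratios resemble the reported figures. They are not the report's data, and the report's own scoring procedure may differ.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for one category.

    predicted: set of message ids the categorizer assigned to the category
    actual:    set of message ids that truly belong to the category
    """
    true_positives = len(predicted & actual)
    # Precision: fraction of assigned messages that were correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of truly relevant messages that were found.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical toy data (NOT taken from the report):
# 100 messages truly belong to the category; the categorizer assigns 26,
# of which 17 are correct.
actual = set(range(100))
predicted = set(range(17)) | set(range(100, 109))

p, r = precision_recall(predicted, actual)
print(f"precision={p:.0%} recall={r:.0%}")  # prints: precision=65% recall=17%
```

A result of this shape, with precision well above recall, describes a categorizer that is usually right when it assigns a message to a category but leaves most relevant messages unassigned.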

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2002
Accession Number
ADA406989

Entities

People

  • Scott R. Hall

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Human Systems
  • Weapons Technologies

DTIC Thesaurus Topics

  • Automatic
  • Computer Science
  • Computers
  • Data Mining
  • Department Of Defense
  • Electronic Mail
  • Frequency
  • Information Processing
  • Information Science
  • Machine Learning
  • Mainframe Computers
  • Marine Corps
  • Network Science
  • Precision
  • Probabilistic Models
  • Probability
  • Test Sets

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation
  • Database Systems and Applications