Some Issues in the Automatic Classification of US Patents

Abstract

The classification of US patents poses some special problems due to the enormous size of the corpus, the size and complex hierarchical structure of the classification system, and the size and structure of patent documents. The representation of the complex structure of documents has not been a standard area of research in text categorization, but we have found it to be an important factor in our previous work on classifying patient medical records (Larkey and Croft, 1996) and in our current work on US patents. Our classification approach is to combine the results of k-nearest-neighbor classifiers with those of Bayesian classifiers. The k-nearest-neighbor classifier allows us to represent the document structure using the query operators in the Inquery information retrieval system. The Bayesian classifiers can use the hierarchical relations among patent subclasses to select closely related negative examples to train more discriminating classifiers.
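
As a rough illustration of the two ideas in the abstract, the sketch below picks negative training examples from sibling subclasses and merges two classifiers' scores. It is not the report's implementation. The abstract does not say how the results are combined, so the weighted sum is an assumption, as are the function names, the `weight` parameter, and the toy subclass labels and scores. The sketch also assumes both classifiers emit scores on comparable scales.

```python
def sibling_negatives(target, parent_of, docs_by_subclass):
    """Collect documents from subclasses that share the target's parent class.

    Hypothetical helper: these are the "closely related" negatives that a
    per-subclass classifier could be trained against.
    """
    parent = parent_of[target]
    negatives = []
    for subclass, docs in docs_by_subclass.items():
        if subclass != target and parent_of.get(subclass) == parent:
            negatives.extend(docs)
    return negatives


def combine_scores(knn_scores, bayes_scores, weight=0.5):
    """Rank subclasses by a weighted sum of the two classifiers' scores.

    Assumed combination rule; a subclass missing from one classifier's
    output contributes 0.0 for that classifier.
    """
    labels = set(knn_scores) | set(bayes_scores)
    combined = {
        label: weight * knn_scores.get(label, 0.0)
        + (1 - weight) * bayes_scores.get(label, 0.0)
        for label in labels
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Toy hierarchy and scores, invented for illustration only.
parent_of = {"A/1": "A", "A/2": "A", "B/1": "B"}
docs_by_subclass = {"A/1": ["d1"], "A/2": ["d2", "d3"], "B/1": ["d4"]}

negatives = sibling_negatives("A/1", parent_of, docs_by_subclass)
ranked = combine_scores({"A/1": 0.8, "A/2": 0.4}, {"A/1": 0.6, "B/1": 0.9})
ranked_labels = [label for label, _ in ranked]
```

With these toy inputs, `negatives` holds only the documents filed under the sibling subclass `A/2`, and `ranked_labels` lists the candidate subclasses from highest to lowest combined score.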

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1997
Accession Number
ADA341100

Entities

People

  • Leah S. Larkey

Organizations

  • University of Massachusetts Amherst

Tags

Communities of Interest

  • Biomedical

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Artificial Intelligence Software
  • Bayesian Networks
  • Classification
  • Computer Science
  • Data Sets
  • Hierarchies
  • Information Retrieval
  • Inventions
  • Machine Learning
  • Models
  • Patent Applications
  • Patents
  • Probabilistic Models
  • Training
  • United States

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Regression Analysis

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval