Adding Numbers to Text Classification

Abstract

Many real-world problems involve a combination of both text- and numerical-valued features. For example, in email classification, it is possible to use instance representations that consider not only the text of each message, but also numerical-valued features such as the length of the message or the time of day at which it was sent. Text-classification methods have thus far not easily incorporated numerical features. In earlier work we described an approach for converting numerical features into bags of tokens so that text classification methods can be applied to numerical classification problems, and showed that the resulting learning methods are competitive with traditional numerical classification methods. In this paper we use this as a way to learn on problems that involve a combination of text and numbers. We show that the results outperform competing methods. Further, we show that selecting a best classification method using text-only features and then adding numerical features to the problem (as might happen if numerical features are only later added to a pre-existing text-classification problem) gives performance that rivals a more time-consuming approach of evaluating all classification methods using the full set of both text and numerical features.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2003
Accession Number
ADA453817

Entities

People

  • Hayn Hirsh
  • Sofus A. Macskassy

Organizations

  • New York University

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Accuracy
  • Air Force
  • Air Force Research Laboratories
  • Classification
  • Coding
  • Computer Programming
  • Computer Science
  • Data Sets
  • Economic Development
  • Governments
  • Hard Copy
  • Information Retrieval
  • Information Science
  • Machine Learning
  • Neural Networks
  • New York
  • Reinforcement Learning

Fields of Study

  • Computer science

Readers

  • Finite Element Method (FEM) for solving Partial Differential Equations (PDEs)
  • Neural Network Machine Learning.
  • Speech Processing/Speech Recognition.

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Machine Translation