Mining a Large-Scale Term-Concept Network from Wikipedia

Abstract

Social tagging and information retrieval are challenged by the fact that the same item or idea can be expressed by different terms or words. To counteract the problem of variable terminology, researchers have proposed concept-based information retrieval. To date, however, most concept spaces have been either manually produced taxonomies or special-purpose ontologies, too small for classifying arbitrary resources. To create a large set of concepts, and to facilitate term-to-concept mapping, we mine a network of concepts and terms from Wikipedia. Our algorithm results in a robust, extensible term-concept network for tagging and information retrieval, containing over 2,000,000 concepts with mappings to over 3,000,000 unique terms.
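To illustrate what a term-to-concept mapping looks like as a data structure, here is a minimal Python sketch of a many-to-many index between terms and concepts. It is not the report's algorithm or schema. The class name `TermConceptNetwork`, the normalization rule (case-folding plus whitespace collapsing), and the example terms and concepts are all hypothetical and were chosen only to show how variable terminology can resolve to a shared concept.

```python
from collections import defaultdict


class TermConceptNetwork:
    """Hypothetical many-to-many index between surface terms and concepts.

    Illustrative only: the concept identifiers, normalization rule, and
    example data are assumptions, not the schema used in the report.
    """

    def __init__(self) -> None:
        # Both directions are stored so lookups are cheap either way.
        self._concepts_by_term: dict[str, set[str]] = defaultdict(set)
        self._terms_by_concept: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def normalize(term: str) -> str:
        # Case-fold and collapse internal whitespace so "Motor  Car" == "motor car".
        return " ".join(term.casefold().split())

    def add_mapping(self, term: str, concept: str) -> None:
        key = self.normalize(term)
        self._concepts_by_term[key].add(concept)
        self._terms_by_concept[concept].add(key)

    def concepts_for(self, term: str) -> set[str]:
        # Use .get so a failed lookup does not insert an empty entry.
        return set(self._concepts_by_term.get(self.normalize(term), set()))

    def terms_for(self, concept: str) -> set[str]:
        return set(self._terms_by_concept.get(concept, set()))


# Hypothetical example data, not taken from the report.
net = TermConceptNetwork()
net.add_mapping("Automobile", "Automobile")
net.add_mapping("car", "Automobile")
net.add_mapping("Motor  Car", "Automobile")
net.add_mapping("car", "Railroad car")  # an ambiguous term maps to several concepts

print(sorted(net.concepts_for("CAR")))      # ['Automobile', 'Railroad car']
print(sorted(net.terms_for("Automobile")))  # ['automobile', 'car', 'motor car']
```

Keeping both directions of the mapping lets a tagging or retrieval system resolve different terms to one shared concept and, conversely, list the alternative terms for a concept. A term such as "car" that maps to more than one concept still needs disambiguation, which this sketch does not attempt.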


Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2005
Accession Number
AD1106851

Entities

People

  • Andrew Gregorowicz
  • Mark A. Kramer

Organizations

  • MITRE Corporation

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Computer Programming
  • Computers
  • Data Mining
  • Databases
  • Dictionaries
  • English Language
  • Extraction
  • Information Retrieval
  • Knowledge Management
  • Language
  • Models
  • Natural Languages
  • Ontologies
  • Programming Languages
  • Relational Databases
  • Semantic Models
  • Taxonomy
  • Text Mining
  • User Interface

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Information Retrieval
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • Space