BRAT: A Random Walk through the Semantic Spaces of the Blogosphere

Abstract

Semantic space methods such as Latent Semantic Analysis (LSA), Hyperspace Analog to Language (HAL), and Random Indexing (RI) offer convenient ways to represent semantic relations between words and concepts, abstracted from a distribution of documents. The distribution of documents determines the local co-occurrence patterns between words across the corpus and, in turn, the semantics abstracted from those patterns. Such methods are therefore sensitive to the statistical properties of the distribution of words over documents. For instance, the semantics of the word table abstracted from a scientific corpus may differ from those abstracted from a general corpus. In the first case, since table may occur in contexts such as table of correlation or table of results, it would be associated with the word correlation; in the second case, because it may co-occur with kitchen or living-room, it would rather be considered similar to chair. Nevertheless, the formal relation between the properties of the word co-occurrence distribution and the final semantics produced by semantic space methods has not been described until now. In a mixed "scientific and general" corpus, what makes the semantics of table more similar to chair than to Spearman, or vice versa?
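The table example can be made concrete with a small sketch. It is only an illustration, not the method of the report: it uses raw document-level co-occurrence counts and cosine similarity instead of LSA, HAL, or RI, and the toy corpora, the function names (cooccurrence_vectors, cosine, similarity), and the 3:1 mixing ratio are all assumptions introduced here.

```python
from collections import Counter
from itertools import permutations
from math import sqrt


def cooccurrence_vectors(documents):
    """Map each word to a Counter of the words it shares a document with."""
    vectors = {}
    for doc in documents:
        words = set(doc.split())
        # Count every ordered pair of distinct words once per document.
        for word, context in permutations(words, 2):
            vectors.setdefault(word, Counter())[context] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def similarity(documents, a, b):
    """Similarity of words a and b in the space built from `documents`."""
    vectors = cooccurrence_vectors(documents)
    return cosine(vectors.get(a, Counter()), vectors.get(b, Counter()))


# Hypothetical toy corpora, one string per document.
scientific = [
    "table correlation results",
    "table results spearman correlation",
    "spearman correlation coefficient",
]
general = [
    "table chair kitchen",
    "chair living-room table",
    "kitchen chair",
]

# Two mixed corpora that differ only in the proportion of general text.
balanced = scientific + general
mostly_general = scientific + general * 3

for name, corpus in [("balanced", balanced), ("mostly general", mostly_general)]:
    print(name,
          "table~correlation:", round(similarity(corpus, "table", "correlation"), 3),
          "table~chair:", round(similarity(corpus, "table", "chair"), 3))
```

In this toy setting, table is closer to correlation in the balanced mixture and closer to chair once general documents dominate, which is the kind of dependence on corpus composition the abstract asks about.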

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2009
Accession Number
ADA517876

Entities

People

  • Adil El Ghali
  • Yann V. Hoareau

Organizations

  • Paris 8 University

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Cognitive Science
  • Computational Complexity
  • Computational Linguistics
  • Identities
  • Information Retrieval
  • Information Science
  • Language
  • Linguistics
  • Models
  • Natural Language Processing
  • New York
  • Online Communications
  • Random Walk
  • Standards
  • Supervised Machine Learning

Readers

  • Computational Linguistics
  • Military History
  • Regression Analysis

Technology Areas

  • Space