Content Locality in Distributed Digital Libraries

Abstract

This paper introduces the notion of content locality in distributed document collections. Content locality is the degree to which content-similar documents are colocated in a distributed collection. We propose two metrics for measuring content locality, one based on topic signatures and the other based on collection statistics. We provide derivations and analysis of both metrics and use them to measure the content locality in two kinds of document collections: the well-known TREC corpus and the Networked Computer Science Technical Report Library (NCSTRL), an operational digital library. We also show that content locality can be thought of temporally as well as spatially, and we provide evidence of its existence in temporally ordered document collections such as news feeds.

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1999
Accession Number
ADA457074

Entities

People

  • Charles L. Viles
  • James C. French

Organizations

  • University of North Carolina at Chapel Hill

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Abstracts
  • Commerce
  • Communication Systems
  • Computer Science
  • Computers
  • Databases
  • Electronic Mail
  • Engineering
  • Event Detection
  • Information Processing
  • Information Retrieval
  • Information Science
  • Libraries
  • Library Science
  • Measurement
  • Network Science
  • Statistics

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Library and Information Science
  • Theoretical Analysis