Harvest: A Scalable, Customizable Discovery and Access System

Abstract

Rapid growth in data volume, user base, and data diversity renders Internet-accessible information increasingly difficult to use effectively. In this paper we introduce Harvest, a system that provides a set of customizable tools for gathering information from diverse repositories, building topic-specific content indexes, flexibly searching the indexes, widely replicating them, and caching objects as they are retrieved across the Internet. The system interoperates with Mosaic and with HTTP, FTP, and Gopher information resources. We discuss the design and implementation of each subsystem and provide measurements indicating that Harvest can reduce server load, network traffic, and index space requirements significantly compared with previous indexing systems. We also discuss a half dozen indexes we have built using Harvest, underscoring both the customizability and scalability of the system.

Document Details

Document Type
Technical Report
Publication Date
Jul 01, 1994
Accession Number
ADA461844

Entities

People

  • C. M. Bowman
  • Darren R. Hardy
  • Michael F. Schwartz
  • Peter B. Danzig
  • Udi Manber

Organizations

  • University of Colorado Boulder

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Availability
  • California
  • Classification
  • Colorado
  • Computers
  • Contracts
  • Cooperation
  • Information Operations
  • Instructions
  • Internet
  • Measurement
  • Monitoring
  • Networks
  • Scalability
  • Security
  • Universities

Fields of Study

  • Computer science

Readers

  • Computer Networking
  • Library and Information Science/Studies, Southeast Asia Studies, Bibliography of Vietnam and Lao Studies
  • Software Engineering

Technology Areas

  • Space