Harvest User's Manual

Abstract

Harvest is an information discovery and access system [4]. It addresses three critical problems to help users reap the growing collection of information accessible via the World Wide Web [2]. First, it provides an efficient and flexible means of indexing widely distributed information, to support resource discovery. Second, it provides a network-adaptive means of caching and replicating heavily accessed information, to prevent bottlenecks. Third, it provides support for accessing and manipulating complex data. A key goal of Harvest is to provide a flexible system that can be configured in various ways to create many types of indexes, making very efficient use of Internet servers, network links, and index space on disk. Our measurements indicate that, when building indexes, Harvest can reduce server load by a factor of 6,600, network traffic by a factor of 59, and index space requirements by a factor of 43, compared with previous systems such as Archie, WAIS, and the World Wide Web Worm [3]. Harvest also allows users to extract structured (attribute-value pair) information from many different information formats and to build indexes that allow these attributes to be referenced (e.g., all documents with a certain regular expression in the title field).
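
The attribute-based querying mentioned at the end of the abstract can be illustrated with a minimal sketch. This is not Harvest's actual record format or query interface: the record fields, the example URLs, and the `search_attribute` helper are all hypothetical, chosen only to show what "all documents with a certain regular expression in the title field" might look like over attribute-value records.

```python
import re

# Hypothetical attribute-value records (assumed for illustration only;
# this is not Harvest's actual summary format).
records = [
    {"url": "http://example.org/a", "title": "Harvest User's Manual", "type": "Technical Report"},
    {"url": "http://example.org/b", "title": "Notes on caching", "type": "Note"},
    {"url": "http://example.org/c", "title": "Indexing with Harvest", "type": "Paper"},
]


def search_attribute(records, attribute, pattern):
    """Return the records whose named attribute matches the regular expression.

    Records that lack the attribute are treated as having an empty value.
    """
    regex = re.compile(pattern)
    return [r for r in records if regex.search(r.get(attribute, ""))]


# Documents whose title begins with the word "Harvest".
title_prefix_matches = search_attribute(records, "title", r"^Harvest\b")

# Documents that mention "Harvest" anywhere in the title.
title_any_matches = search_attribute(records, "title", r"Harvest")
```

Restricting the pattern to a single attribute is what separates this kind of query from a plain full-text search: a record that mentions "Harvest" only in its body or URL would not match a query on the title field.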

Document Details

Document Type
Technical Report
Publication Date
Oct 01, 1994
Accession Number
ADA461230

Entities

People

  • Darren R. Hardy
  • Michael F. Schwartz

Organizations

  • University of Colorado Boulder

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Availability
  • Classification
  • Colorado
  • Computers
  • Computing Devices
  • Contracts
  • Information Operations
  • Instructions
  • Internet
  • Lepidoptera
  • Measurement
  • Monitoring
  • Networks
  • Security
  • World Wide Web

Fields of Study

  • Computer science

Readers

  • Energy Conservation and Renewable Energy Engineering
  • Library and Information Science
  • Parallel and Distributed Computing

Technology Areas

  • Space