Harvest User's Manual

Abstract

HARVEST is an information discovery and access system [4]. It addresses three critical problems to help users reap the growing collection of information accessible via the World Wide Web [2]. First, it provides an efficient and flexible means of indexing widely distributed information, to support resource discovery. Second, it provides network-adaptive means of caching and replicating heavily accessed information, to prevent bottlenecks. Third, it provides support for accessing and manipulating complex data. A key goal of Harvest is to provide a flexible system that can be configured in various ways to create many types of indexes, making very efficient use of Internet servers, network links, and index space on disk. Our measurements indicate that Harvest can reduce server load by a factor of 6,600, network traffic by a factor of 59, and index space requirements by a factor of 43 when building indexes, compared with previous systems, such as Archie, WAIS, and the World Wide Web Worm [3]. Harvest also allows users to extract structured (attribute-value pair) information from many different information formats and build indexes that allow these attributes to be referenced (e.g., all documents with a certain regular expression in the title field).
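The abstract's example of an attribute-based query ("all documents with a certain regular expression in the title field") can be illustrated with a minimal sketch. It assumes each indexed document has already been reduced to a flat set of attribute-value pairs. The record layout, the `Title` and `URL` field names, the sample URLs, and the helper `find_by_title` are all hypothetical. This is not Harvest's own interface or summary format; it only shows the idea of matching a regular expression against one named attribute.

```python
import re
from typing import Dict, List

# Hypothetical record type: one dict of attribute-value pairs per indexed document.
Record = Dict[str, str]


def find_by_title(records: List[Record], pattern: str) -> List[Record]:
    """Return the records whose 'Title' attribute contains a match for `pattern`.

    Records that have no 'Title' attribute are treated as having an empty
    title, so they can only match patterns that accept the empty string.
    """
    regex = re.compile(pattern)
    return [r for r in records if regex.search(r.get("Title", ""))]


# Illustrative (made-up) records extracted from three documents.
docs: List[Record] = [
    {"URL": "http://example.com/a", "Title": "Harvest User's Manual"},
    {"URL": "http://example.com/b", "Title": "Caching in the Web"},
    {"URL": "http://example.com/c", "Author": "Anonymous"},
]

if __name__ == "__main__":
    for doc in find_by_title(docs, r"^Harvest"):
        print(doc["URL"])
```

Because the query targets a single named attribute rather than the full document text, a pattern such as `^Harvest` matches only documents whose title begins with that word, not documents that merely mention it elsewhere.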


Document Details

Document Type: Technical Report
Publication Date: Oct 01, 1994
Accession Number: ADA461230

Entities

People

Darren R. Hardy
Michael F. Schwartz

Organizations

University of Colorado Boulder
