Interactive Information Organization: Techniques and Evaluation

Abstract

The explosive growth of digital information available on-line and the ubiquity of the Internet require the development of effective techniques for information search and access. Locating interesting information on the World Wide Web is the main task of on-line search engines. Such engines accept a query from a user and respond with a list of documents or web pages that are considered to be relevant to the query. The pages are ranked by their likelihood of being relevant to the user's request. The majority of today's Web search engines follow this scenario. The ordering of documents in the ranked list is simple and intuitive. The user is expected to follow the list while examining the retrieved documents. In practice, browsing the ranked list is rather tedious and often unproductive. Existing evidence shows that users quite often stop and do not venture beyond the first screen of results or the top 10 retrieved documents. In this thesis, the author studies alternative document organization techniques that can help users find relevant information in the retrieved data much more quickly than with a ranked list. He introduces a novel evaluation approach that is based in part on modeling system-user interaction. It allows one to separate the user's effect on the overall performance from the system's qualities. He applies this evaluation method to two different document organization techniques. The first technique uses a clustering algorithm to partition the document set into well-defined groups. The second system applies a multidimensional scaling algorithm called spring-embedding to represent documents as objects in space arranged in proportion to inter-document similarity. The results show that both systems can be used much more effectively than the ranked list approach. The author uses a reinforcement learning algorithm to build a "wizard" tool that helps the user navigate the system. This wizard provides better support than traditional relevance feedback.
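The spring-embedding idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: the function name `spring_embed`, the rest length `1 - similarity`, the step size, and the iteration count are all assumptions chosen for illustration. Each pair of documents is joined by a spring whose rest length shrinks as similarity grows, and the layout is relaxed by small steps along the net spring force.

```python
import math
import random


def spring_embed(similarity, iterations=1000, step=0.05, seed=0):
    """Place n items in 2-D so pairwise distances roughly track 1 - similarity.

    `similarity` is a symmetric n x n list of lists with values in [0, 1].
    The rest length 1 - similarity is an illustrative assumption.
    """
    n = len(similarity)
    rng = random.Random(seed)
    # Start from random positions in the unit square.
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    for _ in range(iterations):
        forces = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                rest = 1.0 - similarity[i][j]
                # Hooke's law: pull together when stretched, push apart when compressed.
                f = (dist - rest) / dist
                forces[i][0] += f * dx
                forces[i][1] += f * dy
                forces[j][0] -= f * dx
                forces[j][1] -= f * dy
        # Move every point a small step along its net force.
        for i in range(n):
            pos[i][0] += step * forces[i][0]
            pos[i][1] += step * forces[i][1]
    return pos


# Hypothetical similarities: documents 0 and 1 are alike, as are 2 and 3.
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
layout = spring_embed(sim)
for doc_id, (x, y) in enumerate(layout):
    print(f"doc {doc_id}: ({x:.2f}, {y:.2f})")
```

In a layout like this, similar documents should end up near one another, which is what lets a user spot groups of related results visually rather than reading down a ranked list.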

Document Details

Document Type: Technical Report
Publication Date: May 01, 2001
Accession Number: ADA441132

Entities

People

  • Anton Leuski

Organizations

  • University of Massachusetts Amherst

Tags

Communities of Interest

  • Air Platforms
  • Energy and Power Technologies
  • Ground and Sea Platforms
  • Weapons Technologies

DTIC Thesaurus Topics

  • Commerce
  • Computational Science
  • Data Sets
  • Databases
  • Electric Automobiles
  • Health Services
  • Information Retrieval
  • Information Science
  • Machine Learning
  • Medical Personnel
  • Network Science
  • Neural Networks
  • Organizational Structure
  • Reinforcement Learning
  • Strategic Defense Initiative
  • Three Dimensional
  • Two Dimensional

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Database Systems and Applications
  • Information Retrieval

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • Space