Using Context to Assist in Personal File Retrieval

Abstract

Personal data is growing at ever-increasing rates, fueled by a growing market for personal computing solutions and the dramatic growth of available storage space on these platforms. Users, no longer limited in what they can store, now face the problem of organizing their data so that they can find it again later. Unfortunately, as data sets grow, so does the complexity of organizing them. This problem has driven a sudden growth in search tools aimed at the personal computing space, designed to help users locate data within their disorganized file space. Despite this growth, local file search tools are often inaccurate. These inaccuracies have been a long-standing problem for file data, as evidenced by the downfall of attribute-based naming systems, which often relied on content analysis to give files meaningful attributes for automated organization. While file search tools have lagged behind, search tools designed for the World Wide Web have found widespread acclaim. Interestingly, despite significant increases in non-textual data on the web (e.g., images, movies), web search tools continue to be effective. This is because the web contains key information that is currently unavailable within file systems: "context." By capturing context information (e.g., the links describing how data on the web is inter-related), web search tools can significantly improve the quality of search over content analysis techniques alone. This work describes Connections, a context-enhanced search tool that uses temporal locality among file accesses to provide inter-file relationships to the local file system. Once identified, these inter-file relationships provide context information similar to that available on the World Wide Web. Connections leverages this context to improve the quality of file search results. User studies with Connections show improvements in both precision and recall over content-only search.
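
The idea of deriving inter-file relationships from temporal locality can be illustrated with a minimal sketch. This is not the algorithm used by Connections, which the abstract does not specify. It assumes a simple fixed time window, an undirected co-access count as the relationship strength, and a one-step expansion of content-search hits. The names `build_relation_graph` and `expand_results`, the 30-second window, and the 0.5 weight are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_relation_graph(trace, window=30.0):
    """Link files that are accessed within `window` seconds of each other.

    trace: iterable of (timestamp_seconds, path), sorted by timestamp.
    Returns a nested mapping: graph[a][b] = number of co-accesses.
    The window size is an assumed parameter, not taken from the report.
    """
    graph = defaultdict(lambda: defaultdict(int))
    recent = []  # accesses that are still inside the time window
    for ts, path in trace:
        # Drop accesses that have fallen out of the window.
        recent = [(t, p) for t, p in recent if ts - t <= window]
        for _, other in recent:
            if other != path:
                graph[path][other] += 1
                graph[other][path] += 1
        recent.append((ts, path))
    return graph


def expand_results(content_hits, graph, weight=0.5):
    """Add contextually related files to a set of content-search hits.

    content_hits: dict mapping path -> content-only relevance score.
    Each hit passes a fraction (`weight`, an assumed value) of its score
    to its neighbours, in proportion to the co-access counts.
    Returns (path, score) pairs ranked by descending score.
    """
    scores = dict(content_hits)
    for path, score in content_hits.items():
        neighbours = graph.get(path, {})
        total = sum(neighbours.values())
        for other, count in neighbours.items():
            scores[other] = scores.get(other, 0.0) + weight * score * count / total
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

In this sketch, a file with no matching content (an image, for example) can still appear in the results if it was repeatedly accessed alongside a file that did match, which is the kind of recall improvement the abstract attributes to context.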

Document Details

Document Type
Technical Report
Publication Date
Aug 25, 2006
Accession Number
ADA490204

Entities

People

  • Craig A. Soules

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy
  • Ground and Sea Platforms

DTIC Thesaurus Topics

  • Algorithms
  • Commerce
  • Computer Science
  • Computers
  • Data Mining
  • Data Sets
  • Information Retrieval
  • Information Science
  • Machine Learning
  • Network Science
  • Operating Systems
  • Personal Computers
  • Probability
  • User Interface
  • Websites
  • Word Processors
  • World Wide Web

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Economics
  • Systems Analysis and Design

Technology Areas

  • Space