Interactive Visualization Systems and Data Integration Methods for Supporting Discovery in Collections of Scientific Information

Abstract

Technological developments have enabled greater sharing and reuse of scientific information. Current indexing methods support query-based search and filtering; however, they do not support overviews or exploration. Because of these limitations, it is difficult to discover records and connections that relate information in new and potentially insightful ways. We developed prototype systems and computational methods for integrating collections from multiple sources within a domain into a single, unified graph data structure. We then applied graph-theoretic measures and visualizations to identify relations and records that support discovery tasks. Three collections of molecular information were studied: (1) influenza protein sequences from the National Center for Biotechnology Information, (2) Open Notebook Science notebooks and databases from Drexel University and other academic chemical research laboratories, and (3) project data from drug discovery projects at Pfizer R&D. We designed data integration methods for each of these collections. We then analyzed the integrated collections to design interactive visual tools and computational methods that could systematically identify relations and records with a high potential to lead to novel discoveries in these areas. We conducted interviews with domain experts to evaluate the effectiveness of these designs. These studies demonstrate the feasibility of using the new indexing methods to improve the discoverability of novel connections across multiple collections within a domain.
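The abstract does not specify how collections are merged or which graph-theoretic measures are used. The following is a minimal sketch under stated assumptions, not the report's method. It assumes that records from different collections can be linked through shared identifiers (for example, a compound or sequence key). It merges them into one bipartite graph of record nodes and entity nodes, then ranks entities by how many distinct collections they connect. The record data, the collection names, and the functions `build_graph`, `rank_bridging_entities`, and `cross_source_neighbours` are all hypothetical illustrations.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy records: (source collection, record id, shared entity keys).
# The keys stand in for identifiers that two collections might share;
# none of this is real data from the report.
RECORDS: List[Tuple[str, str, List[str]]] = [
    ("collection_a", "a1", ["compound-X", "compound-Y"]),
    ("collection_a", "a2", ["compound-Y"]),
    ("collection_b", "b1", ["compound-X", "compound-Z"]),
    ("collection_c", "c1", ["compound-X"]),
    ("collection_c", "c2", ["compound-Z"]),
]


def build_graph(records):
    """Merge all records into one undirected bipartite graph.

    Record nodes are named "source:record_id" and entity nodes "entity:key".
    Each record is joined to every entity it mentions, so records from
    different sources become connected through the entities they share.
    """
    adjacency: Dict[str, Set[str]] = {}
    node_source: Dict[str, str] = {}
    for source, record_id, keys in records:
        record_node = f"{source}:{record_id}"
        node_source[record_node] = source
        adjacency.setdefault(record_node, set())
        for key in keys:
            entity_node = f"entity:{key}"
            adjacency[record_node].add(entity_node)
            adjacency.setdefault(entity_node, set()).add(record_node)
    return adjacency, node_source


def rank_bridging_entities(adjacency, node_source):
    """Return (entity, distinct sources connected, degree), most bridging first."""
    scores = []
    for node, neighbours in adjacency.items():
        if not node.startswith("entity:"):
            continue
        sources = {node_source[n] for n in neighbours}
        scores.append((node, len(sources), len(neighbours)))
    # More collections first, then higher degree, then name for a stable order.
    scores.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return scores


def cross_source_neighbours(adjacency, node_source, record_node):
    """Records from other sources sharing at least one entity with record_node."""
    own = node_source[record_node]
    found = set()
    for entity in adjacency[record_node]:
        for other in adjacency[entity]:
            if node_source[other] != own:
                found.add(other)
    return sorted(found)


if __name__ == "__main__":
    adjacency, node_source = build_graph(RECORDS)
    for entity, n_sources, degree in rank_bridging_entities(adjacency, node_source):
        print(entity, n_sources, degree)
    print(cross_source_neighbours(adjacency, node_source, "collection_a:a1"))
```

With the toy data, `entity:compound-X` ranks first because it links records from all three collections. Querying `collection_a:a1` surfaces the records in the other two collections that it is connected to. Counting distinct sources per entity is only one simple choice of measure. Centrality scores such as betweenness would be a natural alternative on a larger graph.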


Document Details

Document Type
Technical Report
Publication Date
May 01, 2011
Accession Number
ADA546617

Entities

People

  • Donald A. Pellegrino Jr.

Organizations

  • Drexel University

Tags

Communities of Interest

  • Biomedical
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Chemistry
  • Computational Science
  • Computer Programming
  • Computer Programs
  • Computers
  • Data Analysis
  • Data Integration
  • Data Mining
  • Data Visualization
  • Databases
  • Information Processing
  • Information Science
  • Information Systems
  • Mobile Phones
  • Natural Language Processing
  • Operating Systems
  • Spreadsheet Software

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Instructional Design and Training Evaluation
  • Research Science/Academic Research

Technology Areas

  • Biotechnology