Visualizing Distributed System Executions

Abstract

Distributed systems pose unique challenges for software developers. Understanding the system’s communication topology and reasoning about concurrent activities of system hosts can be difficult. The standard approach, analyzing system logs, can be a tedious and complex process that involves reconstructing a system log from multiple hosts’ logs, reconciling timestamps among hosts with non-synchronized clocks, and understanding what took place during the execution encoded by the log. This article presents a novel approach for tackling three tasks frequently performed during analysis of distributed system executions: (1) understanding the relative ordering of events, (2) searching for specific patterns of interaction between hosts, and (3) identifying structural similarities and differences between pairs of executions. Our approach consists of XVector, which instruments distributed systems to capture partial ordering information that encodes the happens-before relation between events, and ShiViz, which processes the resulting logs and presents distributed system executions as interactive time-space diagrams. Two user studies with a total of 109 students and a case study with 2 developers showed that our method was effective, helping participants answer statistically significantly more system-comprehension questions correctly, with a very large effect size.
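
Illustrative note: the happens-before relation mentioned in the abstract can be captured without synchronized clocks by attaching a vector timestamp to each event. The sketch below shows the textbook vector-clock technique, assuming each host keeps one counter per host and piggybacks its clock on every message. The function names (tick, merge, happens_before, concurrent) and the host names are hypothetical and chosen for illustration; this record does not state that XVector is implemented this way.

```python
from typing import Dict

# A vector clock maps a host name to the number of events seen from that host.
Clock = Dict[str, int]


def tick(clock: Clock, host: str) -> Clock:
    """Return a copy of `clock` with `host`'s own counter advanced (local event)."""
    updated = dict(clock)
    updated[host] = updated.get(host, 0) + 1
    return updated


def merge(local: Clock, received: Clock, host: str) -> Clock:
    """On message receipt: element-wise maximum, then count the receive event."""
    hosts = set(local) | set(received)
    merged = {h: max(local.get(h, 0), received.get(h, 0)) for h in hosts}
    return tick(merged, host)


def happens_before(a: Clock, b: Clock) -> bool:
    """True if a <= b in every component and a < b in at least one."""
    hosts = set(a) | set(b)
    no_greater = all(a.get(h, 0) <= b.get(h, 0) for h in hosts)
    some_less = any(a.get(h, 0) < b.get(h, 0) for h in hosts)
    return no_greater and some_less


def concurrent(a: Clock, b: Clock) -> bool:
    """For two distinct events: neither one happens before the other."""
    return not happens_before(a, b) and not happens_before(b, a)


# Hypothetical two-host execution: alice sends one message to bob.
send = tick({}, "alice")                # alice's send event
bob_local = tick({}, "bob")             # an unrelated local event on bob
recv = merge(bob_local, send, "bob")    # bob receives alice's message
```

In this execution the send is ordered before the receive, while alice's send and bob's earlier local event are concurrent. That distinction between ordered and concurrent events is the kind of information a time-space diagram displays.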

Document Details

Document Type
Pub Defense Publication
Publication Date
Mar 04, 2020
Source ID
10.1145/3375633

Entities

People

  • Albert Xing
  • Ivan Beschastnikh
  • Michael D. Ernst
  • Patty Wang
  • Perry Liu
  • Yuriy Brun

Organizations

  • Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada
  • National Science Foundation
  • United States Air Force
  • University of British Columbia
  • University of Massachusetts
  • University of Washington

Tags

Fields of Study

  • Computer science
  • Engineering

Readers

  • Parallel and Distributed Computing
  • Regression Analysis
  • Systems Analysis and Design

Technology Areas

  • Space