Beyond Worst-Case Analysis in Privacy and Clustering: Exploiting Explicit and Implicit Assumptions

Abstract

This thesis is a collection of work in differential privacy and clustering. In part one, we discuss work aimed at preserving differential privacy in a social network, with respect to either the presence/absence of a single edge, or with respect to changing all edges adjacent to one node. In part two, we discuss multiple clustering problems, focusing on the k-means and k-median problems. We show how to correctly cluster an instance whose optimal k-means solution either satisfies a certain stability condition, or is resilient to small constant metric perturbations, or with cluster centers that satisfy a particular separation condition. Alternatively, this thesis can be viewed as an investigation of specific non-worst case analysis paradigms. The common theme in the results of this thesis is that they all introduce algorithms whose guarantees are meaningful only for a subset of inputs -- for inputs that satisfy certain nice properties, or assumptions. These assumptions can be roughly divided into two types, which we refer to as explicit or implicit. Explicit assumptions give a very specific and quantifiable characterization of the input (e.g., clustering instances with the distance between any pair of cluster centers larger than a specific bound). On the other hand, implicit assumptions are harder to characterize. Implicit assumptions pose a certain property that the input should satisfy, due to some compelling "real-life" reasoning (e.g., justifying a particular value of k for a k-means clustering instance), and often give much leeway as to the particular structure of the input. In this thesis, we exhibit multiple examples of assumptions of both kinds, in differential privacy and clustering, and give algorithms that take advantage of these assumptions. In particular, we show how tasks that are provably hard become feasible under suitable assumptions -- tasks like providing accurate answers for queries over graph while preserving privacy on the node level.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Aug 01, 2013
Accession Number
ADA582868

Entities

People

  • Or Sheffet

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • C4I
  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Algorithms
  • Computer Science
  • Data Analysis
  • Data Mining
  • Data Sets
  • Databases
  • Estimators
  • Information Retrieval
  • Information Science
  • Linear Algebra
  • Mathematics
  • Network Science
  • Random Variables
  • Social Networks
  • Stability Conditions
  • Statistical Sampling
  • Theorems

Fields of Study

  • Computer science

Readers

  • Calculus or Mathematical Analysis
  • Graph Algorithms and Convex Optimization.
  • Operations Research