Scalable Diagnostics and Data Compression for Global Atmospheric Chemistry using Ristretto Library (version 1.0)

Abstract

We introduce a new set of algorithmic tools capable of producing scalable, low-rank decompositions of global spatio-temporal atmospheric chemistry data. By exploiting emerging randomized linear algebra algorithms, we propose a suite of decompositions that extract the dominant features from big data sets (i.e. global atmospheric chemistry at longitude, latitude and elevation). Importantly, the proposed algorithms scale with the intrinsic rank of the global chemistry space rather than with the ever-increasing spatio-temporal measurement space, thus allowing for efficient representation and compression of the data. In addition to scalability, two further innovations are proposed to improve interpretability: (i) a non-negative decomposition of the data, which constrains the chemical space to have only non-negative expression values (unlike PCA), and (ii) sparse matrix decompositions, which threshold low correlations to zero and thus highlight the dominant, localized spatial activity (again unlike PCA). The methods are demonstrated on a full year of global chemistry dynamics data and show significant improvements in computational speed and interpretability. We show that the decomposition methods presented here successfully extract known major features of atmospheric chemistry, such as summertime surface pollution and biomass burning activities. Indeed, we find that the full annual model output can be reconstructed using only 50–100 principal modes, suggesting that the presented methods offer the potential to archive model data of atmospheric chemistry with compression factors in the range of 200–4000 or greater. In the emerging area of big data, specifically global chemistry monitoring, such technologies are critical enablers of real-time and computationally tractable diagnostics of both large-scale simulation and measurement data.
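The idea of a decomposition whose cost scales with the intrinsic rank rather than with the size of the measurement space can be illustrated with a minimal randomized SVD sketch. This is not the Ristretto API and not necessarily the authors' exact algorithm: it is a generic NumPy version of the standard randomized range-finder approach, run on synthetic data. The function names `randomized_svd` and `compression_factor`, the parameters `oversample` and `n_power_iter`, and the storage-count formula are assumptions introduced here for illustration only.

```python
import numpy as np


def randomized_svd(A, rank, oversample=10, n_power_iter=2, seed=0):
    """Approximate rank-`rank` SVD of A via a random sketch (illustrative only)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, min(m, n))

    # Sketch the column space of A with a Gaussian test matrix.
    omega = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(A @ omega)

    # Optional power iterations sharpen the basis when singular values decay slowly.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Project A onto the small basis and take an exact SVD of the k x n matrix.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank, :]


def compression_factor(shape, rank):
    """Ratio of full storage (m*n) to factor storage (rank * (m + n + 1)).

    Assumes the factors U, s, Vt are stored densely at the same precision as A.
    """
    m, n = shape
    return (m * n) / (rank * (m + n + 1))


# Synthetic stand-in for a (space x time) data matrix with exact rank 20.
rng = np.random.default_rng(1)
m, n, true_rank = 2000, 300, 20
A = rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, n))

U, s, Vt = randomized_svd(A, rank=true_rank)
A_hat = (U * s) @ Vt  # reconstruct from the low-rank factors
rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
cf = compression_factor(A.shape, true_rank)
```

In this sketch the expensive dense SVD is applied only to the small projected matrix `B`, so the dominant cost is a handful of matrix products with `A`. The achievable compression factor depends on the actual data dimensions and the number of retained modes, so the value computed here for synthetic data says nothing about the figures reported in the abstract.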

Document Details

Document Type
Pub Defense Publication
Publication Date
Dec 19, 2018
Source ID
10.5194/gmd-2018-308

Entities

People

  • Christoph A. Keller
  • J. Nathan Kutz
  • Meghana Velagar
  • N. Benjamin Erichson

Organizations

  • Air Force Office of Scientific Research

Tags

Readers

  • Combustion science or combustion engineering
  • Distributed Systems and Data Platform Development
  • Image Processing and Computer Vision

Technology Areas

  • Space