Analyzing Extremely Large-scale Recurrent Event Data: a Divide-and-Conquer Approach

Abstract

Fast-paced, technology-oriented warfare involves large-scale data collection, processing, and modeling. In many emerging military applications, data are continuously accumulated at extremely large scales to support tactical applications, and the data structures are becoming more complicated. One common structure arises when the same event occurs repeatedly for a subject over a long period of time. This type of data is referred to as recurrent event data, and it has become common in the continuous monitoring of soldiers, officers, and their surrounding tactical environments. Analyzing recurrent event data in large quantities presents many challenges, such as memory limitations and computing time that does not scale with the data size. To get around these problems, analysts usually analyze only a random sample of the data. This simple approach not only wastes information but can also lead to incorrect and biased results. In this research, we propose a divide-and-conquer approach that uses parametric frailty models to analyze large-scale recurrent event data that may be too large to load into a single computer. Specifically, we will randomly divide the data into many subsets and analyze each subset separately. We will then use a weighted aggregation method to combine the subset models into a final model. We will show that this divide-and-conquer approach can achieve results equivalent to those obtained from the complete data (if the complete data could be analyzed at once). We will conduct simulation studies to demonstrate the performance of the proposed method and to verify important properties of the estimators from the combined frailty models. The proposed method is generic in nature and can be extended to other types of data. To show the feasibility of this approach, we plan to build a high-performance computing platform that leverages modern high-speed computing technologies, such as parallel computing with graphics processing units and distributed computing frameworks such as Spark and Hadoop.
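To make the divide, fit, and combine steps concrete, the following is a minimal sketch under stated assumptions; it is not the project's method. It assumes a hypothetical homogeneous-rate gamma-frailty model: each subject has a frailty Z ~ Gamma(mean 1, variance theta), and given Z, events follow a Poisson process with rate Z * rate. Each subset estimates the marginal rate with a Poisson working likelihood and a robust (sandwich) variance. The subsets are combined by inverse-variance weighting, which is one common choice for the unspecified "weighted aggregation". The data are split by subject so that each subject's recurrent events stay in one subset. All function names (`simulate_subjects`, `fit_subset`, `combine`, `divide_and_conquer`) are hypothetical.

```python
import math
import random


def simulate_subjects(n, rate, theta, rng):
    """Simulate (event count, follow-up time) pairs under an ASSUMED gamma-frailty model.

    Frailty Z ~ Gamma(shape=1/theta, scale=theta) has mean 1 and variance theta.
    Given Z, event times follow a Poisson process with rate Z * rate, generated from
    exponential gap times until the follow-up time tau is exceeded.
    """
    subjects = []
    for _ in range(n):
        tau = rng.uniform(1.0, 5.0)
        z = rng.gammavariate(1.0 / theta, theta)
        t, count = 0.0, 0
        while True:
            t += rng.expovariate(z * rate)
            if t > tau:
                break
            count += 1
        subjects.append((count, tau))
    return subjects


def fit_subset(subjects):
    """Estimate the marginal event rate on one subset.

    Returns (rate_hat, var_hat). The estimating equation is sum(N_i - rate * tau_i) = 0,
    and the sandwich variance stays valid under frailty-induced overdispersion.
    """
    total_events = sum(c for c, _ in subjects)
    total_time = sum(tau for _, tau in subjects)
    rate_hat = total_events / total_time
    resid_ss = sum((c - rate_hat * tau) ** 2 for c, tau in subjects)
    return rate_hat, resid_ss / total_time ** 2


def combine(estimates):
    """Inverse-variance weighted aggregation of (estimate, variance) pairs."""
    weights = [1.0 / var for _, var in estimates]
    total_w = sum(weights)
    combined = sum(w * est for w, (est, _) in zip(weights, estimates)) / total_w
    return combined, 1.0 / total_w


def divide_and_conquer(subjects, k, rng):
    """Randomly split subjects into k subsets, fit each one, and combine the fits."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    blocks = [shuffled[j::k] for j in range(k)]
    return combine([fit_subset(block) for block in blocks])


if __name__ == "__main__":
    rng = random.Random(2019)
    data = simulate_subjects(20000, rate=2.0, theta=0.5, rng=rng)
    full_est, full_var = fit_subset(data)
    dc_est, dc_var = divide_and_conquer(data, k=20, rng=rng)
    print(f"full data:          {full_est:.4f} (se {math.sqrt(full_var):.4f})")
    print(f"divide-and-conquer: {dc_est:.4f} (se {math.sqrt(dc_var):.4f})")
```

In a real deployment, each `fit_subset` call would run on a separate worker, for example one Spark partition, and only the small (estimate, variance) pairs would be sent back for `combine`. This is what keeps memory bounded. With a full parametric frailty likelihood, each worker would instead return a parameter vector and its estimated covariance matrix, and the same inverse-variance idea would apply in matrix form.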
With this project, we will contribute to the areas of big data analytics and modeling as well as high-performance computing, which are fundamental components of modern intelligent systems for military applications. The abstract is publicly releasable.

Document Details

Document Type
DoD Grant Award
Publication Date
Jul 24, 2019
Source ID
W911NF1910405

Entities

People

  • Jerry Cheng

Organizations

  • Army Contracting Command
  • Rutgers University
  • United States Army

Tags

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Regression Analysis