Efficient Hierarchical Big Data Computing System

Abstract

Motivation: Data generation and consumption have become a major part of people's daily lives, especially with the pervasive availability and use of Internet technologies and applications. As big data and cloud computing become increasingly popular, a centralized cloud is often inefficient when the data are produced at the edge of the network or distributed across multiple sites. Such a centralized infrastructure has several problems: (1) transferring massive data between a centralized cloud and edge devices requires high bandwidth, which the current Internet infrastructure usually does not support; (2) users often prefer to keep sensitive data local for privacy and security reasons; (3) centralized big data systems often have high latency for time-critical tasks because of contention among multiple tenants; (4) many users prefer a hybrid cloud that includes local machines. Hence, an efficient hierarchical big data computing system becomes essential to support executing a big data application on multiple heterogeneous distributed cloud resources, including edge cloudlets, aggregation clouds, and a centralized cloud.

Proposed Research: To facilitate research on hierarchical big data computing systems, we propose to purchase 16 computer nodes, two network switches, and two drones to provide various experimental environments. The proposed research and development will include the following thrusts.

  • Application Controller: The entire life cycle of the application is managed by an application controller, which queries cloudlets or data centers for their current resource usage status, then determines data placement and schedules computations on cloud servers.
  • Code Analysis Optimization: Current big data systems usually support comprehensive operations on distributed datasets. We design a principle of data operational equivalence, which indicates which data operation sequences are equivalent. To decide how to optimize code performance, we propose a principle of cost evaluation, which helps find a more efficient way to perform data operations without changing semantics.
  • System Optimization: Besides static analysis, we monitor the execution of instrumented programs and collect profiling information, including both monitored events and log information, to conduct dynamic program analysis that helps detect performance problems and then optimize system and application performance.

To evaluate our system, we will apply it to a real-world project: developing a real-time wildfire forecasting platform that uses Apache Spark to analyze streaming data. The wildfire-related data are collected by unmanned aerial vehicles (UAVs), which work as edge devices. As the cloudlets in the field have low computing capability, only the processed data, for example, fire status after image recognition or video analysis, will be sent to the data centers. Since a fire model contains several sub-models distributed across multiple aggregation cloud servers, our proposed system is well suited to such a big data project with stream processing.

Education: The proposed equipment will also be used to enhance education. It provides an essential platform for giving students hands-on experience with cyberinfrastructure and data analytics. We will design a virtualized lab platform, including virtual machines and containers, that makes it easy for students to study and experience cutting-edge cyberinfrastructure systems and data analytics tools. In addition, the PI will develop undergraduate and graduate level courses in big data and cloud computing to educate next-generation IT engineers. The virtualized labs will be disseminated to the whole community, including other universities, and will be free for students to use.
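The abstract does not spell out the equivalence rules or the cost model, so the following is only a minimal sketch of the idea behind data operational equivalence and cost evaluation, written in plain Python rather than Spark. All names (`map_then_filter`, `filter_then_map`, `to_celsius`, `from_edge_1`, the `site` and `temp_f` fields) and the cost measure (number of transform calls) are hypothetical and chosen for illustration. The sketch assumes the filter predicate reads only fields that the transform leaves unchanged, which is the condition under which the two operation orders produce the same result.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def map_then_filter(
    records: List[Record],
    transform: Callable[[Record], Record],
    predicate: Callable[[Record], bool],
) -> Tuple[List[Record], int]:
    """Transform every record, then keep the ones that pass the predicate.

    Returns the result and the number of transform calls (the assumed cost).
    """
    calls = 0
    out = []
    for record in records:
        transformed = transform(record)
        calls += 1
        if predicate(transformed):
            out.append(transformed)
    return out, calls


def filter_then_map(
    records: List[Record],
    transform: Callable[[Record], Record],
    predicate: Callable[[Record], bool],
) -> Tuple[List[Record], int]:
    """Filter first, then transform only the surviving records."""
    calls = 0
    out = []
    for record in records:
        if predicate(record):
            out.append(transform(record))
            calls += 1
    return out, calls


def to_celsius(record: Record) -> Record:
    # Hypothetical transform: adds a field, leaves "site" untouched.
    return {**record, "temp_c": (record["temp_f"] - 32) * 5 / 9}


def from_edge_1(record: Record) -> bool:
    # Hypothetical predicate: reads only "site", which to_celsius preserves.
    return record["site"] == "edge-1"


readings = [
    {"site": "edge-1", "temp_f": 212.0},
    {"site": "edge-2", "temp_f": 32.0},
    {"site": "edge-1", "temp_f": 50.0},
]

eager, eager_calls = map_then_filter(readings, to_celsius, from_edge_1)
pushed, pushed_calls = filter_then_map(readings, to_celsius, from_edge_1)

# Same result, different cost: the filter-first order is the cheaper plan.
print(eager == pushed, eager_calls, pushed_calls)
```

In this toy case both orders return the same two records, while the filter-first order calls the transform fewer times. If the predicate instead depended on a field produced by the transform (for example `temp_c`), the reordering would no longer be valid, which is the kind of condition an equivalence principle would have to check before a cost comparison is meaningful.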

Document Details

Document Type
DoD Grant Award
Publication Date
Jan 23, 2018
Source ID
N000141812121

Entities

People

  • Liqiang Wang

Organizations

  • Office of Naval Research
  • United States Navy
  • University of Central Florida Board of Trustees

Tags

Fields of Study

  • Computer science
  • Engineering

Readers

  • Distributed Systems and Data Platform Development

Technology Areas

  • AI & ML
  • Autonomy
  • Autonomy - Autonomous System Control