Statistical Theory and Method of Large Tensor Data

Abstract

Rapid advances in modern scientific technology give rise to a wide range of high-dimensional tensor data. For example, online advertising uses a particular Internet service to collect user information, establish and maintain relationships with consumers, and deliver advertising communications. Users' click behavior on different advertisements from multiple publisher webpages forms a "user-advertisement-publisher" tensor, with each entry either zero (no click), one (click), or missing. Another example is from neuroscience, where it is important to understand the relationship between brain images and disease status in order to improve the accuracy of early diagnosis. In such a study, 3D brain images collected over time and over subjects can be formulated as a fifth-order tensor whose first three modes index the pixel at which each measurement is taken, and whose fourth and fifth modes correspond to time and subject, respectively.

This proposal aims to develop statistical theory and methods for modern tensor data frequently encountered in the above big data scenarios. A common difficulty in tensor algorithms is their low computational efficiency, caused by large memory requirements and a large number of unknown parameters. For example, a single high-quality 8000 × 6000 image stored in 32-bit color format (4 bytes per pixel) takes 8000 × 6000 × 4 bytes, or 192 Mbytes. Hence, in tensor data analysis, a gap between statistical and computational efficiency often arises. To narrow the gap, this proposal exploits the intrinsic data structure in different contexts to develop provable non-convex optimization algorithms that achieve both computational efficiency and the optimal statistical rate in modeling big tensor data.

Our proposal consists of three projects that explore different statistical aspects of big tensor data while developing efficient tensor algorithms:
P1. Tensor Recovery via Cubic Sketching: we propose a general framework for tensor recovery via a novel rank-one cubic sketching, with applications to high-order interaction models and statistical compressed image transmission.
P2. A Tensor Solution to Hyper-graph Partition: we propose a novel tensor formulation of hyper-graphs that can be used to study partition problems.
P3. Statistical Inference on Tensor Graphical Models: we develop the first set of statistical inference procedures for high-dimensional tensor-variate graphs.

The Navy is often interested in inferring structure from tensor-valued data observed in many fields, such as personalized recommendation systems and imaging research. The proposal addresses the statistical properties of tensor data analysis while developing efficient tensor algorithms. Together with these new algorithms, the proposal therefore has the potential to significantly advance the Navy's ability to process various types of large tensor data.
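
The quantities mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the proposal's method. The function names (image_bytes, entry, cubic_sketch), the toy indices, and the reading of "rank-one cubic sketching" as the inner product of a third-order tensor with x ∘ x ∘ x are all assumptions made for illustration only.

```python
# Illustrative sketch only: all names, sizes, and toy values are assumptions,
# not taken from the proposal.

def image_bytes(width, height, bytes_per_pixel):
    """Storage needed for an uncompressed raster image, in bytes."""
    return width * height * bytes_per_pixel

# 32-bit color means 4 bytes per pixel, as in the abstract's example.
size_mb = image_bytes(8000, 6000, 4) / 10**6  # decimal megabytes: 192.0

# A "user-advertisement-publisher" click tensor stored sparsely.
# Key = (user, ad, publisher); value = 1 (click) or 0 (no click).
# A key that is absent represents a missing (unobserved) entry.
clicks = {
    (0, 0, 0): 1,
    (0, 1, 0): 0,
    (1, 1, 2): 1,
}

def entry(tensor, user, ad, publisher):
    """Return 1, 0, or None (missing) for one tensor entry."""
    return tensor.get((user, ad, publisher))

def cubic_sketch(T, x):
    """Assumed form of a rank-one cubic sketch: the sum over i, j, k of
    T[i][j][k] * x[i] * x[j] * x[k], i.e. the inner product <T, x o x o x>."""
    n = len(x)
    return sum(
        T[i][j][k] * x[i] * x[j] * x[k]
        for i in range(n) for j in range(n) for k in range(n)
    )

# Toy rank-one tensor T = a o a o a, so the sketch equals (a . x) ** 3.
a = [1, 2]
T = [[[a[i] * a[j] * a[k] for k in range(2)] for j in range(2)] for i in range(2)]
x = [3, 1]
y = cubic_sketch(T, x)  # (1*3 + 2*1) ** 3 = 125
```

Each sketch compresses the whole tensor into a single scalar, and a recovery procedure would need many such measurements taken with different vectors x. How those vectors are drawn and how the tensor is recovered from them is not specified in the abstract and is not modeled here.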

Document Details

Document Type
DoD Grant Award
Publication Date
Jul 27, 2018
Source ID
N000141812759

Entities

People

  • Guang Cheng

Organizations

  • Office of Naval Research
  • United States Navy
  • University of Virginia

Tags

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Linear Algebra
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms