An iterative statistical framework for constructing causal gene networks based on spatial gene expression data and knockout exp
Abstract
Statement of Work: Comprehensive spatial gene expression data have become increasingly prevalent with advances in high-throughput technologies. These data provide a unique opportunity to study locally differentiated gene interactions that drive tissue development and human disease. We plan to establish a framework for analyzing such data, using a database of spatial gene expression in Drosophila melanogaster embryos curated by our collaborators at Lawrence Berkeley National Laboratory. Our work emphasizes the need to generate reproducible, meaningful discoveries for the biological community, relying on close ties between scientists and statisticians to establish a feedback loop between hypothesis and data analysis.

Objective: Gene regulation operates through an intricate landscape of spatially distinct interactions. The goal of this proposal is to gain insight into complex gene regulatory networks by identifying causal mechanisms within biologically meaningful regions of developing embryos. As technological advances continue to reduce the cost of gene knockout experiments, data from these interventions will become an important resource for scientists trying to identify causal links that cannot be determined from observational data alone. We plan to establish a unified framework for integrating observational and experimental spatial gene expression data in order to determine relationships that are stable across intervention environments. This proposal has the potential to provide scientists with a vital tool for predicting how gene networks will respond to experimental manipulation, helping to map the intricate architecture of these networks.

Approach: This proposal focuses on two critical aspects of identifying causal interactions in high-dimensional gene networks. The first thrust is estimating undirected networks of gene-gene interactions from observational data in order to refine the search space of potential causal predictors.
We consider both linear and nonlinear approaches to estimating dependencies between genes, and we specifically tailor our methods to highly correlated spatial data. In each setting, we incorporate marginal gene interactions that have been previously established by scientific experiments into estimates that respect the modular structure of gene networks. The second thrust of this proposal is integrating observational data with results from gene knockout experiments to identify causal effects. Here, we capture a specific kind of causal mechanism by focusing on interactions whose predictive importance remains stable across intervention environments. To encourage reproducible findings, we validate our estimates with an evidence-building approach that combines stability, subject matter knowledge, simulation, and prediction accuracy.

Overall Merit and ONR Mission/Relevance: Rapid advances in data collection technologies are allowing researchers to assemble vast datasets on intricate systems such as gene, social, and security networks. The scale and complexity of these systems make untangling data-generating mechanisms a difficult but necessary part of understanding their critical elements. These insights can help researchers determine how complex systems respond to changes, leading to faster and more accurate decision making. The data-driven framework proposed here is a crucial step toward building a holistic understanding of causal interactions in complex systems.
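The two thrusts described under Approach can be illustrated with a small toy pipeline. This is a sketch under stated assumptions, not the project's method: it swaps in two standard techniques as stand-ins. The first is lasso neighborhood selection (regress one gene on all others and keep the nonzero coefficients) to narrow the candidate predictors. The second is a simplified invariance check in the spirit of invariant causal prediction, which keeps only the genes shared by every candidate subset whose pooled regression residuals have the same mean in every environment. The function names (`lasso_cd`, `neighborhood`, `invariance_pvalue`, `invariant_parents`, `simulate`), the permutation test, the penalty `lam=0.1`, the level `alpha=0.01`, and the simulated three-environment data are all hypothetical choices made for illustration.

```python
# Hypothetical sketch: lasso neighborhood selection followed by a simplified,
# permutation-based invariance check across intervention environments.
from itertools import combinations

import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent on centered data.

    Minimizes (1 / 2n) * ||y - X b||^2 + lam * ||b||_1.
    """
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]  # add back gene j's current contribution
            rho = X[:, j] @ resid / n
            # Soft-thresholding gives the exact coordinate-wise minimizer.
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]
    return beta


def neighborhood(X, target, lam):
    """Genes with a nonzero lasso coefficient when predicting gene `target`."""
    others = [j for j in range(X.shape[1]) if j != target]
    beta = lasso_cd(X[:, others], X[:, target], lam)
    return [others[k] for k in np.flatnonzero(beta)]


def invariance_pvalue(X, y, env, subset, n_perm=199, seed=0):
    """Permutation p-value for 'residual means are equal across environments'."""
    # One pooled least-squares fit of y on the genes in `subset` plus an intercept.
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in subset])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    labels = np.unique(env)

    def spread(e):
        # Between-environment sum of squares of residual means.
        return sum((e == k).sum() * resid[e == k].mean() ** 2 for k in labels)

    observed = spread(env)
    rng = np.random.default_rng(seed)
    exceed = sum(spread(rng.permutation(env)) >= observed for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)


def invariant_parents(X, y, env, candidates, alpha=0.01):
    """Intersection of all candidate subsets not rejected by the invariance test."""
    accepted = [set(s) for r in range(len(candidates) + 1)
                for s in combinations(candidates, r)
                if invariance_pvalue(X, y, env, s) > alpha]
    return set.intersection(*accepted) if accepted else set()


def simulate(n=1000, seed=0):
    """Hypothetical toy data: gene 0 -> target (column 3) -> gene 1; gene 2 unrelated.

    Environment 0 is observational, environment 1 shifts gene 0, and
    environment 2 perturbs gene 1 (a stand-in for a knockout).
    """
    rng = np.random.default_rng(seed)
    blocks, envs = [], []
    for e in range(3):
        g0 = rng.normal(3.0 if e == 1 else 0.0, 1.0, n)
        target = 2.0 * g0 + rng.normal(size=n)
        g1 = rng.normal(-4.0, 1.0, n) if e == 2 else target + rng.normal(size=n)
        g2 = rng.normal(size=n)
        blocks.append(np.column_stack([g0, g1, g2, target]))
        envs.append(np.full(n, e))
    return np.vstack(blocks), np.concatenate(envs)


if __name__ == "__main__":
    data, env = simulate()
    candidates = neighborhood(data, target=3, lam=0.1)
    parents = invariant_parents(data, data[:, 3], env, candidates)
    print("candidates:", candidates, "invariant parents:", parents)
```

In this toy setup, gene 0 drives the target and gene 1 responds to it, so shifting gene 0 should leave the target's regression on gene 0 unchanged, while perturbing gene 1 is expected to break any model that uses gene 1. The subset search grows exponentially with the number of candidates, which is one reason the neighborhood step that shrinks the search space comes first.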
Document Details
- Document Type: DoD Grant Award
- Publication Date: Aug 12, 2016
- Source ID: N000141612664
Entities
People
- Bin Yu
Organizations
- Office of Naval Research
- United States Navy
- University of California Regents