Advisor: Oliver He
Research Interests: Bayesian Networks, Machine Learning, Systems Biology, Agent-based Modeling, Synthetic Networks, Microarray Analysis, and Natural Language Processing (NLP)
Research at NCIBI
The responses of prokaryotes and eukaryotes to environmental stresses are ultimately guided by a series of molecular signals at multiple biological levels, including gene regulatory networks and protein signaling pathways. For example, the reactive oxygen species (ROS) detoxification pathway in Escherichia coli is a representative stress pathway that serves as a defense mechanism against harmful chemicals such as hydrogen peroxide and superoxides. Many interactions between the protein-level detoxification pathway (superoxide dismutases, catalases, etc.) and the underlying gene regulatory networks are less well understood, likely because they are complicated by as-yet unidentified or uncharacterized genes and other factors. These hidden factors can be identified, and their influences on regulatory pathways characterized, by coupling high-throughput datasets with novel systems biology approaches such as Bayesian networks (BN) and machine learning.
My dissertation introduces a BN expansion algorithm called ‘BN+1’, which identifies hidden factors (e.g. genes, proteins) that affect the regulation and behavior of a selected pathway model. The algorithm identifies the genes that yield the best overall BN score when combined with the selected consensus network (hence ‘+1’ or ‘BN+1’ genes). We also introduce a novel Edge Clipper algorithm, which refines Bayesian networks into high-confidence consensus networks and identifies the genetic interactions most strongly supported by the available high-throughput data. We hypothesize that these novel BN reconstruction, consensus refinement, and expansion methodologies will: significantly enhance the ability of existing BN reconstruction methods to refine pathway representations from gene expression microarray and other datasets; identify novel, previously unknown factors contributing to the regulation of various biological pathways; generate an integrative model for multi-scalar pathway regulation; and provide publicly available software infrastructure applicable to studying other known biological systems.
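The expansion idea described above can be illustrated with a minimal Python sketch. It is not the dissertation's or MARIMBA's implementation: the BIC-style score, the restriction that a candidate enters only as one extra parent of a single core node, and all names (`family_score`, `network_score`, `bn_plus_one`, and the toy genes `g1`, `g2`, `hidden`, `noise`) are assumptions made for illustration.

```python
import math
from collections import Counter


def family_score(data, child, parents):
    """BIC-style score of one node given its parents (discrete data, list of dicts)."""
    n = len(data)
    # Count each (parent configuration, child state) pair and each parent configuration.
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximum-likelihood log-likelihood of the child given its parents.
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    # Free parameters: (observed child states - 1) per observed parent configuration.
    child_states = len({row[child] for row in data})
    n_params = len(parent_counts) * (child_states - 1)
    return loglik - 0.5 * math.log(n) * n_params


def network_score(data, structure):
    """Sum of family scores; `structure` maps each node to its list of parents."""
    return sum(family_score(data, child, parents) for child, parents in structure.items())


def bn_plus_one(data, core, candidates):
    """Rank candidates by the best score of the core network plus that one candidate.

    Assumption of this sketch: the candidate is added as a parentless node and
    tried as one extra parent of each core node in turn.
    """
    ranking = []
    for gene in candidates:
        best = float("-inf")
        for target in core:
            # Copy the core so the original structure is never modified.
            expanded = {child: list(parents) for child, parents in core.items()}
            expanded[target].append(gene)
            expanded[gene] = []
            best = max(best, network_score(data, expanded))
        ranking.append((gene, best))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): g2 depends on g1 and on an unmodeled factor "hidden";
# "noise" is unrelated to the core network.
rows = [
    {"g1": a, "hidden": h, "noise": z, "g2": a ^ h}
    for a in (0, 1) for h in (0, 1) for z in (0, 1)
] * 4
core = {"g1": [], "g2": ["g1"]}
ranking = bn_plus_one(rows, core, ["noise", "hidden"])
print(ranking[0][0])
```

In this toy setting `hidden` ranks above `noise`, because adding it as a parent of `g2` explains the data that the core network alone cannot. A fuller treatment would search over more ways of attaching each candidate and would use whichever scoring metric the actual method specifies.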
Our published results from the ROS detoxification analysis showed that the ‘BN+1’ algorithm successfully recovers both known and previously unknown stress genes involved in ROS and biofilm regulation, which were confirmed experimentally. Preliminary results from consensus network refinement in synthetic and EcoCyc pathways suggest that the consensus refinement approach can identify the genetic interactions most strongly supported by biological data. These methods have also been applied to understanding selected pathways in type I/II diabetes patient cohorts, in an ongoing collaborative project with the Matthias Kretzler laboratory (NCIBI, University of Michigan). Finally, our publicly accessible MARIMBA web pipeline (http://marimba.hegroup.org) now enables BN+1 analysis for many biological pathways and systems.
Xiang Z, Todd T, Ku K, Kovacic BL, Larson C, Chen F, Hodges AP, Tian Y, Olenzek EA, Zhao B, Colby LA, Rush HG, Gilsdorf JR, Jourdian GW and He Y. (2008) VIOLIN: vaccine investigation and online information network, Nucleic Acids Res, 36, D923-928. PMID: 18025042.
Woolf P, He Y, Hodges A. (August 2007) An Automated Method for Building Molecular Pathways Using Incremental Bayesian Learning. IP Disclosure #3792, University of Michigan. Submitted for utility/design patent on 6/16/2008 (US Patent Application # 12/139,529).