Harvard Catalyst Profiles


Statistical and Computational Studies of Microarray Data


Microarrays have emerged as a new tool in biological and clinical research, giving a global view of a biological process on an unprecedented scale through simultaneous measurement of expression levels for thousands of genes. However, while their use is becoming widespread, many important issues remain unresolved and their potential for revealing important insights has not been fully realized. The initial part of this work will focus on more accurate estimation of microarray expression values. For example, the performance of different probes on oligonucleotide arrays appears to vary widely depending on the melting temperature of the probe sequence, and this will be incorporated in a new algorithm. The main part of the work will focus on developing new techniques for the discovery and understanding of complex interactions among genes as well as between genes and phenotypes. Moving beyond pairwise linear correlations, nonlinear and higher-order interactions among multiple genes will be explored with novel metrics. Density estimation techniques from multivariate statistics and other sophisticated computational tools will be employed to sift through billions of possible combinatorial arrangements. Those combinations found to be significant will be examined in depth and biologically validated when possible. Finally, a statistical framework will be developed in the generalized linear model setting in order to understand the relationship between genotypic and phenotypic data. To handle the large number of highly collinear genes in expression data, new computational techniques based on partial least squares will be developed. Preliminary results in finding correlations between genes and censored patient survival times have been promising, and similar methods will be developed and applied to identify predictive genes in the context of various types of phenotypic data.
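As a rough illustration of why the abstract looks beyond pairwise linear correlations, the following minimal sketch builds a toy case in which a third gene is completely determined by two others, yet shows zero correlation with either one alone. The gene names, the binary on/off encoding, and the use of mutual information as the higher-order measure are all assumptions made for this example; the abstract does not specify which metrics the project would use.

```python
from collections import Counter
from math import log2


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x, count_y = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in joint.items()
    )


# Hypothetical on/off expression states for three genes across four samples.
gene_a = [0, 0, 1, 1]
gene_b = [0, 1, 0, 1]
# Assumed interaction: gene_c is "on" only when exactly one of a, b is on (XOR).
gene_c = [a ^ b for a, b in zip(gene_a, gene_b)]

# Pairwise measures see nothing.
r_ac = pearson(gene_a, gene_c)
r_bc = pearson(gene_b, gene_c)
mi_a_c = mutual_information(gene_a, gene_c)

# Treating (gene_a, gene_b) as a joint variable reveals the dependence.
mi_ab_c = mutual_information(list(zip(gene_a, gene_b)), gene_c)

print(r_ac, r_bc, mi_a_c, mi_ab_c)
```

In this toy data the pairwise correlations and the pairwise mutual information are all zero, while the joint mutual information between the pair (gene_a, gene_b) and gene_c is one full bit. With thousands of genes the number of such pairs and triples grows combinatorially, which is the search problem the abstract refers to.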
The candidate has been trained in applied and computational mathematics, and he now aims to apply his skills to problems in bioinformatics and functional genomics. The proposed award will allow him to receive thorough training in molecular biology and genomics at Harvard Medical School and Children's Hospital in Boston. Through this transitional period, the candidate aims to become an independent investigator, able to lead a multidisciplinary team in an integrated approach to studying complex biological systems.

Funded by the NIH National Center for Advancing Translational Sciences through its Clinical and Translational Science Awards Program, grant number UL1TR002541.