Harvard Catalyst Profiles

Predictive Modeling of the Functional and Phenotypic Impacts of Genetic Variants


PROJECT SUMMARY

Genome-wide association studies (GWAS) have associated tens of thousands of common variants with human diseases and traits. The rapid expansion of whole-genome sequencing (WGS) studies and biobanks offers great potential to understand the physiologic and pathophysiologic associations of both common and rare variants. The IGVF Consortium aims to systematically study the functional and phenotypic effects of genomic variation; however, it is not feasible to experimentally characterize the vast number of candidate variants of interest. Computational models that can accurately predict the context-specific effects of variants are therefore essential for designing targeted research. We propose an approach anchored on a framework of high-confidence regulatory elements (REs), from which we will develop methods to learn RE-gene links, perform rare variant association tests, and fine-map causal common and rare variants. We aim to make all our results, methods, and tools available to the community through a public portal and the NHGRI and NHLBI Data Commons.

Our proposal has four aims:

(1) Develop a core framework of REs from open chromatin regions on which to anchor our models. This framework will improve on past approaches by producing higher-resolution predictions of functional base pairs, producing novel RE subclassifications using functional characterization datasets from IGVF and other sources, and harnessing single-cell datasets to delineate lineage- and stimulus-specific elements.

(2) Use this framework to predict the roles of variants in molecular phenotypes, specifically gene expression and cellular response to stimuli. We will build statistical and machine-learning methods to predict context-specific links between REs and their target genes, using three-dimensional conformation data produced by the IGVF Consortium and external sources. We will apply these methods across many cell types and perform feature selection to build a catalog of high-confidence RE-gene links and regulatory networks.

(3) Develop statistical methods to perform cell type-specific rare variant association tests (cellSTAAR) in WGS studies, and a latent variable model to prioritize candidate functional variants for traits and diseases, using results from Aims 1 and 2. We will apply these methods to analyze various metabolic, immune-mediated, and psychiatric disorders in the multi-ethnic WGS data of the NHLBI Trans-Omics for Precision Medicine Program (TOPMed) and the NHGRI Genome Sequencing Program (GSP) to identify candidate causal disease-associated variants.

(4) Make all results publicly available by substantially expanding the FAVOR Portal to include whole-genome variant functional annotations of all three billion genomic positions as well as cell type-specific annotations. We will implement both FAVOR and cellSTAAR in the Data Commons platforms AnVIL (NHGRI) and BioData Catalyst (NHLBI) so that researchers can use them to analyze new datasets in a scalable cloud computing environment. We will work closely with other centers and the Data Analysis Coordinating Center (DACC) of the IGVF on joint analyses and on building the IGVF Variant Catalog.

Funded by the NIH National Center for Advancing Translational Sciences through its Clinical and Translational Science Awards Program, grant number UL1TR002541.