My research explores how advances in causal inference, statistical machine learning, and computational statistics can empower discovery in the biomedical and health sciences. I focus primarily on developing model-agnostic, assumption-lean statistical inference procedures, guided by a science-first, translational philosophy that stresses the rich interplay between the applied sciences and statistical methodology: emerging questions in the former spur advances in the latter, which, in turn, help to refine the scientific discovery process. This approach uses causal inference as a framework for translating scientific questions into interpretable statistical estimands, and then aims to formulate analytic methods that incorporate flexible learning techniques (i.e., machine learning), draw upon semi-parametric efficiency theory, and impose only those modeling restrictions justified by domain knowledge. I am also deeply interested in high-performance statistical computing and in the role that open-source software and programming play in the responsible practice of applied statistics and statistical data science, especially as these relate to the promotion of transparent, reproducible, and replicable science.
My methodological work often draws upon tools and ideas from semi- and non-parametric inference, high-dimensional and large-scale inference, targeted or debiased machine learning (e.g., targeted minimum loss estimation, method of sieves), and computational statistics. Areas of recent focus include the study of (1) population-level inference on treatment effects from data collected through biased, outcome-dependent sampling designs, including extensions to sequentially adaptive sampling or survey schemes; (2) causal effect heterogeneity for optimal treatment regime and subgroup discovery; (3) doubly robust and propensity score approaches for evaluating dose-response phenomena; (4) causal mediation analysis (i.e., direct and indirect effects) for investigating questions of mechanism; and (5) safe causal inference from data exhibiting network dependence or interference structures.
My past substantive collaborations have spanned diverse areas of the biomedical and public health sciences, from toxicology and computational biology to environmental health and nutritional epidemiology. Recently, I've found myself captivated by the rich scientific and statistical problems that abound in the infectious disease sciences, including in public health virology and immunology, vaccinology, and infectious disease epidemiology. My work has contributed novel methods and insights for immune correlates analyses of vaccine efficacy trials (of HIV, COVID-19, and malaria), clinical trials of therapeutics and curatives (of COVID-19 and TB/HIV co-infection), and observational studies of the post-acute sequelae of COVID-19 ("long COVID").
Here are a few reflections on the intertwined philosophies of science and of statistics that have shaped my own perspective:
"Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise." --John Tukey
"Everyone is sure of this [that errors are normally distributed]...since the experimentalists believe that it is a mathematical theorem, and the mathematicians that it is an experimentally determined fact." --Henri Poincaré
"Science is the belief in the ignorance of experts." --Richard Feynman