Qixuan Chen, Columbia
Predictive Inference for Non-Probability Samples Using Bayesian Machine Learning
Abstract: Probability surveys have long been considered the gold standard for population inference, but they are increasingly expensive to collect and face declining response rates. Non-probability samples, while more readily available, raise fundamental questions about their generalizability to the target population. In this talk, I will present Bayesian predictive inference approaches that improve inference from non-probability samples by integrating administrative data and probability surveys. We focus on data-rich settings with high-dimensional auxiliary information available for both the sample and external data sources. When individual-level auxiliary data are accessible through administrative records, we propose a regularized predictive inference approach using Bayesian additive regression trees and related extensions to predict population outcomes. We further extend this framework to accommodate confidentiality constraints, two-phase designs, and generalizability in causal inference. We illustrate the application of these methods across a wide range of settings, from non-probability surveys to randomized clinical trials, demonstrating how findings from samples can be generalized to target populations.
Bio: Dr. Chen is Associate Professor of Biostatistics at Columbia University. She obtained her PhD in Biostatistics from the University of Michigan in 2009. Her research focuses on survey sampling, missing data, measurement error, data integration, and Bayesian modeling. She collaborates extensively with interdisciplinary researchers on the design and analysis of longitudinal and cross-sectional health surveys at local, national, and international levels. She is Associate Editor for Biometrics and Chair-Elect for the ASA Survey Research Methods Section.