Jacob Bien, U. of Southern California

Friday, April 23, 2021
2:00 PM


"Mixtures of Multivariate Regressions & Selective Inference for Clustering"



This will be a talk in two parts: The first part will focus on a statistical model developed to solve a specific problem in oceanography, while the second part will describe a statistical challenge that arises across many application areas.

Part 1: Mixture of Multivariate Regressions Modeling for Oceanographic Flow Cytometry Data

Although microscopic, phytoplankton in the ocean are extremely important to all of life and are together responsible for as much photosynthesis as all plants on land combined.  Today, oceanographers are able to collect flow cytometry data in real time onboard a moving ship, providing them with fine-scale resolution of the distribution of phytoplankton across thousands of kilometers.  We present a novel sparse mixture of multivariate regressions model to estimate the time-varying phytoplankton subpopulations while simultaneously identifying the specific environmental covariates that are predictive of the observed changes in these subpopulations.  This is joint work with Sangwon Hyun, François Ribalet, and Mattias Cape.


Part 2: Selective Inference for Hierarchical Clustering

Although statistics textbooks emphasize the importance of forming a hypothesis before looking at a data set, in practice it is quite common for data analysts to "double dip."  That is, they first explore a data set to formulate some hypotheses and then they want to know whether what they have found is "real."  For example, after running a clustering method on some data, a data analyst looking at two of the clusters might want to know whether their means are "truly" different from each other.  Applying a standard two-sample test in such a setting will lead to a grossly inflated Type I error rate.  We develop a selective inference approach to help answer this question while properly accounting for the fact that the clusters were estimated from the same data.  This is joint work with Lucy Gao and Daniela Witten.
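
The Type I error inflation described in Part 2 can be illustrated with a small simulation.  The sketch below is not the speakers' method and makes several simplifying assumptions: the data are one-dimensional draws from a single N(0, 1) population (so there are no true clusters), a two-group split minimizing within-group sum of squares stands in for the clustering step, and the naive test is a two-sample z-test with known variance.  All function names are hypothetical.

```python
import math
import random


def sum_sq(group):
    """Sum of squared deviations of a group from its own mean."""
    m = sum(group) / len(group)
    return sum((x - m) ** 2 for x in group)


def best_two_cluster_split(values):
    """Split 1-D values into two contiguous groups (after sorting) that
    minimize the total within-group sum of squares.
    Assumption: this stands in for a generic clustering step."""
    xs = sorted(values)
    best = None
    for k in range(1, len(xs)):
        left, right = xs[:k], xs[k:]
        cost = sum_sq(left) + sum_sq(right)
        if best is None or cost < best[0]:
            best = (cost, left, right)
    return best[1], best[2]


def naive_z_test_pvalue(a, b, sigma=1.0):
    """Two-sided p-value of a two-sample z-test with known sigma.
    It ignores how the two groups were chosen."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    return math.erfc(abs(diff / se) / math.sqrt(2))


def rejection_rate(cluster_first, n=30, reps=500, alpha=0.05, seed=0):
    """Fraction of null data sets on which the naive test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        data = [rng.gauss(0, 1) for _ in range(n)]  # one population, no real clusters
        if cluster_first:
            a, b = best_two_cluster_split(data)  # groups chosen by looking at the data
        else:
            a, b = data[: n // 2], data[n // 2:]  # groups fixed in advance
        if naive_z_test_pvalue(a, b) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("groups fixed in advance:", rejection_rate(cluster_first=False))
    print("groups found by clustering:", rejection_rate(cluster_first=True))
```

When the groups are fixed before seeing the data, the rejection rate should sit near the nominal 5% level.  When the groups come from clustering the same data, the naive test rejects far more often, even though every observation was drawn from the same distribution.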


