Xiucai Ding, Duke University

Friday, January 17, 2020

2:00 PM - Halsey Hall, room 120

Complex data analysis for medical sciences

Sensor fusion for high dimensional data

Abstract: A long-standing challenge in data science is to adequately quantify the system of interest by assembling the available information from a given heterogeneous dataset. This is usually referred to as the sensor fusion problem, which is important in machine learning, manifold learning and the medical sciences.

In this talk, motivated by clinical applications, we discuss how to apply recent techniques in random matrix theory developed by H.-T. Yau and L. Erdős, together with the framework of free probability theory, to study kernel-based machine learning algorithms for sensor fusion. Specifically, we study two kernel-based sensor fusion algorithms, nonparametric canonical correlation analysis (NCCA) and alternating diffusion (AD). The kernel matrix of interest is a product of two non-Hermitian kernel matrices. We prove that in the regime where the dimensions of both random vectors are comparable with the sample size, if NCCA and AD are conducted using a smooth kernel function, then the first few nontrivial eigenvalues converge to real deterministic values (i.e., rigidity of eigenvalues). We provide some statistics based on the eigen-ratios to test the data quality and whether the two sensors are independent. Moreover, we apply our methods to study respiratory signals from subjects with OSAS (obstructive sleep apnea syndrome). The algorithms help us gain important insights into clinical decision making. This talk is based on joint work with Hau-Tieng Wu.
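
As a rough illustration of the kind of object the abstract describes, the sketch below forms a product of two row-normalized Gaussian kernel matrices in the spirit of alternating diffusion and looks at its leading eigenvalues. It is not the speaker's implementation: the function names, the Gaussian kernel, the bandwidth choice (set equal to the ambient dimension) and the particular ratio of consecutive eigenvalue magnitudes are all assumptions made for illustration, and the actual test statistics from the talk are not specified in this announcement.

```python
import numpy as np


def gaussian_transition_matrix(data, bandwidth):
    """Row-normalized Gaussian kernel matrix for the rows of `data`.

    The result is row-stochastic and, in general, not symmetric.
    """
    sq_norms = np.sum(data ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (data @ data.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    kernel = np.exp(-sq_dists / bandwidth)
    return kernel / kernel.sum(axis=1, keepdims=True)


def alternating_diffusion_eigenvalues(x, y, num_eigs=5):
    """Leading eigenvalues (by magnitude) of the product of two kernel matrices.

    Assumption: the bandwidth of each kernel equals the dimension of that
    sensor, so that squared distances of order p are rescaled to order 1.
    """
    a_x = gaussian_transition_matrix(x, bandwidth=float(x.shape[1]))
    a_y = gaussian_transition_matrix(y, bandwidth=float(y.shape[1]))
    product = a_x @ a_y  # product of two non-symmetric kernel matrices
    eigvals = np.linalg.eigvals(product)  # may be returned as complex numbers
    order = np.argsort(-np.abs(eigvals))
    return eigvals[order][:num_eigs]


# Hypothetical data: two independent pure-noise sensors whose dimensions
# are comparable with the sample size.
rng = np.random.default_rng(0)
n, p1, p2 = 200, 50, 80
x = rng.standard_normal((n, p1))
y = rng.standard_normal((n, p2))

eigs = alternating_diffusion_eigenvalues(x, y, num_eigs=5)

# Ratios of consecutive eigenvalue magnitudes, one illustrative notion of
# an "eigen-ratio"; the first eigenvalue is the trivial one.
eigen_ratios = np.abs(eigs[1:]) / np.abs(eigs[:-1])
print(np.abs(eigs))
print(eigen_ratios)
```

Because both factors are row-stochastic, so is their product, and its largest eigenvalue has magnitude 1; this is the trivial eigenvalue, and the ones after it are the "nontrivial" eigenvalues in the sense used above. Comparing their magnitudes on real paired recordings against a pure-noise baseline like this one is one way to get a feel for whether the two sensors share information.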