Power and false discovery rates of HGLMs and other statistical models analyzing RNAseq data

Thursday, February 28, 2019
Dr. Dongseok Choi, Oregon Health & Science University

RNAseq has become the standard technology in gene expression studies in the past few years. It is considered superior to microarrays that used to be the choice of technology in the 2000s.  There are still many challenges in analyzing RNAseq data and its analytic pipelines require multiple steps and combine the methods of bioinformatics and biostatistics. Since RNAseq data are typically processed to counts for downstream statistical analyses, there have been active developments of statistical models based on negative binomial regression for high dimensional counting data. The main statistical challenge is estimating complex dispersion structure in high dimensional counting data. Some popular choices are implemented in R packages such as DESeq2 and edgeR. First introduced by Lee and Nelder in 1996, the hierarchical generalized linear models (HGLMs) allow random effects in linear predictors for mean and dispersion and can be efficient in estimating complex dispersion structure of RNAseq data. In this presentation, we will review a brief history of advancement of statistical methods for RNAseq data and compare their power and false discovery rates by simulations.