Statistical Significance Criteria for Massive Multiple Tests

Dr. Cheng Cheng

Genome-wide analyses require performing tens of thousands or even millions of statistical hypothesis tests in a single study, raising the challenge of massive multiple tests. When testing a single pair of (null vs. alternative) hypotheses, one reports a single p-value. A decision to reject the null hypothesis is usually made by comparing the p-value to a customary (and subjectively chosen) Type I (false positive) error level α = 0.01, 0.05, or 0.10. The current consensus in handling massive multiple tests is to control or estimate the false discovery rate (FDR). It is important to realize that, in FDR-based multiple hypothesis testing, a 5% FDR does not correspond to the 0.05 significance level (p-value cutoff). In general, p ≤ α does not correspond to FDR ≤ α. Thus, the traditional significance levels are not necessarily appropriate to use as FDR control levels. In practice, it is not always clear how to select an FDR control level or a p-value threshold α to define the set of results that will be considered statistically significant, i.e., those with a p-value less than or equal to α. More specifically, it is difficult to know how to balance the risk of incurring too many false positives (Type I errors) against the risk of incurring too many false negatives (Type II errors).

Cheng et al. (2004) propose two statistical significance criteria to assist in choosing a p-value significance threshold that balances the number of false positives against the number of false negatives: the profile information criterion (Ip) and the total error criterion (Er), along with a coupled FDR estimate at the selected significance threshold. These significance criteria were developed with microarray applications in mind; however, they can be used in other applications that involve extensive multiple tests. Cheng et al. (2004) also propose a guide-gene driven criterion for microarray gene expression analyses, suitable for applications where a reliable list of differentially expressed genes is available a priori from existing biological knowledge.
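The gap between a p-value cutoff and an FDR level can be seen in a small sketch. The example below does not implement the Ip or Er criteria of Cheng et al. (2004); it uses the standard Benjamini-Hochberg adjustment as a stand-in FDR procedure, and the function name `bh_adjust` and the fifteen p-values are illustrative assumptions, not data from the study.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (a standard FDR procedure,
    used here only as a stand-in; not the Ip or Er criterion)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values
    # stay monotone in the original p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values only (an assumption for this sketch).
pvals = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
         0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.0000]
qvals = bh_adjust(pvals)

alpha = 0.05
n_p = sum(p <= alpha for p in pvals)  # tests passing the p-value cutoff
n_q = sum(q <= alpha for q in qvals)  # tests passing the FDR level

print(f"p <= {alpha}: {n_p} tests; BH-adjusted p <= {alpha}: {n_q} tests")
```

With these inputs, the set of tests with p ≤ 0.05 is larger than the set declared significant at a 5% FDR level, which is the sense in which the two thresholds are not interchangeable.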

The Ip criterion has been successfully applied in a few collaborative research projects at SJCRH; see, for example, “CASP8AP2 and childhood ALL” in featured collaborative research. This criterion is mathematically formalized and developed into the adaptive profile information (API) criterion in Cheng (2006; in Optimality: The Second Erich L. Lehmann Symposium, Lecture Notes and Monographs 49:51-76, Institute of Mathematical Statistics; peer-reviewed), along with further theoretical and simulation studies.

This article appears in Statistical Applications in Genetics and Molecular Biology (2004). Other authors include Stanley Pounds (St. Jude-Biostatistics), James Boyett (St. Jude-Biostatistics), Deqing Pei (St. Jude-Biostatistics), Mei-Ling Kuo (St. Jude-Genetics and Tumor Cell Biology), and Martine Roussel (St. Jude-Genetics and Tumor Cell Biology).

Full Citation

Cheng C, Pounds SB, Boyett JM, Pei D, Kuo ML, Roussel MF. Statistical significance threshold criteria for analysis of microarray gene expression data. Stat Appl Genet Mol Biol 3:Article36, 2004.