In many case-control genome-wide association studies (GWAS) and next-generation sequencing (NGS) studies, extensive data are available on secondary traits that may correlate with the primary disease and share common genetic variants with it. Investigating these secondary traits can provide critical insights into disease etiology or pathology and strengthen GWAS or NGS results. Methods based on logistic regression (LG) were developed for this purpose. However, for the identification of rare variants (RVs), certain inadequacies in the LG models and algorithmic instability can cause severely inflated type I error and significant loss of power when the two traits are correlated and the RV is associated with the disease, especially at stringent significance levels. To address this issue, we propose a novel set-valued (SV) method that models a binary trait by dichotomization of an underlying continuous variable and incorporates this model into the genetic association model as a critical component. Extensive simulations and an analysis of seven secondary traits in a GWAS of benign ethnic neutropenia show that the SV method consistently controls type I error well at stringent significance levels, has larger power than the LG-based methods, and is robust to the effect pattern of the genetic variant (risk or protective), the variant frequency (rare or common), the disease prevalence (rare or common), and the trait distributions. Because of the striking and profound advantage of the SV method, we strongly recommend that it be employed instead of the LG-based methods for analyses of secondary traits in case-control sequencing studies.
Kang G, Bi W, Zhang H, Pounds SB, Cheng C, Shete S, Zou F, Zhao Y, Zhang JF, Yue W. A robust and powerful set-valued approach to rare variant association analyses of secondary traits in case-control sequencing studies. Genetics 205(3):1049-1062, 2017. (PMID: 28040743 PMCID: PMC5340322).
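The central modeling idea in the abstract, a binary trait obtained by dichotomizing an underlying continuous variable, can be illustrated with a minimal simulation sketch. This is not the authors' estimation procedure. It only shows the assumed data-generating step, under the illustrative assumptions of an additive genotype effect, standard normal noise, and a threshold set from the prevalence among non-carriers. The function name and all parameter names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def simulate_set_valued_trait(n, maf, beta, prevalence, seed=0):
    """Simulate a binary trait by thresholding a latent continuous variable.

    Illustrative assumptions (not taken from the paper): additive genotype
    effect `beta`, standard normal noise, and a threshold chosen so that
    non-carriers have the given `prevalence`.
    """
    rng = np.random.default_rng(seed)
    # Minor-allele count (0, 1, or 2) under Hardy-Weinberg equilibrium.
    g = rng.binomial(2, maf, size=n)
    # Latent continuous variable: genetic effect plus standard normal noise.
    latent = beta * g + rng.standard_normal(n)
    # Threshold giving the target prevalence when g == 0.
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    # Only the dichotomized value is observed: 1 if latent exceeds the threshold.
    d = (latent > threshold).astype(int)
    return g, latent, d, threshold


if __name__ == "__main__":
    g, latent, d, threshold = simulate_set_valued_trait(
        n=200_000, maf=0.01, beta=1.0, prevalence=0.1
    )
    print("trait rate, non-carriers:", d[g == 0].mean())
    print("trait rate, carriers:    ", d[g > 0].mean())
```

In this sketch a risk variant (positive `beta`) shifts the latent variable upward, so carriers cross the threshold more often. A protective variant corresponds to a negative `beta`.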