Binary regression with differentially misclassified response and exposure variables

Dr. Li Tang

Misclassification is a long-standing statistical problem in epidemiology. In many real studies, an exposure variable, a response variable, or both may be misclassified. Potential threats to the validity of analytic results (e.g., estimates of odds ratios) that stem from misclassification have been widely discussed. However, much of this discussion has been restricted to the nondifferential case, in which misclassification rates for a particular variable are assumed not to depend on other variables. Here we use bacterial vaginosis and trichomoniasis data from the HIV Epidemiology Research Study (HERS) to show that complex differential misclassification patterns are common in practice. Therefore, clear illustrations of valid and accessible methods that deal with complex misclassification remain in high demand. We formulate a maximum likelihood (ML) framework that allows flexible modeling of misclassification in both the response and a key binary exposure variable, while adjusting for other covariates via logistic regression. The approach emphasizes the use of internal validation data to evaluate the underlying misclassification mechanisms. Data-driven simulations show that the proposed ML analysis outperforms less flexible approaches that cannot appropriately account for complex misclassification patterns. The value and validity of the ML method are further demonstrated through a comprehensive analysis of the HERS example data.
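As a rough illustration of why the differential case matters, the sketch below computes the odds ratio that would be observed for an error-prone binary response under nondifferential and differential misclassification. It is not the ML method of the paper. The function name `observed_odds_ratio`, the prevalences, and the sensitivity and specificity values are all hypothetical and are not HERS estimates.

```python
def observed_odds_ratio(p_exposed, p_unexposed, se, sp):
    """Odds ratio for an error-prone response Y* (illustrative helper).

    p_exposed, p_unexposed: true P(Y=1 | X=1) and P(Y=1 | X=0).
    se, sp: dicts keyed by exposure level x (1 or 0) holding the
            sensitivity P(Y*=1 | Y=1, X=x) and
            specificity P(Y*=0 | Y=0, X=x).
    """
    def p_star(p, x):
        # P(Y*=1 | X=x) = Se_x * p + (1 - Sp_x) * (1 - p)
        return se[x] * p + (1 - sp[x]) * (1 - p)

    def odds(p):
        return p / (1 - p)

    return odds(p_star(p_exposed, 1)) / odds(p_star(p_unexposed, 0))


# Hypothetical true outcome prevalences (assumed values).
p1, p0 = 0.40, 0.25
true_or = (p1 / (1 - p1)) / (p0 / (1 - p0))  # equals 2.0

# Nondifferential: the same error rates in both exposure groups.
nondiff_or = observed_odds_ratio(
    p1, p0, se={1: 0.80, 0: 0.80}, sp={1: 0.90, 0: 0.90}
)

# Differential: sensitivity depends on exposure status.
diff_or = observed_odds_ratio(
    p1, p0, se={1: 0.95, 0: 0.70}, sp={1: 0.90, 0: 0.90}
)

print(f"true OR            = {true_or:.3f}")
print(f"nondifferential OR = {nondiff_or:.3f}")
print(f"differential OR    = {diff_or:.3f}")
```

With these assumed inputs, the nondifferential scenario pulls the observed odds ratio toward the null (about 1.62 against a true value of 2.0), while the differential scenario pushes it away from the null (about 2.36). The direction of bias under differential error depends on the specific rates, which is one reason validation data are needed to estimate them.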

Full Citation

Tang L.*, Lyles R.H., King C.C., Celentano D. and Lo Y. Binary regression with differentially misclassified response and exposure variables. Statistics in Medicine 34:1605-1620, 2015. PMID: 25652841 DOI: 10.1002/sim.6440. (*Corresponding author)