St. Jude Family of Websites
Explore our cutting edge research, world-class patient care, career opportunities and more.
St. Jude Children's Research Hospital Home
In collaboration with the Pounds group, we developed the shinyCox R package to visualize survival functions predicted by Cox models. This software helps non-statisticians interrogate Cox model results beyond traditional tabular summaries. We believe shinyCox will help maximize our use of available resources and drive advances in precision oncology and similar fields.
Supporting paper: Selukar et al. Interactive Use of Cox Model–Predicted Survival Curves: An Application Using ACS10 Score and Age to Personalize Treatment of Pediatric AML. JCO Precis Oncol 9, e2500634(2025).
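As a rough illustration of what a Cox model-predicted survival curve is, the sketch below applies the standard proportional hazards relation S(t | x) = S0(t)^exp(beta · x). It is written in Python for illustration only and does not use shinyCox; the function name, baseline survival values, and coefficients are hypothetical.

```python
import math

def predicted_survival(baseline_survival, coefficients, covariates):
    """Return S(t | x) = S0(t) ** exp(beta . x) for each time point."""
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    hazard_ratio = math.exp(linear_predictor)
    return [s0 ** hazard_ratio for s0 in baseline_survival]

# Hypothetical baseline survival at years 1, 2, 3 (not from any fitted model)
baseline = [0.90, 0.80, 0.70]
# Hypothetical coefficients for a risk score and for age
beta = [0.5, 0.02]

# A patient at the reference covariate values keeps the baseline curve
low_risk = predicted_survival(baseline, beta, [0.0, 0.0])
# A larger linear predictor pulls the whole curve downward
high_risk = predicted_survival(baseline, beta, [1.0, 10.0])
```

Changing the covariate values and recomputing the curve is the kind of what-if comparison an interactive viewer makes available without rereading a coefficient table.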
MOADE is a multimodal autoencoder pipeline that links multi-dimensional features to jointly predict personalized multi-omic profiles and cellular compositions, using pseudo-bulk data constructed from an internal non-transcriptomic reference and external scRNA-seq data. MOADE is evaluated through rigorous simulation experiments and on real multi-omic data from multiple tissue types, where it outperforms nine deconvolution pipelines in generalizability and fidelity.
Supporting paper: Sun, J., Malik, A., Lin, T. et al. MOADE: a multimodal autoencoder for dissociating bulk multi-omics data. Genome Biol 26, 325 (2025).
lsBART is an AI-based quality assurance software that integrates with existing organ contouring technology for use with CT and MRI images to segment organs and tissues. lsBART uses shape statistics and smoothing algorithms to create 3D models of whole organs and structures, adjusting for varying image qualities and modalities. In this way, lsBART can pinpoint potential AI-generated contouring errors by flagging individual images that require human supervision.
Supporting paper: Zachary T Wooten, Mary Pham, Laurence E Court, Christine B Peterson, Location smoothed Bayesian additive regression trees: a method for interpretable and robust quality assurance of organ contours in radiotherapy treatment planning, Journal of the Royal Statistical Society Series C: Applied Statistics, 2025, qlaf024
This package implements a Bayesian Optimal Phase II design (DTE-BOP2) for trials with delayed treatment effects, which is particularly relevant to immunotherapy studies where treatment benefits may emerge only after a delay. The method builds upon the BOP2 framework and incorporates uncertainty in the delay timepoint through a truncated gamma prior, informed by expert knowledge or default settings. It supports two-arm trial designs, with functionality for sample size determination, interim and final analyses, and comprehensive simulation under various delay and design scenarios. The design controls type I and type II error rates while improving trial efficiency and power when a delayed effect is present.
Supporting paper: Zhongheng Cai, Haitao Pan. A Bayesian Optimal Phase II Design for Randomized Immunotherapy Trials with Delayed Treatment Effects. Under Review, 2025. Preprint available at arXiv <arXiv:2509.00238>
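To make the idea of a delayed treatment effect concrete, the sketch below uses a piecewise exponential model: both arms share the same hazard until the delay timepoint, after which the treatment hazard is scaled by a hazard ratio. This is a common textbook formulation offered as an assumption; it is not taken from the package, and the function names and numbers are hypothetical.

```python
import math

def survival_control(t, hazard):
    """Exponential survival for the control arm."""
    return math.exp(-hazard * t)

def survival_treatment(t, hazard, delay, hazard_ratio):
    """Survival when the treatment benefit starts only after `delay`."""
    if t <= delay:
        # Before the delay, the treatment arm behaves like the control arm
        return math.exp(-hazard * t)
    # After the delay, the hazard is reduced by the hazard ratio
    return math.exp(-hazard * delay - hazard * hazard_ratio * (t - delay))

# Hypothetical values: monthly hazard 0.1, 3-month delay, hazard ratio 0.5
early_control = survival_control(2.0, 0.1)
early_treatment = survival_treatment(2.0, 0.1, 3.0, 0.5)
late_control = survival_control(6.0, 0.1)
late_treatment = survival_treatment(6.0, 0.1, 3.0, 0.5)
```

The two curves coincide before the delay and separate afterward, which is why an analysis that assumes an immediate effect can lose power in this setting.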
Confounding factors are unavoidable in epidemiological studies. While many overt confounders are accounted for during study design, human behavior patterns, along with other factors that may not have been known at the time of the study, may introduce hidden bias and lead to inaccurate interpretation of results. Negative control variables are routinely collected covariates that can help characterize various types of hidden confounders and can be used to adjust for their impact on treatment, outcome, and known confounders. The software package pci2s leverages negative control variables to better adjust for hidden or unmeasured confounders, lending greater robustness and accuracy to data interpretation.
Supporting paper: Li, Kendrick Qijun; Linderman, George C.; Shi, Xu; Tchetgen Tchetgen, Eric J. Regression-based Proximal Causal Inference for Right-censored Time-to-event Data. Epidemiology 36(5):p 694-704, September 2025.
Our pipeline utilizes scRNA-seq reference and bulk transcriptomes to estimate cellular composition in the matched bulk proteomes. The expression of genes and proteins at either bulk level or cell type level can be integrated by the Angle-based Joint and Individual Variation Explained (AJIVE) framework. Meanwhile, MICSQTL can perform cell-type-specific quantitative trait loci (QTL) mapping to proteins or transcripts based on the input of bulk expression data and the estimated cellular composition per molecule type, without the need for single cell sequencing. We use matched transcriptome-proteome from human brain frontal cortex tissue samples to demonstrate the input and output of our tool.
Supporting paper: Pan Y, Wang X, Sun J, Liu C, Peng J, Li Q (2024). Multimodal joint deconvolution and integrative signature selection in proteomics. Communications Biology, 7(493).
The BEAM method is used to construct a matrix that establishes associations between different variables within large datasets. Once we amass a computational cloud of data points in space, we can measure if there is a meaningful association between all the features in relation to patient survival. We eventually hope to link this method to what we know about gene and drug interactions available via public databases so we can begin to make therapy suggestions.
Supporting paper: Seffernick AE, Cao X, Cheng C, Yang W, Autry RJ, Yang JJ, Pui CH, Teachey DT, Lamba JK, Mullighan CG, Pounds SB. Bootstrap Evaluation of Association Matrices (BEAM) for Integrating Multiple Omics Profiles with Multiple Outcomes. bioRxiv [Preprint]. 2024
Design parameters of the optimal two-period multiarm platform design (controlling for either family-wise error rate or pair-wise error rate) can be calculated using this package, allowing pre-planned deferred arms to be added during the trial.
Supporting paper: Haitao Pan, Xiaomeng Yuan, Jingjing Ye, An Optimal Two-Period Multiarm Platform Design with New Experimental Arms Added During the Trial, N Engl J STAT DATA SCI 2(2024),
ISLET is a method to conduct signal deconvolution for general -omics data. It can estimate the individual-specific and cell-type-specific reference panels, when there are multiple samples observed from each subject. It takes the input of the observed mixture data (feature by sample matrix), the cell type mixture proportions (sample by cell type matrix), and the sample-to-subject information. ISLET can solve for the reference panel on the individual-basis and conduct tests to identify cell-type-specific differential expression (csDE) genes.
Supporting paper: Feng, H., Meng, G., Lin, T. et al. ISLET: individual-specific reference panel recovery improves cell-type-specific inference. Genome Biol 24, 174 (2023).
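The inputs described above follow a linear mixture model: each observed bulk value is the proportion-weighted sum of cell-type-specific reference values. The sketch below recovers a two-cell-type reference for one feature by ordinary least squares. It is a deliberately simplified stand-in written in Python, not ISLET's own estimation procedure, and the function name and data are hypothetical.

```python
def estimate_reference(mixture_row, proportions):
    """Least-squares reference values of two cell types for one feature.

    mixture_row: observed bulk values, one per sample
    proportions: (p1, p2) cell type proportions, one pair per sample
    """
    # Build the normal equations (P^T P) r = P^T y for a two-column P
    a11 = sum(p1 * p1 for p1, _ in proportions)
    a12 = sum(p1 * p2 for p1, p2 in proportions)
    a22 = sum(p2 * p2 for _, p2 in proportions)
    b1 = sum(p1 * y for (p1, _), y in zip(proportions, mixture_row))
    b2 = sum(p2 * y for (_, p2), y in zip(proportions, mixture_row))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("proportions do not identify both cell types")
    # Solve the 2x2 system with the closed-form inverse
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

# Hypothetical data: true reference values 10 and 4, three samples
props = [(0.2, 0.8), (0.5, 0.5), (0.9, 0.1)]
bulk = [p1 * 10.0 + p2 * 4.0 for p1, p2 in props]
reference = estimate_reference(bulk, props)
```

With repeated samples per subject, the same model can in principle be fitted subject by subject, which is what makes an individual-specific reference panel identifiable.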
In many phase I trials, the design goal is to find the dose associated with a certain target toxicity rate. In some trials, the goal is instead to find the dose with a certain weighted sum of the rates of various toxicity grades. In others, the goal is to find the dose with a certain mean value of a continuous response. This package provides the setup and calculations needed to run a dose-finding trial with non-binary endpoints and performs simulations to assess the design’s operating characteristics under various scenarios. Three dose-finding designs are included in this package: unified phase I design (Ivanova et al. (2009) <doi:10.1111/j.1541-0420.2008.01045.x>), Quasi-CRM/Robust-Quasi-CRM (Yuan et al. (2007) <doi:10.1111/j.1541-0420.2006.00666.x>, Pan et al. (2014) <doi:10.1371/journal.pone.0098147>) and generalized BOIN design (Mu et al. (2018) <doi:10.1111/rssc.12263>). The toxicity endpoints that these functions can handle include equivalent toxicity score (ETS), total toxicity burden (TTB), and general continuous toxicity endpoints, with ordinal toxicity grade information incorporated into the dose-finding procedure. These functions allow customization of design characteristics, including sample size, cohort size, target dose-limiting toxicity (DLT) rate, and discrete or continuous toxicity scores, and can incorporate safety and/or stopping rules.
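The "weighted sum of rates of various toxicity grades" target can be illustrated with a few lines of Python. The sketch below scores each dose and picks the one closest to a target score. The weights, rates, and function names are hypothetical, and this is not the selection rule of any of the three designs in the package, which update their estimates sequentially as patients are treated.

```python
def weighted_toxicity(grade_rates, weights):
    """Weighted sum of the rates of each toxicity grade at one dose."""
    return sum(rate * weight for rate, weight in zip(grade_rates, weights))

def closest_dose(scores, target):
    """Index of the dose whose score is nearest the target."""
    return min(range(len(scores)), key=lambda i: abs(scores[i] - target))

# Hypothetical severity weights for three toxicity grades
weights = [0.25, 0.5, 1.0]
# Hypothetical per-grade toxicity rates at three dose levels
rates_by_dose = [
    [0.10, 0.05, 0.02],
    [0.20, 0.10, 0.05],
    [0.30, 0.20, 0.10],
]
scores = [weighted_toxicity(rates, weights) for rates in rates_by_dose]
selected = closest_dose(scores, target=0.15)
```

In a real trial the rates are unknown and estimated from accumulating data, so the selection would be repeated after each cohort.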
We have proposed a versatile and efficient approach for Mendelian randomization analysis under different study designs: random sampling, sampling from the extreme tails of the primary outcome of interest, or sampling from the extreme tails of the risk factor that is the primary outcome of interest in the original study. Here, the risk factor is a continuous variable and can be gene expression or methylation data.
Supporting paper: Liyanage JSS*, Estepp J*, Srivastava K, Raskin S, Sheehan VA, Hankins J, Takemoto C, Li Y, Cui Y, Mori T, Burgess S, DeBaun M, Kang G#. A versatile and efficient novel approach for Mendelian randomization analysis with applications to assess the causal effect of fetal hemoglobin on anemia in sickle cell anemia. Mathematics 2022.
Keyboard is an R package containing functions for the implementation and simulation of two phase I model-assisted maximum tolerated dose (MTD)-finding designs for single-agent and combination trials, and one optimal biological dose (OBD)-finding phase I/II design.
Supporting paper: Chen Li, Hongying Sun, Cheng Cheng, Li Tang, Haitao Pan, A software tool for both the maximum tolerated dose and the optimal biological dose finding trials in early phase designs. Contemporary Clinical Trials Communications, 2022
We propose a novel set-valued (SV) method to assess secondary trait genetic association studies and exposure-secondary outcome association studies using data collected under case-control study design (SV2bc) and extreme phenotype study design (STEPS). Here, secondary traits can be binary or continuous variables and can be associated with gene expression, gene methylation, etc.
Supporting paper: Wenjian Bi, Yun Li, Matthew P Smeltzer, Guimin Gao, Shengli Zhao, Guolian Kang, STEPS: an efficient prospective likelihood approach to genetic association analyses of secondary traits in extreme phenotype sequencing, Biostatistics, Volume 21, Issue 1, January 2020, Pages 33–49,
Randomized clinical trials (RCTs) are considered the gold standard for clinical trials comparing treatment groups. However, historical control trials (HCTs) are an alternative to RCTs if randomization is not feasible because of ethical concerns, patient preference, limited patient populations, or regulatory acceptability. The primary benefit of HCTs is that all patients can receive the new treatment with historical data providing the information for the control arm. Therefore, HCTs are useful for studies with limited patient populations. Group sequential designs using Lan-DeMets error spending functions are proposed for historical control trials with time-to-event endpoints. Both O’Brien–Fleming and Gamma family types of sequential decision boundaries are derived based on sequential log-rank tests (Wu and Li, 2017).
Supporting paper: Wu J, Li Y. Group sequential design for historical control trials using error spending functions. J Biopharm Stat. 2020 Mar;30(2):351-363. doi: 10.1080/10543406.2019.1684305. Epub 2019 Nov 12. PMID: 31718458; PMCID: PMC7737423.
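The two error spending families named above have simple closed forms. The sketch below uses the standard Lan-DeMets O'Brien-Fleming-type function, alpha(t) = 2 - 2Φ(z_{alpha/2} / sqrt(t)), and the gamma family function, alpha(t) = alpha(1 - e^{-gamma t}) / (1 - e^{-gamma}). It shows only how much type I error is spent by information fraction t; it does not derive the log-rank boundaries for historical control trials, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def obrien_fleming_spending(t, alpha):
    """Cumulative alpha spent at information fraction t, with 0 < t <= 1."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(t)))

def gamma_family_spending(t, alpha, gamma):
    """Gamma family spending; gamma = 0 spends alpha linearly in t."""
    if gamma == 0:
        return alpha * t
    return alpha * (1 - math.exp(-gamma * t)) / (1 - math.exp(-gamma))

# Alpha spent at the halfway look and at the final analysis
obf_half = obrien_fleming_spending(0.5, 0.05)
obf_final = obrien_fleming_spending(1.0, 0.05)
linear_half = gamma_family_spending(0.5, 0.05, 0)
gamma_final = gamma_family_spending(1.0, 0.05, -4)
```

Both functions spend the full alpha at t = 1; the O'Brien-Fleming type spends very little at early looks, which makes early stopping boundaries conservative.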
We use the GRIN method to find genes that have an overabundance of genomic abnormalities in tumor cells. Because this method examines all types of genomic lesions simultaneously, we can discover important, but cryptic, genomic abnormalities that manifest in different ways. Our goal is to tie GRIN to gene expression and clinical outcomes.
Supporting paper: Pounds S, Cheng C, Li S, Liu Z, Zhang J, Mullighan C. A genomic random interval model for statistical analysis of genomic lesion data. Bioinformatics. 2013 Sep 1;29(17):2088-95.
Bayesian longitudinal low-rank regression models for imaging genetic data from longitudinal studies
Bayesian variable selection and model comparison for factor analysis modeling
Multiplicity-Adjusted Evidence Weights
rctrack: An R Package that Automatically Collects and Archives Details for Reproducible Computing
A genomic random interval model for statistical analysis of genomic lesion data
Cross-Species Genomics Matches Driver Mutations and Cell Compartments to Model Ependymoma
Subtypes of Medulloblastoma have Distinct Developmental Origins
Reference Alignment of SNP Microarray Signals for Copy Number Analysis of Tumors.
Genes regulating B cell development are mutated in acute lymphoid leukaemia
Estimation and Control of Multiple Testing Error Rates for the Analysis of Microarray Data.
Robust Estimation of the False Discovery Rate.
Statistical Development and Evaluation of Gene Expression Data Filters.
Sample Size Determination for the False Discovery Rate.
Improving False Discovery Rate Estimation.
Department of Biostatistics
MS 768, Room R6030
St. Jude Children's Research Hospital