Explore a selection of software & methods developed by St. Jude researchers. 

shinyCox: Create 'shiny' Applications for Cox Proportional Hazards Models

In collaboration with the Pounds group, we developed the shinyCox R package to visualize survival functions predicted by Cox models. The software helps non-statisticians interrogate Cox model results beyond traditional tabular summaries. We believe shinyCox will help maximize our use of available resources and drive advances in precision oncology and similar fields.

Supporting paper: Selukar et al. Interactive Use of Cox Model–Predicted Survival Curves: An Application Using ACS10 Score and Age to Personalize Treatment of Pediatric AML. JCO Precis Oncol 9, e2500634 (2025).
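To make the idea of a Cox model-predicted survival curve concrete, here is a minimal sketch of the standard relationship S(t | x) = S0(t)^exp(x·β). It is not the shinyCox implementation; the function name, baseline survival values, and coefficients are all hypothetical and chosen only for illustration.

```python
import math

def predicted_survival(baseline_survival, coefficients, covariates):
    """Return S(t | x) = S0(t) ** exp(x . beta) at each baseline time point."""
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    hazard_ratio = math.exp(linear_predictor)
    return [s0 ** hazard_ratio for s0 in baseline_survival]

# Hypothetical baseline survival at 1, 2, and 3 years (illustrative values only).
baseline = [0.90, 0.80, 0.70]
# Hypothetical log hazard ratios for a risk score and for centered age in years.
beta = [0.5, 0.02]

# A patient at the reference covariate values follows the baseline curve.
low_risk = predicted_survival(baseline, beta, [0.0, 0.0])
# A patient with a higher score and age has a lower predicted curve.
high_risk = predicted_survival(baseline, beta, [2.0, 5.0])

for year, (lo, hi) in enumerate(zip(low_risk, high_risk), start=1):
    print(f"year {year}: reference {lo:.3f}, higher risk {hi:.3f}")
```

An interactive tool would recompute curves like these as the user changes covariate values, which is the kind of exploration a static table of hazard ratios cannot offer.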

MOADE: A multimodal autoencoder for dissociating bulk multi-omics data

MOADE is a multimodal autoencoder pipeline that links multi-dimensional features to jointly predict personalized multi-omic profiles and cellular compositions, using pseudo-bulk data constructed from internal non-transcriptomic reference and external scRNA-seq data. MOADE was evaluated through rigorous simulation experiments and on real multi-omic data from multiple tissue types, where it outperformed nine deconvolution pipelines with superior generalizability and fidelity.

Supporting paper: Sun, J., Malik, A., Lin, T. et al. MOADE: a multimodal autoencoder for dissociating bulk multi-omics data. Genome Biol 26, 325 (2025).

Location-smoothed Bayesian Adaptive Regression Tree: lsBART

lsBART is an AI-based quality assurance software that integrates with existing organ contouring technology for use with CT and MRI images to segment organs and tissues. lsBART uses shape statistics and smoothing algorithms to create 3D models of whole organs and structures, adjusting for varying image qualities and modalities. In this way, lsBART can pinpoint potential AI-generated contouring errors by flagging individual images that require human supervision. 

Supporting paper: Zachary T Wooten, Mary Pham, Laurence E Court, Christine B Peterson. Location smoothed Bayesian additive regression trees: a method for interpretable and robust quality assurance of organ contours in radiotherapy treatment planning. Journal of the Royal Statistical Society Series C: Applied Statistics, 2025; qlaf024.

DTE-BOP2: Bayesian Optimal Phase II design under Delayed Treatment Effects

This software implements a Bayesian Optimal Phase II design (DTE-BOP2) for trials with delayed treatment effects, which is particularly relevant to immunotherapy studies where treatment benefits may emerge after a delay. The method builds upon the BOP2 framework and incorporates uncertainty in the delay timepoint through a truncated gamma prior, informed by expert knowledge or default settings. It supports two-arm trial designs with functionality for sample size determination, interim and final analyses, and comprehensive simulation under various delay and design scenarios. The design ensures rigorous type I and II error control while improving trial efficiency and power when a delayed effect is present.

Supporting paper: Zhongheng Cai, Haitao Pan. A Bayesian Optimal Phase II Design for Randomized Immunotherapy Trials with Delayed Treatment Effects. Under Review, 2025. Preprint available at arXiv:2509.00238.
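As an illustration of what a truncated gamma prior on the delay timepoint looks like, the sketch below draws from a gamma distribution restricted to an interval using simple rejection sampling. This is not the DTE-BOP2 code; the function name, the shape and scale values, and the truncation bounds (in months) are hypothetical assumptions.

```python
import random

def sample_truncated_gamma(shape, scale, lower, upper, rng):
    """Draw one value from Gamma(shape, scale) restricted to [lower, upper] by rejection."""
    while True:
        draw = rng.gammavariate(shape, scale)
        if lower <= draw <= upper:
            return draw

rng = random.Random(2025)

# Hypothetical prior: delay centered near 3 months (shape * scale = 3),
# restricted to lie between 1 and 6 months.
delay_draws = [
    sample_truncated_gamma(shape=9.0, scale=1.0 / 3.0, lower=1.0, upper=6.0, rng=rng)
    for _ in range(2000)
]
prior_mean_delay = sum(delay_draws) / len(delay_draws)
print(f"approximate prior mean delay: {prior_mean_delay:.2f} months")
```

Rejection sampling is adequate here because most of the untruncated mass already falls inside the bounds; a tighter truncation would call for an inverse-CDF approach instead.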

Proximal Causal Inference with a 2-Stage regression approach: pci2s 

Confounding factors are unavoidable in epidemiological studies. While many overt confounders are accounted for during study design, human behavior patterns and other factors that may not have been known at the time of the study can introduce hidden bias and lead to inaccurate interpretation of results. Negative control variables are routinely collected covariates that can help characterize various types of hidden confounders and can be used to adjust for their impact on treatment, outcome, and known confounders. The software package pci2s leverages negative control variables to better adjust for hidden or unmeasured confounders, lending greater robustness and accuracy to data interpretation.

Supporting paper: Li, Kendrick Qijun; Linderman, George C.; Shi, Xu; Tchetgen Tchetgen, Eric J. Regression-based Proximal Causal Inference for Right-censored Time-to-event Data. Epidemiology 36(5):p 694-704, September 2025.
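The two-stage idea behind proximal causal inference can be sketched on simulated data with a simple linear model and a continuous outcome. This is a simplified illustration, not the pci2s API and not the right-censored time-to-event method of the supporting paper; the data-generating model, variable names, and the `ols` helper are all assumptions made for the example. Stage 1 regresses the negative control outcome W on the exposure A and the negative control exposure Z; stage 2 regresses the outcome Y on A and the stage 1 fitted values.

```python
import random

def ols(rows, y):
    """Ordinary least squares via the normal equations; returns the coefficients."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

rng = random.Random(0)
true_effect = 2.0
A, Z, W, Y = [], [], [], []
for _ in range(5000):
    u = rng.gauss(0, 1)                  # unmeasured confounder
    a = u + rng.gauss(0, 1)              # exposure, affected by u
    A.append(a)
    Z.append(u + rng.gauss(0, 1))        # negative control exposure (no effect on Y)
    W.append(u + rng.gauss(0, 1))        # negative control outcome (not affected by A)
    Y.append(true_effect * a + 1.5 * u + rng.gauss(0, 1))

# Naive regression of Y on A ignores u and is biased.
naive = ols([[1.0, a] for a in A], Y)[1]

# Stage 1: regress W on A and Z, then compute fitted values.
s1 = ols([[1.0, a, z] for a, z in zip(A, Z)], W)
W_hat = [s1[0] + s1[1] * a + s1[2] * z for a, z in zip(A, Z)]

# Stage 2: regress Y on A and the fitted W; the coefficient on A estimates the effect.
two_stage = ols([[1.0, a, w] for a, w in zip(A, W_hat)], Y)[1]
print(f"naive: {naive:.2f}, two-stage: {two_stage:.2f}, truth: {true_effect}")
```

In this simulation the naive estimate absorbs the effect of the hidden confounder, while the two-stage estimate should land close to the true value because the fitted negative control outcome stands in for the confounder.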

MICSQTL: Multi-omic deconvolution, Integration and Cell-type-specific Quantitative Trait Loci

Our pipeline utilizes scRNA-seq reference and bulk transcriptomes to estimate cellular composition in the matched bulk proteomes. The expression of genes and proteins at either bulk level or cell type level can be integrated by the Angle-based Joint and Individual Variation Explained (AJIVE) framework. Meanwhile, MICSQTL can perform cell-type-specific quantitative trait loci (QTL) mapping to proteins or transcripts based on the input of bulk expression data and the estimated cellular composition per molecule type, without the need for single cell sequencing. We use matched transcriptome-proteome from human brain frontal cortex tissue samples to demonstrate the input and output of our tool.

Supporting paper: Pan Y, Wang X, Sun J, Liu C, Peng J, Li Q (2024). Multimodal joint deconvolution and integrative signature selection in proteomics. Communications Biology, 7(493).

BEAMR: Bootstrap Evaluation of Association Matrices

The BEAM method is used to construct a matrix that establishes associations between different variables within large datasets. Once we amass a computational cloud of data points in space, we can measure if there is a meaningful association between all the features in relation to patient survival. We eventually hope to link this method to what we know about gene and drug interactions available via public databases so we can begin to make therapy suggestions.

Supporting paper: Seffernick AE, Cao X, Cheng C, Yang W, Autry RJ, Yang JJ, Pui CH, Teachey DT, Lamba JK, Mullighan CG, Pounds SB. Bootstrap Evaluation of Association Matrices (BEAM) for Integrating Multiple Omics Profiles with Multiple Outcomes. bioRxiv [Preprint]. 2024

PlatformDesign: Optimal two-period multi-arm platform design

Design parameters of the optimal two-period multiarm platform design (controlling for either family-wise error rate or pair-wise error rate) can be calculated using this package, allowing pre-planned deferred arms to be added during the trial.

Supporting paper: Haitao Pan, Xiaomeng Yuan, Jingjing Ye, An Optimal Two-Period Multiarm Platform Design with New Experimental Arms Added During the Trial, N Engl J STAT DATA SCI 2(2024),

ISLET: Individual-Specific CeLl typE referencing Tool

ISLET is a method for signal deconvolution of general omics data. It can estimate individual-specific and cell-type-specific reference panels when multiple samples are observed from each subject. It takes as input the observed mixture data (feature-by-sample matrix), the cell type mixture proportions (sample-by-cell-type matrix), and the sample-to-subject information. ISLET can solve for the reference panel on an individual basis and conduct tests to identify cell-type-specific differential expression (csDE) genes.

Supporting paper: Feng, H., Meng, G., Lin, T. et al. ISLET: individual-specific reference panel recovery improves cell-type-specific inference. Genome Biol 24, 174 (2023).

UnifiedDoseFinding package: Dose-Finding Methods for Non-Binary Outcomes

In many phase I trials, the design goal is to find the dose associated with a certain target toxicity rate. In some trials, the goal can be to find the dose with a certain weighted sum of rates of various toxicity grades. For others, the goal is to find the dose with a certain mean value of a continuous response. This package provides the setup and calculations needed to run a dose-finding trial with non-binary endpoints and performs simulations to assess the design's operating characteristics under various scenarios. Three dose-finding designs are included in this package: unified phase I design (Ivanova et al. (2009) <doi:10.1111/j.1541-0420.2008.01045.x>), Quasi-CRM/Robust-Quasi-CRM (Yuan et al. (2007) <doi:10.1111/j.1541-0420.2006.00666.x>, Pan et al. (2014) <doi:10.1371/journal.pone.0098147>) and generalized BOIN design (Mu et al. (2018) <doi:10.1111/rssc.12263>). The functions handle toxicity endpoints including equivalent toxicity score (ETS), total toxicity burden (TTB), and general continuous toxicity endpoints, and they incorporate ordinal toxicity grade information into the dose-finding procedure. They also allow customization of design characteristics, including sample size, cohort size, target dose-limiting toxicity (DLT) rate, discrete or continuous toxicity scores, and safety and/or stopping rules.

MREPS: Mendelian Randomization Analysis

We have proposed a versatile and efficient approach for Mendelian randomization analysis under different study designs. The design can be random sampling, sampling from the extreme tails of the primary outcome of interest, or sampling from the extreme tails of the risk factor that is the primary outcome of interest in the original study. Here, the risk factor is a continuous variable and can be gene expression or methylation data.

Supporting paper: Liyanage JSS*, Estepp J*, Srivastava K, Raskin S, Sheehan VA, Hankins J, Takemoto C, Li Y, Cui Y, Mori T, Burgess S, DeBaun M, Kang G#. A versatile and efficient novel approach for Mendelian randomization analysis with applications to assess the causal effect of fetal hemoglobin on anemia in sickle cell anemia. Mathematics 2022.

Keyboard: Bayesian Designs for Early Phase Clinical Trials

Keyboard is an R package containing functions for the implementation and simulation of two phase I model-assisted maximum tolerated dose (MTD)-finding designs for single-agent and combination trials, and one optimal biological dose (OBD)-finding phase I/II design.

Supporting paper: Chen Li, Hongying Sun, Cheng Cheng, Li Tang, Haitao Pan, A software tool for both the maximum tolerated dose and the optimal biological dose finding trials in early phase designs. Contemporary Clinical Trials Communications, 2022 

STEPS: Secondary traits genetic and exposure-secondary outcomes association studies

We propose a novel set-valued (SV) method to assess secondary trait genetic association studies and exposure-secondary outcome association studies using data collected under case-control study design (SV2bc) and extreme phenotype study design (STEPS). Here, secondary traits can be binary or continuous variables and can be associated with gene expression, gene methylation, etc.   

Supporting paper: Wenjian Bi, Yun Li, Matthew P Smeltzer, Guimin Gao, Shengli Zhao, Guolian Kang. STEPS: an efficient prospective likelihood approach to genetic association analyses of secondary traits in extreme phenotype sequencing. Biostatistics, Volume 21, Issue 1, January 2020, Pages 33–49.

HCTDesign: Group Sequential Design for Historical Control Trial with Survival Outcome

Randomized clinical trials (RCTs) are considered the gold standard for clinical trials comparing treatment groups. However, historical control trials (HCTs) are an alternative to RCTs if randomization is not feasible because of ethical concerns, patient preference, limited patient populations, or regulatory acceptability. The primary benefit of HCTs is that all patients can receive the new treatment with historical data providing the information for the control arm. Therefore, HCTs are useful for studies with limited patient populations. Group sequential designs using Lan-DeMets error spending functions are proposed for historical control trials with time-to-event endpoints. Both O’Brien–Fleming and Gamma family types of sequential decision boundaries are derived based on sequential log-rank tests (Wu and Li, 2017).

Supporting paper: Wu J, Li Y. Group sequential design for historical control trials using error spending functions. J Biopharm Stat. 2020 Mar;30(2):351-363. doi: 10.1080/10543406.2019.1684305. Epub 2019 Nov 12. PMID: 31718458; PMCID: PMC7737423.
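To show how an error spending function allocates the type I error across interim looks, the sketch below evaluates two commonly used forms: the Lan-DeMets O'Brien-Fleming-type function and a gamma-family function. It assumes that "Gamma family" refers to the Hwang-Shih-DeCani family; the function names, the alpha level, the gamma parameter, and the information fractions are hypothetical choices, and this is not the HCTDesign code.

```python
import math
from statistics import NormalDist

def obrien_fleming_spending(t, alpha):
    """Lan-DeMets O'Brien-Fleming-type spending: 2 - 2 * Phi(z_{alpha/2} / sqrt(t))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 - 2 * NormalDist().cdf(z / math.sqrt(t))

def gamma_family_spending(t, alpha, gamma):
    """Gamma-family spending: alpha * (1 - exp(-gamma * t)) / (1 - exp(-gamma))."""
    if gamma == 0:
        return alpha * t  # limiting case: error spent linearly in t
    return alpha * (1 - math.exp(-gamma * t)) / (1 - math.exp(-gamma))

alpha = 0.05
information_fractions = [0.25, 0.50, 0.75, 1.00]

# Cumulative type I error spent by each look under the two functions.
obf = [obrien_fleming_spending(t, alpha) for t in information_fractions]
hsd = [gamma_family_spending(t, alpha, gamma=-4.0) for t in information_fractions]

for t, a1, a2 in zip(information_fractions, obf, hsd):
    print(f"t = {t:.2f}: O'Brien-Fleming type {a1:.5f}, gamma family {a2:.5f}")
```

Both functions spend very little error early and reach the full alpha at the final analysis, which is why they are popular choices when early stopping should require strong evidence.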

Genomic Random Intervals: GRIN

We use the GRIN method to find genes that have an overabundance of genomic abnormalities in tumor cells. Because this method examines all types of genomic lesions simultaneously, we can discover important, but cryptic, genomic abnormalities that manifest in different ways. Our goal is to tie GRIN to gene expression and clinical outcomes.

Supporting paper: Pounds S, Cheng C, Li S, Liu Z, Zhang J, Mullighan C. A genomic random interval model for statistical analysis of genomic lesion data. Bioinformatics. 2013 Sep 1;29(17):2088-95.

Additional software and methods

Bayesian longitudinal low-rank regression models for imaging genetic data from longitudinal studies

  • Zhaohua Lu et al. NeuroImage. 2017 149(1) 305-322.
  • Hongtu Zhu, Zakaria Khondker, Zhaohua Lu, Joseph G. Ibrahim. Journal of the American Statistical Association. 2014; 109 (507) 977-990.

Bayesian variable selection and model comparison for factor analysis modeling

  • Zhaohua Lu et al. Psychological methods. 2017; 22(2):361-381.
  • Zhaohua Lu et al. Multivariate behavioral research. 2016; 51(4):519-39

Multiplicity-Adjusted Evidence Weights

  • Wenjian Bi, Guolian Kang, Stanley Pounds; Presented at the BIBM2017 Meeting

A Robust and Powerful Set-Valued Approach to Rare Variant Association Analyses of Secondary Traits in Case-Control Sequencing Studies

  • Guolian Kang, Wenjian Bi, et al. Genetics. 2017; 205(3), pp. 1049-1062

SVSI: Fast and Powerful Set-Valued System Identification Approach to Identifying Rare Variants in Sequencing Studies for Binary and Ordered Categorical Traits

  • Wenjian Bi, Guolian Kang, et al. Human Heredity 2014; 78:104-116; Annals of Human Genetics 2015; 79:294-309

rctrack: An R Package that Automatically Collects and Archives Details for Reproducible Computing

  • Zhifa Liu and Stan Pounds. BMC Bioinformatics 2014 Mar

The Most Informative Spacing Test Effectively Discovers Biologically Relevant Outliers or Multiple Modes in Expression

  • Iwona Pawlikowska, Gang Wu, Michael Edmonson, Zhifa Liu, Tanja Gruber, Jinghui Zhang, Stan Pounds. Bioinformatics 2014 Jan

A genomic random interval model for statistical analysis of genomic lesion data

  • Stan Pounds, Cheng Cheng, Shaoyu Li, Zhifa Liu, Jinghui Zhang, Charles Mullighan. Bioinformatics Epub 2013 July 10

Empirical Bayesian Selection of Hypothesis Testing Procedures for Analysis of Sequence Count Expression Data.

  • Pounds SB, Gao CL, Zhang H. Statistical Applications in Genetics and Molecular Biology 2012 Oct 19;11(5).

A Procedure to Statistically Evaluate Agreement of Differential Expression for Cross-Species Genomics

  • Pounds S, Gao C, …, Gilbertson RJ. Bioinformatics 2011 Aug 1;27(15):2098-103. Epub 2011 Jun 22.

Cross-Species Genomics Matches Driver Mutations and Cell Compartments to Model Ependymoma

  • Johnson R, …, Pounds SB, …, Gilbertson RJ. Nature 2010 Jul 29;466(7306):632-6. Epub 2010 Jul 18.

Subtypes of Medulloblastoma have Distinct Developmental Origins

  • Gibson P, …, Pounds SB, …, Gilbertson RJ. Nature 2010 Dec 23;468(7327):1095-9. Epub 2010 Dec 8.

PROMISE: a tool to identify genomic features with a specific biologically interesting pattern of associations with multiple endpoint variables.

  • Pounds S, Cheng C, Cao X, Crews KR, Plunkett W, Gandhi V, Rubnitz J, Ribeiro RC, Downing JR, Lamba J. Bioinformatics 2009 Aug 15;25(16):2013-9. Epub 2009 Jun 15.

Reference Alignment of SNP Microarray Signals for Copy Number Analysis of Tumors.

  • Pounds S, Cheng C, Mullighan C, Raimondi SC, Shurtleff S, Downing JR. Bioinformatics 2009 Feb 1;25(3):315-21. Epub 2008 Dec 3.

Assumption Adequacy Averaging as a Concept for Developing More Robust Methods for Differential Gene Expression Analysis.

  • Stan Pounds and Shesh N. Rai. Comput Stat Data Anal. 2009 Mar 15;53(5):1604-1612.

Computational Enhancement of a Shrinkage-Based ANOVA F-test Proposed for Differential Gene Expression Analysis.

  • Stan Pounds. 2007 Biostatistics

Genes regulating B cell development are mutated in acute lymphoid leukaemia

  • Charles G. Mullighan, Salil Goorha, Ina Radtke, Christopher B. Miller, Elaine Coustan-Smith, James D. Dalton, Kevin Girtman, Susan Mathew, Jing Ma, Stanley B. Pounds, Xiaoping Su, Ching-Hon Pui, Mary V. Relling, William E. Evans, Sheila A. Shurtleff, James R. Downing. Nature. 2007 Apr 12;446(7137):758-64.

Estimation and Control of Multiple Testing Error Rates for the Analysis of Microarray Data.

  • Stan Pounds. Brief Bioinform. 2006 Mar;7(1):25-36. Review

Robust Estimation of the False Discovery Rate.

  • Stan Pounds, Cheng Cheng. Bioinformatics 2006 Aug 15;22(16):1979-87. Epub 2006 Jun 15.

Statistical Development and Evaluation of Gene Expression Data Filters.

  • Stan Pounds, Cheng Cheng. J Comput Biol 2005 May;12(4):482-95. Review.

Sample Size Determination for the False Discovery Rate.

  • Stan Pounds, Cheng Cheng. Bioinformatics 2005 Dec 1;21(23):4263-71. Epub 2005 Oct 4. Erratum in: Bioinformatics. 2009 Mar 1;25(5):698-9.

Improving False Discovery Rate Estimation.

  • Stan Pounds, Cheng Cheng. Bioinformatics 2004 Jul 22;20(11):1737-45. Epub 2004 Feb 26.

Estimating the Occurrence of False Positives and False Negatives in Microarray Studies by Approximating and Partitioning the Empirical Distribution of p-values.

  • Stan Pounds, Steve Morris. Bioinformatics 2003 Jul 1;19(10):1236-42.

Contact us

Department of Biostatistics
MS 768, Room R6030
St. Jude Children's Research Hospital

262 Danny Thomas Place
Memphis, TN, 38105-3678 USA
(901) 595-4986
tomi.mori@stjude.org