This manuscript was motivated by a clinical trial designed at St. Jude Children’s Research Hospital. Most patients undergoing treatment for acute lymphoblastic leukemia (ALL) or acute myeloid leukemia (AML) receive intrathecal chemotherapy, which is administered directly into the cerebrospinal fluid (CSF). A procedure known as lumbar puncture or lumbar puncture intrathecal (LP/LPIT) is performed to access the CSF, and leakage of CSF during the procedure is hypothesized to cause a headache known as postdural puncture headache (PDPH). The primary objective of that study was to determine whether the Sprotte needle was associated with significantly reduced PDPH as compared to the Quincke needle, since the Sprotte needle is believed to carry a significantly lower risk of CSF leakage.
Many clinical trials, such as the one mentioned above, compare two or more treatment groups using a binary outcome measure. For ethical or regulatory reasons, group sequential designs are commonly employed. The stopping boundaries for the interim analyses are then constructed, based on a binomial distribution, for assessing the difference in the response probabilities between the two groups. This can be easily implemented using the statistical software package EAST. Several factors are often known to affect the primary outcome of interest, but their true distributions are not known in advance. In addition, these factors may cause heterogeneous treatment responses among individuals in a group, and their exact effect sizes may be unknown. To minimize the effect of these factors on the comparison of the two arms, stratified randomization is used in the actual conduct of the trial. Then, consistent with the stratified design, a stratified analysis based on the odds ratio is usually undertaken. However, the stopping rules used for the interim analyses are those obtained for testing the difference in the response rates in a design that was not stratified. In this paper, via extensive simulation studies, we evaluated the performance of such an approach when the underlying distributions of the factors and their effect sizes on the outcome measure may vary. The findings revealed that the stratified approach offers consistently better results than the unstratified approach, as long as the difference between the two groups in the weighted average of the response probabilities across strata remains close to the hypothesized value. However, if the response probabilities deviate substantially from the hypothesized values, so that the difference in the weighted average is less than the hypothesized value, then the proposed study could be substantially underpowered.
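To make the two quantities discussed above concrete, the following is a minimal Python sketch, not the authors' simulation code. It computes the difference between two arms in the weighted average of the response probabilities across strata, and a pooled odds ratio across strata. The abstract says only that the stratified analysis is "based on the odds ratio"; the Mantel-Haenszel estimator used here is an assumption, as are the function names and all numbers in the example.

```python
def weighted_response_difference(weights, p_arm_a, p_arm_b):
    """Difference (arm A minus arm B) in the weighted average of the
    stratum-specific response probabilities.

    weights  : relative stratum sizes (need not sum to 1)
    p_arm_a  : response probability in each stratum for arm A
    p_arm_b  : response probability in each stratum for arm B
    """
    total = sum(weights)
    return sum(
        w * (pa - pb) for w, pa, pb in zip(weights, p_arm_a, p_arm_b)
    ) / total


def mantel_haenszel_odds_ratio(tables):
    """Pooled odds ratio across strata (Mantel-Haenszel estimator).

    This estimator is an assumption made for illustration; the source
    does not name the stratified odds-ratio method it uses.

    tables : one (a, b, c, d) tuple per stratum, where
             a = arm A responders,  b = arm A non-responders,
             c = arm B responders,  d = arm B non-responders
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical two-stratum example (illustrative numbers only).
example_weights = [0.5, 0.5]
example_p_a = [0.2, 0.4]
example_p_b = [0.1, 0.2]
example_diff = weighted_response_difference(
    example_weights, example_p_a, example_p_b
)

example_tables = [(10, 10, 5, 15), (20, 20, 10, 30)]
example_or = mantel_haenszel_odds_ratio(example_tables)
```

In this hypothetical example the weighted difference is about 0.15 and the pooled odds ratio is 3.0. In a simulation of the kind described above, one would vary the stratum weights and stratum-specific probabilities and check how far the weighted difference drifts from the value assumed at the design stage.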
This article appears in Biometrical Journal 2007. Other authors include Shesh Rai (University of Louisville) and Jianmin Pan (St. Jude-Biostatistics).