James McMurry, St. Jude Children’s Research Hospital

Expanding St. Jude's Computational Infrastructure

Two facts about the volume of data generated by the Pediatric Cancer Genome Project (PCGP) help frame the challenges of analyzing and storing it:

(1) The amount of data generated over the 3 years of the PCGP is greater than the total amount of data generated at St. Jude in its prior 48-year history.

(2) If the PCGP data were typed out on a single line, that line would stretch from the Earth to the Moon and back more than 16 times.

Handling data on this scale requires a high-performance computing (HPC) infrastructure, and St. Jude had to nearly double its HPC capacity to analyze and store it. The expansion added computation cores and memory and modified the system’s architecture so that the very large files produced by next-generation sequencing (NGS) can move easily between long-term storage, short-term storage, and active computation. The resulting system is one of the premier NGS HPC facilities in the world.

The HPC infrastructure is being expanded further in the second phase of the PCGP. This expansion includes a separate HPC system, now under construction, that will be dedicated to the Clinical Genomics project.