Convinced that the secrets to curing childhood diseases lie tucked away in the long, twisting labyrinth of the human genome, scientists at St. Jude Children’s Research Hospital built a cloud-based platform to share and make readily available genomic data that otherwise could take months to download.
St. Jude Cloud, launched nearly three years ago, reaffirmed St. Jude’s foundational pledge to share its research with the world.
And in a recent article in the journal Cancer Discovery, St. Jude Cloud was described as “key to accelerating research to improve diagnostic precision, treatment efficacy, and long-term survival” when it comes to pediatric cancer and other catastrophic childhood diseases.
With new data added regularly, St. Jude Cloud now includes 12,000 whole genomes plus whole exome and whole transcriptome data from more than 10,000 childhood cancer patients and long-term survivors. The exome is the part of the genome made up of exons, the portions of genes that code for proteins, while the transcriptome is the full range of messenger RNA molecules expressed by an organism.
The platform also contains genomic data from more than 800 children with sickle cell disease, another catastrophic affliction that has been a focus of St. Jude research and treatment since the hospital’s founding in 1962.
St. Jude Cloud addressed the long-standing challenge of storing, managing and accessing the massive amount of data generated by gene sequencing. Keeping the data in the cloud means researchers can avoid the protracted, sometimes months-long process of downloading data to a computer.
All told, St. Jude Cloud contains 1.25 petabytes of raw and published harmonized data — an amount equal to the data contained in more than 200,000 DVD-quality movies. Its three interconnected applications allow users to seamlessly explore, analyze and visualize data.
“St. Jude Cloud is a treasure trove of data for the global scientific community, and its data-sharing ecosystem removes barriers to discovery by researchers in that community,” said Jinghui Zhang, Ph.D., chair of the St. Jude Department of Computational Biology.
Zhang, St. Jude president and CEO James R. Downing, M.D., and Keith Perry, chief information officer for the hospital, are corresponding authors of the research.
St. Jude scientists built the platform in partnership with DNAnexus and Microsoft as part of an effort to accelerate data sharing and research on pediatric cancer, which remains the leading cause of childhood death by disease in the U.S.
The research was funded as a St. Jude Blue Sky Initiative, an institutional program that supports transformational projects; by the Microsoft AI for Good program; and by ALSAC, the fundraising and awareness organization for St. Jude.
Since 2018, when it debuted with 5,000 whole genomes and whole exomes, St. Jude Cloud has grown in size, scope and function.
“Data sharing is especially important for pediatric cancer, where more than half of patients have rare tumors driven by distinct mutations,” Zhang said. “These tumors are understudied because it is difficult to accumulate the necessary patient tumor samples.”
About 10,000 users visit the site monthly. They can browse and explore both raw and published pediatric cancer data from St. Jude and other institutions around the world. Investigators also can upload their own data to study alongside St. Jude Cloud data sets.
Using the power of the platform, scientists have been able to classify 135 childhood cancer subtypes based on gene expression. Zhang and her colleagues also have employed the platform to study mutation rates and mutational signatures, or patterns, in 35 pediatric subtypes. Those signatures included ones associated with ultraviolet radiation and cancer therapy.
“The goal is to remove barriers and enable researchers with little to no formal computational training to perform sophisticated genomic analysis,” said Clay McLeod of St. Jude Computational Biology, who, along with Alexander Gout, Ph.D., of Computational Biology, is a first author.