Jinghui Zhang, PhD, St. Jude Children’s Research Hospital

The Development of Novel Analytical Methods

To ensure accuracy and high sensitivity in identifying tumor-specific genetic lesions for the Pediatric Cancer Genome Project (PCGP) studies, scientists in the Department of Computational Biology developed a suite of novel analytical and visualization methods that use paired tumor/non-tumor next-generation sequencing (NGS) data as the input. These methods have been revised and improved substantively throughout the course of the PCGP based on the extensive validation carried out in the PCGP Validation Laboratory.

Developing Novel Algorithms

The methods include novel algorithms for detecting sequence mutations, such as single-nucleotide variations and small insertions/deletions, and gross lesions, such as structural and copy number variations. A validation rate of more than 90% has been achieved in detecting sequence mutations while maintaining high sensitivity (i.e., mutations present in fewer than 20% of tumor cells can be detected).

To improve RNA-seq analysis, Dr. Jinghui Zhang and colleagues developed a novel mapping pipeline (StrongARM) that ensures the accurate mapping of NGS reads across intron-exon junctions. They also designed a novel assembly-based method for detecting structural variations (CICERO) such as complex fusion transcripts resulting from catastrophic genomic rearrangements.

To manage the multidimensional genomic data generated by the PCGP, the team created the Institutional Genomic Data Integration database, which has been used extensively for data integrity checks, manuscript preparation, and support of the PCGP data portal.

CREST (clipping reveals structure), which uses partially unmapped reads as a signature to identify structural variations, is one of the best-known algorithms developed through the PCGP by Dr. Zhang and colleagues. Because of its accuracy and high sensitivity, CREST has been essential for the discovery of novel, recurrent chromosomal rearrangements in low-grade glioma, neuroblastoma, ependymoma, acute megakaryocytic leukemia, and osteosarcoma. Since its publication in Nature Methods in 2011, CREST has been used by more than 70 research institutions and major pharmaceutical companies worldwide.
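
The idea of treating partially unmapped reads as a breakpoint signature can be illustrated with a minimal sketch. This is not the CREST implementation; it only shows the general principle that reads whose ends are soft-clipped (the "S" operation in a SAM CIGAR string) tend to pile up at the same reference coordinate when a rearrangement breakpoint is present. The function names, the input format of (position, CIGAR) pairs, and the `min_support` threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

# Matches one CIGAR operation, e.g. "30S" or "70M".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_sites(pos, cigar):
    """Return reference coordinates of the aligned base next to each soft clip.

    pos is the 1-based leftmost aligned position, as in the SAM POS field.
    Hypothetical helper for illustration only.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [o for o in ops if o[1] != "H"]  # hard clips carry no sequence
    # Number of reference bases the alignment spans.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    sites = []
    if core and core[0][1] == "S":
        sites.append(pos)  # clip on the left: first aligned base
    if len(core) > 1 and core[-1][1] == "S":
        sites.append(pos + ref_len - 1)  # clip on the right: last aligned base
    return sites


def candidate_breakpoints(reads, min_support=2):
    """Group soft-clip sites and keep those supported by enough reads.

    reads is an iterable of (pos, cigar) pairs from one chromosome.
    min_support is an assumed threshold, not a value from the source.
    """
    counts = Counter()
    for pos, cigar in reads:
        counts.update(soft_clip_sites(pos, cigar))
    return sorted((site, n) for site, n in counts.items() if n >= min_support)


if __name__ == "__main__":
    example_reads = [
        (100, "30S70M"),  # clipped on the left at position 100
        (100, "25S75M"),  # clipped on the left at position 100
        (31, "70M30S"),   # clipped on the right; last aligned base is 100
        (500, "100M"),    # fully mapped, contributes nothing
    ]
    print(candidate_breakpoints(example_reads, min_support=3))
```

A real caller would go further than this sketch, for example by assembling the clipped sequences and aligning them elsewhere in the genome to find the partner breakpoint; those steps are omitted here.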

Characterizing Telomere Length with NGS Approaches

In addition to applying NGS approaches to genetic lesions, St. Jude scientists are exploring novel NGS applications to further characterize cancer genomes. As reported in Genome Biology, Dr. Zhang and colleagues have developed the first NGS application for characterizing telomere length in pediatric cancer genomes.

Telomeres are highly repetitive regions of DNA located at the ends of each chromosome that function, in part, to prevent end-to-end chromosomal fusion. With each cell division, telomeres become shorter, and the integrity of the chromosome eventually becomes compromised. Thus, telomeres play an important role in age-related diseases and cancer.

The development of this new NGS application to assess telomere length resulted in the discovery of an association between mutations of the ATRX gene and increased telomeric DNA in patients with a form of high-risk neuroblastoma.
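
One simple way to approximate telomeric DNA content from NGS data is to count reads that contain several consecutive copies of the human telomeric repeat TTAGGG (or its reverse complement CCCTAA) and compare the tumor with its matched non-tumor sample. The sketch below illustrates that general idea only; it is not the method published by Dr. Zhang and colleagues. The function names, the `min_repeats` cutoff of four, and the use of a log2 tumor/normal ratio are assumptions chosen for illustration.

```python
import math

TELOMERE_REPEAT = "TTAGGG"      # human telomeric repeat unit
REVERSE_COMPLEMENT = "CCCTAA"   # the same repeat read from the opposite strand


def is_telomeric(read, min_repeats=4):
    """True if the read contains min_repeats consecutive telomeric repeats.

    min_repeats=4 is an assumed cutoff, not a value taken from the source.
    """
    read = read.upper()
    return (TELOMERE_REPEAT * min_repeats in read
            or REVERSE_COMPLEMENT * min_repeats in read)


def telomeric_fraction(reads, min_repeats=4):
    """Fraction of reads classified as telomeric; 0.0 for an empty input."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(is_telomeric(r, min_repeats) for r in reads)
    return hits / len(reads)


def log2_tumor_normal_ratio(tumor_reads, normal_reads, min_repeats=4):
    """log2 of tumor over normal telomeric read fraction.

    Positive values suggest more telomeric DNA in the tumor sample.
    Dividing by the total read count normalizes for sequencing depth.
    """
    tumor = telomeric_fraction(tumor_reads, min_repeats)
    normal = telomeric_fraction(normal_reads, min_repeats)
    if tumor == 0 or normal == 0:
        raise ValueError("no telomeric reads in one sample; ratio undefined")
    return math.log2(tumor / normal)


if __name__ == "__main__":
    tumor = ["TTAGGG" * 5, "CCCTAA" * 4, "ACGT" * 6, "ACGTACGT"]
    normal = ["TTAGGG" * 4, "ACGT" * 6, "GATTACA", "ACGTTT"]
    print(log2_tumor_normal_ratio(tumor, normal))
```

Comparing each tumor against its own matched non-tumor sample, rather than against a fixed reference value, follows the paired tumor/non-tumor design described at the start of this article.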