John Easton, PhD, St. Jude Children’s Research Hospital

Advances in Next-Generation Sequencing

Next-generation sequencing (NGS) is a new approach to extracting genetic information from biological samples. It determines the order of the nucleotides, or bases, in thousands to millions of DNA fragments at once by running a separate sequencing reaction on each fragment, with all of the reactions proceeding in a highly parallel manner.
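The core idea can be illustrated with a toy sketch, written for this explanation and not taken from any PCGP software: DNA is broken into many short fragments, each fragment is read independently (the property that lets the chemistry run in parallel), and the resulting reads are placed back onto a reference genome by matching. The function names, the 24-base example sequence, and the fixed read length and spacing are all hypothetical; real reads start at random positions and contain errors, so production pipelines use dedicated aligners that tolerate mismatches rather than exact string search.

```python
# Toy illustration of short-read sequencing and read mapping (not a real NGS pipeline).


def fragment(reference: str, read_length: int, step: int) -> list[str]:
    """Cut a sequence into overlapping fixed-length pieces.

    Stands in for shearing DNA into fragments and reading each one;
    real reads have random start points and sequencing errors.
    """
    return [
        reference[start:start + read_length]
        for start in range(0, len(reference) - read_length + 1, step)
    ]


def map_reads(reference: str, reads: list[str]) -> list[int]:
    """Place each read independently at its first exact match (-1 if absent)."""
    return [reference.find(read) for read in reads]


def coverage(reference_length: int, read_length: int, positions: list[int]) -> list[int]:
    """Count how many mapped reads cover each base of the reference."""
    depth = [0] * reference_length
    for start in positions:
        if start < 0:
            continue  # skip reads that did not map
        for i in range(start, start + read_length):
            depth[i] += 1
    return depth


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTACCGATAGCT"  # made-up 24-base sequence
    reads = fragment(reference, read_length=8, step=4)
    positions = map_reads(reference, reads)
    print(positions)  # [0, 4, 8, 12, 16]
    print(coverage(len(reference), 8, positions))
```

Because each read is mapped without reference to any other read, the same work can be split across many processors, which is one reason large sequencing projects rely on high-performance computing.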

The approach can be applied to the various types of genetic material in a cell. For example, whole-genome sequencing examines the sequence of all the DNA contained in the 23 pairs of chromosomes; whole-exome sequencing focuses on the small portion of the genome (roughly 1% to 2%) that encodes proteins; and RNA sequencing (RNA-seq) determines the sequences of the various RNA species expressed within cells.

Each sequencing approach generates a massive amount of data, which must be interpreted using high-performance computers (HPCs). Developing computational algorithms to interpret DNA sequence data is a science still in its infancy. For this reason, the Pediatric Cancer Genome Project (PCGP) has invested significant effort in developing new computational methods.
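To give a sense of scale, the sketch below makes a back-of-the-envelope estimate of raw output per sample. The inputs are illustrative assumptions, not PCGP figures: a human genome of about 3.1 billion bases, an average read depth (coverage) of 30 for whole-genome sequencing, and a hypothetical 50-million-base exome target read to a depth of 100. Storage is approximated as two bytes per base, one sequence character plus one quality character, as in the uncompressed FASTQ text format, ignoring read headers and line breaks.

```python
# Back-of-the-envelope estimate of raw sequencing output (assumed, illustrative values).

GENOME_BP = 3.1e9       # approximate size of the human genome in bases
WGS_COVERAGE = 30       # assumed average read depth for whole-genome sequencing
EXOME_TARGET_BP = 50e6  # hypothetical exome capture target, in bases
EXOME_COVERAGE = 100    # assumed average read depth for whole-exome sequencing


def sequenced_bases(target_bp: float, mean_coverage: float) -> float:
    """Bases read in total: each target base is read mean_coverage times on average."""
    return target_bp * mean_coverage


def fastq_bytes(total_bases: float, bytes_per_base: float = 2.0) -> float:
    """Rough uncompressed FASTQ size: one base character plus one quality
    character per base; read headers and line breaks are ignored."""
    return total_bases * bytes_per_base


if __name__ == "__main__":
    for label, target, depth in [
        ("whole genome", GENOME_BP, WGS_COVERAGE),
        ("whole exome", EXOME_TARGET_BP, EXOME_COVERAGE),
    ]:
        bases = sequenced_bases(target, depth)
        print(f"{label}: {bases / 1e9:.1f} billion bases, "
              f"~{fastq_bytes(bases) / 1e9:.0f} GB uncompressed FASTQ")
```

Under these assumptions the script reports about 93 billion bases (~186 GB) for one whole genome and 5 billion bases (~10 GB) for one exome, before compression and before any analysis files are produced.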

Under the leadership of Jinghui Zhang, PhD, St. Jude bioinformatics scientists have made major advances in analyzing the data generated by NGS technologies. These approaches have enabled us to analyze and validate whole-genome sequencing data from 700 patients representing 23 different types of pediatric cancer, as well as whole-exome sequencing and/or RNA-seq data from an additional 2,000 pediatric cases. This is one of the largest collections of high-quality human DNA sequence data anywhere in the world.

The computational algorithms developed through the PCGP are proving to be among the most accurate programs available for this kind of analysis. Investigators from around the world are now using the algorithms developed at St. Jude to analyze their data.

The genomic data obtained by the PCGP can be explored via the PCGP data portal (http://explore.pediatriccancergenomeproject.org). In May 2012, the PCGP decided to make the raw sequence data available to the biomedical research community before publication. This more than doubled the number of human whole-genome sequences that were publicly available. More than 50 institutions around the world have already accessed these data. This scientific resource is accelerating research worldwide on pediatric cancer, adult cancer, and other genetic diseases.