St. Jude Reference #SJ-20-0018
Precision medicine relies on precision diagnostics, which in oncology are performed mostly through high-throughput DNA sequencing. Researchers at St. Jude developed a computational method to measure sequencer errors more precisely. Their study revealed novel insights into sequencer errors that can lead to improved instrumentation and next-generation sequencing (NGS) chemistry and, ultimately, higher DNA sequencing fidelity. The resulting software can efficiently suppress sequencer errors and monitor sequencer anomalies.
Liquid biopsy holds great promise for non-invasive cancer diagnosis by detecting minute amounts of cell-free DNA released from cancer cells into non-solid biological tissue such as peripheral blood. A critical bottleneck in developing liquid biopsy methods is the limited accuracy of current next-generation sequencing (NGS) technology, owing to its relatively high error rate (0.1%-1%, as of 2018). Through mathematical modeling of NGS errors, researchers at St. Jude published a method (Ma et al., Genome Biol, 2019) that computationally suppresses the NGS error rate to between 10⁻⁵ and 10⁻⁴, two orders of magnitude lower than generally reported. This enables broad application of current NGS methods to detect variants with frequencies as low as 0.1%.
However, this error rate is a product of both PCR errors and instrument (i.e., sequencer) errors, and it has been unclear how to separate these two error sources. Although numerous efforts, such as barcode-based sequencing methods, have been devoted to studying PCR errors, there has been no method to specifically measure instrument errors. As a result, instrument calibration has remained a critical issue. How do service providers gauge the performance of their instruments? How can they calibrate an instrument better? How do they know whether the quality of an experiment falls within a reasonable range? In short, how can we obtain an accurate quality measurement to ensure that the DNA sequencing step is done right?
To address this, the researchers developed a novel computational algorithm to precisely measure the errors caused by sequencers. Using 3,777 publicly available datasets from 75 research institutes (in America, Europe, and Asia), they discovered highly reproducible patterns of sequencer errors, including: 1) the overall sequencer error rate is ~1×10⁻⁵; 2) at the flow-cell level, error rates are elevated on the bottom surface; 3) >90% of HiSeq and NovaSeq flow cells have a small fraction of outlier error-prone tiles with a dramatically elevated error rate; 4) removal of outlier error-prone tiles improves sequencing accuracy.
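Observations 3 and 4 can be illustrated with a minimal sketch that flags error-prone tiles and compares the pooled error rate before and after removing them. The function name, the per-tile input dictionaries, and the median-plus-k-times-MAD cutoff are all assumptions made for illustration; the summary above does not state the criterion SequencErr actually uses.

```python
from statistics import median


def flag_outlier_tiles(tile_errors, tile_bases, k=5.0):
    """Flag tiles whose error rate exceeds median + k * MAD.

    tile_errors / tile_bases: hypothetical dicts keyed by tile id, holding
    mismatch counts and total sequenced bases per tile.
    The median/MAD cutoff is an illustrative assumption, not the
    published SequencErr criterion.
    Returns (outlier tile ids, pooled rate over all tiles, pooled rate
    over the tiles that were kept).
    """
    # Per-tile error rate; tiles with no sequenced bases are skipped.
    rates = {t: tile_errors.get(t, 0) / n for t, n in tile_bases.items() if n > 0}
    if not rates:
        raise ValueError("no tiles with sequenced bases")

    # Robust center and spread, so the outliers do not set their own cutoff.
    center = median(rates.values())
    mad = median(abs(r - center) for r in rates.values())
    cutoff = center + k * mad
    outliers = sorted(t for t, r in rates.items() if r > cutoff)

    def pooled_rate(tiles):
        bases = sum(tile_bases[t] for t in tiles)
        errors = sum(tile_errors.get(t, 0) for t in tiles)
        return errors / bases if bases else float("nan")

    kept = [t for t in rates if t not in outliers]
    return outliers, pooled_rate(list(rates)), pooled_rate(kept)
```

The median and the median absolute deviation are used here because a handful of extreme tiles barely moves them, whereas a mean and standard deviation would be inflated by the very tiles the check is meant to find.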
They used these observations to produce a general-purpose algorithm, termed SequencErr, that computationally suppresses sequencer errors and effectively monitors sequencer anomalies. SequencErr was engineered for efficiency: a dataset with ultra-deep sequencing (1,000,000X depth) can be processed in 1.5N minutes on a single CPU core, where N is the number of target regions. Similarly, whole-exome sequencing (WES, 100X) and whole-genome sequencing (WGS, ~30X) datasets can be processed in under 1 CPU hour to monitor instrument performance.
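As a quick planning aid, the 1.5N-minute figure quoted above can be turned into a one-line estimate. The function and parameter names are hypothetical, and the result is only as good as the stated figure, which applies to ultra-deep (1,000,000X) data on a single CPU core.

```python
def estimate_cpu_minutes(n_target_regions, minutes_per_region=1.5):
    """Estimate single-core processing time, in minutes, for N target regions.

    Uses the 1.5N-minute figure quoted for 1,000,000X datasets; the
    function itself is an illustrative helper, not part of SequencErr.
    """
    if n_target_regions < 0:
        raise ValueError("n_target_regions must be non-negative")
    return minutes_per_region * n_target_regions
```

For example, a panel with 40 target regions would be estimated at 60 CPU minutes under this figure.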
Software, sequencing, genome, algorithm, error, biopsy, precision medicine
Granted patents or published applications
Pending international patent application published as WO 2021/146304
Related scientific references
Ma, X., Shao, Y., Tian, L. et al. Analysis of error profiles in deep next-generation sequencing data. Genome Biol 20, 50 (2019). https://doi.org/10.1186/s13059-019-1659-6
Davis, E.M., Sun, Y., Liu, Y. et al. SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data. Genome Biol 22, 37 (2021). https://doi.org/10.1186/s13059-020-02254-2
Research Highlight: Instruments err. This tool identifies the mistakes.
We are seeking partners to commercialize this technology.
Contact the Office of Technology Licensing (Phone: 901-595-2342, Fax: 901-595-3148) for more information.