No one is born knowing how to read. As a kid, you learn to break down syllables before piecing them back together. Once you can sound out a string of letters, articulating each consonant and vowel, you can tie a meaning to the word. But that doesn’t mean you can read and understand a sentence, let alone a whole book.
In research, sequencing is key to unlocking biological discoveries, but sequencing, like sounding out individual words, is only the first step. Both the words on the page and the biological data must be analyzed and interpreted. You may know what words mean on their own, but what do they mean in context? Similarly, big data in research can be unwieldy and requires thorough processing before the results can be understood. To achieve a research goal, someone (or something) must find the meaning in the mess, pulling the gems of discovery out of the raw data.
The Center for Applied Bioinformatics (CAB), a Shared Resource at St. Jude, is that force. CAB researchers develop open-access pipelines to process data from the Hartwell Center for Biotechnology, another St. Jude Shared Resource, which conducts sequencing. These in-house Shared Resources bridge gaps in scientific inquiry, providing state-of-the-art technologies and services to help researchers reach their full potential.
CAB’s partners rely on the center’s pipelines to present their data in a more accessible or relevant way. This step, called data processing, begins when the Hartwell Center or CAB launches the AutoMapper pipeline, which transfers the data to CAB. Once a researcher defines a scientific question, CAB processes and explores the data further. CAB tailors these pipelines to investigators’ needs. After an initial consultation, CAB group leads assign any outstanding tasks to more specialized staff.
“As bioinformaticians, we take that sequence data and help the biological, lab-based researchers interpret and understand what the sequencing data tells us based on the parameters of their specific experiment,” said Jason Myers, MS, DevOps manager at CAB.
Bioinformaticians at CAB also need to make the most of their time, especially as a relatively small team with a hefty, diverse workload. For Wojciech Rosikiewicz, PhD, a senior bioinformatics research scientist at CAB, every project is an opportunity to develop reproducible code. Reproducibility allows scientists both to generate the same results each time they run the code and to reduce the chance of technical errors.
A few years ago, Rosikiewicz and his colleagues worked on a frequently requested analysis that took days of hands-on steps to complete. While Rosikiewicz was handling the first few data sets, the investigator who requested the analysis asked him to assist with even more upcoming data. Rosikiewicz worried the task would become impossible to deliver. So each time he encountered a new analytical step, he imagined it as a small piece in a larger puzzle.
“I will try to make each piece of code, for each step, as a kind of reusable Lego block,” Rosikiewicz said. “A small module with a specific function that, if needed, may become a part of the pipeline later on.”
By connecting and building on these “blocks,” Rosikiewicz began forming a structure that his colleagues could apply in other contexts. That structure became the foundation of pipelines the team uses almost every week.
“After putting all those blocks together, the same analysis that previously took me two whole days right now may take one hour of actual work,” Rosikiewicz said. This opens up the bioinformaticians’ schedules, allowing them to help more investigators within their niches.
CAB Director Gang Wu, PhD, sees CAB as a uniquely situated resource at St. Jude, helping a varied slate of projects achieve their goals, all while avoiding overlap with other teams.
“We are all doing so many different projects,” Wu said. “That’s why we have to have deep collaboration with the embedded scientists.”
“We are highly collaborative with CAB,” Plummer said. “CAB and CSO have been instrumental in getting a computational pipeline down for the sequencing-based spatial technologies.”
Plummer hopes to continue this partnership with CAB and innovate beyond the pipeline CSO currently uses.
“In the future, we hope to build on CAB’s existing genomic structure and really pair in any kind of spatial omics technology through to a place by which they can computationally analyze it,” Plummer said.
Beyond collaborating with labs, CAB facilitates work between labs.
“We have harmonized the data to allow cross-lab collaboration,” Wu said. “This has been a big value.”
Along with encouraging collaboration between investigators, CAB is turning the page and considering the role data will play in future research.
Rather than relying on existing knowledge alone to drive research, Wu thinks data can help shape the questions investigators ask.
“What investigators need nowadays is data-driven hypothesis generation,” Wu said. “It’s a hypothesis-driven process enhanced by the exploratory analysis of our accumulated data. Of course, you need to harmonize all this data and make it available. That’s what we do.”
Artificial intelligence can aid hypothesis generation, but Myers said he thinks bioinformaticians do more than just process and analyze data. Although automation speeds up bioinformatics work, Myers sees communication as a big component of the field.
Still, staying at the forefront of new technologies can complement these communication skills, enabling bioinformaticians to do their jobs better and faster.
“In St. Jude, we have a substantial repository of data, and we have so many questions we want to answer,” said Wenjie Qi, PhD, a bioinformatics research scientist at CAB. “Applying these big models and these new technologies, we can answer more biological questions.”
CAB helps researchers decipher a novel’s worth of data, unlocking the truly novel findings. By forging connections with labs and embracing new technologies, CAB researchers help investigators go from reading a single word to understanding a whole story, contributing to greater scientific knowledge.