St. Jude Family of Websites
Explore our cutting edge research, world-class patient care, career opportunities and more.
St. Jude Children's Research Hospital Home
Principal bioinformatics research scientist Evadnie Rampersaud, PhD, got her start in data science before the term was widely adopted. Now she uses her skills to advance research at St. Jude.
1. What first inspired you to have a research career in data science?
The term data science is now ubiquitous, but I remember a time when it didn’t exist in its current form. During my time as a graduate student at Duke University, the first draft of the Human Genome Project was released — a landmark moment. Looking back, our understanding of the genome’s structure and its hidden complexities was still in its infancy. Our graduate cohort was enthusiastic, eager to explore this new frontier of genomic science and serve as its early pioneers.
That initial excitement was quickly tempered by reality. The tools and methods we had relied on were utterly inadequate for managing the vast volumes of data generated from sequencing just a single human genome — let alone hundreds or thousands. The rate of data generation far outpaced our ability to process or interpret it meaningfully.
In response, advanced statistical models began emerging to meet the challenge. Around the same time, a new Master’s program in Bioinformatics was introduced, leveraging advanced mathematical techniques to better characterize genetic sequences probabilistically. It was then that I began to realize the importance of learning how to deploy tools traditionally used by computer scientists — tools that could efficiently analyze the genome at scale.
This shift happened midway through my PhD, and to me, it felt like a cultural revolution in genetics. The rapid evolution of genomic science — from next-generation sequencing to protein prediction, multi-omics integration and even extracting genetic insights from electronic health records — has now culminated in the widespread adoption of data science, predictive modeling, artificial intelligence (AI) and machine learning.
If we asked students today why they pursue data science, their reasons might vary. But for me, it was straightforward: I wanted to understand how genetic and genomic alterations impact individuals and families affected by disease. Data science was — and remains — the key to unlocking those answers.
2. What role do data scientists play in research projects, and how does the Center for Applied Bioinformatics enable discoveries at St. Jude?
Many genomic science projects require an interdisciplinary team of scientists. In this collaborative environment, “wet lab” researchers or physician-scientists work closely with data scientists to uncover patterns in data that may highlight functionally significant regions of the genome.
At the Center for Applied Bioinformatics (CAB), our structure reflects this interdisciplinary approach, with specialized groups dedicated to different data types. For instance, the Genomics Group focuses primarily on somatic mutation detection; the Transcriptomics Group handles RNA-based data; and the Epigenomics Group works on DNA methylation and chromatin modeling. The group I lead, the Genetics Group, is dedicated to germline studies, identifying mutations that predispose individuals to pediatric oncologic or neurological diseases.
Across all CAB groups, a key goal is to support investigators at St. Jude by managing the processing and quality control of raw data, thereby reducing the technical burden on individual research teams. But our core strengths lie in collaborative, tertiary analyses that help interpret complex datasets and, at times, contribute directly to discoveries.
In addition to building custom analytical tools, we actively engage in research and development to evaluate and improve existing data science methods, further enhancing the quality and rigor of our studies. One of our most impactful contributions is identifying and integrating external or publicly available datasets into St. Jude research — efforts that are both extensive and critical.
As data scientists, we’re inherently driven by a love of data and always want more. The fast-paced nature of this field requires us to evolve and expand our skillsets constantly. Because of this, having a data scientist’s perspective during study design and analysis is not just valuable; it can open entirely new avenues for insight and interpretation.
3. What do you wish more people understood about data science?
Communicating the idea that data science is a broad, multifaceted discipline can be challenging. Many people tend to associate it primarily with computer science applications, such as AI or machine learning. However, the foundations of data science are deeply rooted in statistical theory, going back to the basics championed by pioneers such as Sir Ronald Fisher, who established the field. What we now call “data science” has emerged as a modern discipline largely through advances in computing, but its core principles remain grounded in statistics.
Importantly, data science isn’t limited to one domain; it’s just as applicable to uncovering hidden patterns in biology as it is to solving problems in business or finance. That said, it’s not foolproof. Misapplied methods or unclear assumptions can lead to incorrect conclusions. Choosing the right data science approach depends entirely on the nature of the data and the specific question being asked. Helping others grasp this nuance is essential for using data science effectively and responsibly.
4. How can data analysis go wrong?
As the saying goes, “garbage in, garbage out.” If the original data is not of high quality, the analysis will likely be incorrect. Performing basic quality control and normalization and labeling samples correctly are critically important for generating results that can be validated. A basic lesson is that for data science models, reproducibility and rigor are key. Along those lines, anyone collaborating with data scientists should understand the concepts of over-fitting, validation and extrapolation.
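For readers who want a concrete picture of over-fitting, validation and extrapolation, here is a minimal sketch in Python. It is an illustration only, not a method described in the interview. The simulated data (a straight line plus random noise), the two toy models and all function names are assumptions chosen for the example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def memorizer(train_x, train_y):
    """Deliberately over-fit model: copy the label of the nearest training point."""
    pairs = list(zip(train_x, train_y))
    def predict(x):
        return min(pairs, key=lambda p: abs(p[0] - x))[1]
    return predict

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: true relationship y = 2x + 1, observed with noise.
rng = random.Random(0)

def simulate(n):
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys

train_x, train_y = simulate(200)   # data the models are fitted on
valid_x, valid_y = simulate(200)   # held-out data used only for checking

slope, intercept = fit_line(train_x, train_y)
line = lambda x: slope * x + intercept
memo = memorizer(train_x, train_y)

# Over-fitting: the memorizer is perfect on the data it has seen.
print("train error      line:", mse(line, train_x, train_y),
      " memorizer:", mse(memo, train_x, train_y))
# Validation: held-out data shows how each model handles unseen samples.
print("validation error line:", mse(line, valid_x, valid_y),
      " memorizer:", mse(memo, valid_x, valid_y))
# Extrapolation: x = 20 lies outside the sampled range of 0 to 10.
print("prediction at x=20 (true mean 41)  line:", line(20),
      " memorizer:", memo(20))
```

The memorizer reproduces its training data exactly, so its training error is zero, yet that says nothing about how it performs on new samples. Only the held-out validation set gives an honest estimate. At x = 20 the memorizer can only repeat a value it saw near x = 10, which is a simple picture of why predictions outside the range of the data deserve caution.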
5. What advice would you give to researchers in training who are interested in careers leveraging data science?
AI machinery is reshaping nearly every field, including data science, and its momentum continues to accelerate. For young researchers entering this field, learning how to leverage AI tools effectively isn’t just a bonus — it’s becoming indispensable. One of the most accessible and immediately impactful applications of AI is in streamlining basic programming tasks. AI tools can dramatically reduce the time spent on repetitive or routine work. Gaining agility in optimizing these AI tools will provide young researchers with an edge that will be greatly appreciated by employers. Beyond that, spending time thinking creatively about larger-than-life problems can spur novel applications in data science, particularly when accompanied by AI-enabled tools. I would urge researchers in training to revisit old basic biology questions. They may not have been “solvable” back then, but might be more tractable today.