You can’t talk about science without talking about data – the information generated from experiments and research. M. Madan Babu, PhD, FMedSci, St. Jude Department of Structural Biology member and St. Jude Center of Excellence for Data-Driven Discovery director, is always considering how to interpret and think about data.
Data are generated in exceptional amounts by different experiments, efforts and initiatives. If you look closely enough, you can track the footprints of many different biological processes imprinted throughout these vast troves of data. People typically do not look for these footprints, however, and instead stick to a specific focus or lens, such as a single protein. When you do trace the footprints, you can find evidence at multiple levels for a phenomenon, a new process or a gene implicated in a particular process or disease.
Individual pieces of evidence may not be conclusive enough, but when you put different types of data together and these data footprints reappear in different forms and shapes, then you know that there is a signal – a discovery. That's exactly what researchers such as Babu tease out through integrative data analysis or integrative data science.
Using a single monochromatic lens, such as genomics or structural biology, to look at data does result in discoveries but can also be inherently limiting, depending on the question that is addressed. Babu tries to draw a line or a thread across different lenses of data, because they are all needed to describe complex biological systems or phenomena. Although each lens (approach or data type) can lead to new discoveries, putting them together provides new perspectives and insights.
“Imagine when you walk into a room, you see objects of different colors,” Babu said. “If you wear glasses with green lenses, you will only be able to see some of the objects. But if you wear glasses with red lenses, you will only be able to see a different set of objects. Thus, if you only use genomic data to understand a biologic system, you are only likely to find genomic insights. If you only investigate transcriptomics data, you may miss the impact of post-translational regulation at the proteome level. If you only look at proteomics or transcriptomics, you may miss the role of metabolites. By viewing through multiple different colored lenses (data types), each of which on its own might narrow your focus, you are able to get a more complete picture of the biological landscape and see much more.”
“We are very excited to put these different perspectives together, while acknowledging each one on its own,” Babu added. “Our group is very diverse and includes physicists, chemists, geneticists, molecular biologists, pharmacologists, structural biologists, systems biologists and network biologists. With this diversity we all learn from each other and can prompt each other to think about our areas of expertise in new ways as people ask the most fundamental questions, which might define or challenge conventional wisdom.”
In a paper recently published in Science Signaling, Babu and Duccio Malinverni, PhD, a Bioinformatics Research Scientist at St. Jude, drew inspiration from the principles of evolution. The researchers used data describing protein sequences and structures from several organisms and applied approaches from physics and computer science to design new protein pairs that can interact with each other in a defined way.
“This whole project stemmed from a brainstorming exercise while taking a walk with Duccio,” Babu said. “From that conversation, we went on to work together to create an algorithm to guide the creation of new protein-protein interactions.”
Genes can evolve by duplication and divergence, with evolution adjusting the duplicated sequences to achieve new functions. To mimic that, the researchers needed to understand how evolution has done this in existing gene families. They also saw a need: engineering interactions between proteins to create new functions has major applications, from therapeutics to synthetic biology. But it is important to avoid unwanted interactions between pre-existing protein networks in a cell and newly engineered ones. The specificity and selectivity of newly engineered proteins must be controlled and defined precisely, or the proteins risk causing detrimental off-target effects.
The scientists developed a computational strategy that mimics gene duplication and divergence of pre-existing, interacting protein pairs and used it to design new interactions. They worked with the bacterial PhoP–PhoQ protein pair as a model system to demonstrate the feasibility of this strategy and validated the approach against known experimental results. The designed proteins are predicted to interact exclusively with each other and to be insulated from potential interaction with their native counterparts. Furthermore, the algorithm generated new sequences that nature has not yet explored, thereby opening enormous possibilities for designing such interacting protein pairs.
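For readers who want a feel for the idea, the duplication-and-divergence strategy can be illustrated with a toy sketch in Python. This is not the published algorithm: the random coupling table, the scoring function, the greedy acceptance rule and every name below are assumptions made purely for illustration. The sketch copies a native pair, then keeps only those mutations that do not reduce how much more strongly the new pair scores with each other than with either native partner.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_toy_couplings(len_a, len_b, rng):
    """Hypothetical random coupling table, standing in for a model
    that would in practice be learned from real sequence data."""
    return {
        (i, j, a, b): rng.gauss(0.0, 1.0)
        for i in range(len_a)
        for j in range(len_b)
        for a in AMINO_ACIDS
        for b in AMINO_ACIDS
    }


def interaction_score(seq_a, seq_b, couplings):
    """Toy score: sum of couplings over all residue pairs (higher = stronger)."""
    return sum(
        couplings[(i, j, a, b)]
        for i, a in enumerate(seq_a)
        for j, b in enumerate(seq_b)
    )


def specificity_margin(new_a, new_b, native_a, native_b, couplings):
    """On-target score minus the worst cross-talk with a native partner."""
    on_target = interaction_score(new_a, new_b, couplings)
    cross_talk = max(
        interaction_score(new_a, native_b, couplings),
        interaction_score(native_a, new_b, couplings),
    )
    return on_target - cross_talk


def duplicate_and_diverge(native_a, native_b, couplings, steps=2000, seed=0):
    """Copy the native pair, then greedily accept point mutations
    that do not lower the specificity margin (an assumed, simplified rule)."""
    rng = random.Random(seed)
    new_a, new_b = list(native_a), list(native_b)  # the "duplication" step
    best = specificity_margin(new_a, new_b, native_a, native_b, couplings)
    for _ in range(steps):
        target = rng.choice((new_a, new_b))  # mutate one partner at a time
        pos = rng.randrange(len(target))
        old = target[pos]
        target[pos] = rng.choice(AMINO_ACIDS)
        margin = specificity_margin(new_a, new_b, native_a, native_b, couplings)
        if margin >= best:
            best = margin       # keep the mutation
        else:
            target[pos] = old   # revert the mutation
    return "".join(new_a), "".join(new_b), best


if __name__ == "__main__":
    couplings = make_toy_couplings(6, 6, random.Random(42))
    designed_a, designed_b, margin = duplicate_and_diverge("ACDEFG", "HIKLMN", couplings)
    print(designed_a, designed_b, round(margin, 2))
```

Because the copy starts out identical to the native pair, the margin begins at zero and can only stay the same or grow, which loosely mirrors the goal of a designed pair that is insulated from its native counterparts. A real design effort would rely on a model derived from actual protein sequence and structure data rather than random numbers.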
“Our approach uses concepts from physics, evolution and protein engineering to allow the design of potentially new protein-protein interactions,” said Malinverni. “By performing evolutionary scale experiments in silico, we can explore huge spaces of novel protein sequences, and thus generate very large pools of candidate protein pairs of potential relevance to test. I’m excited to apply this algorithm to challenging systems of interest at St. Jude, including signaling receptors and protein systems mutated in cancer.”
This work has vast implications for biology, particularly in the field of protein engineering and peptide design. With this approach, it might be possible, for example, to design proteins or short peptides that can inhibit or activate cell death proteins. Researchers could also use these proteins or peptides to bind specific molecules on cancer cells for applications such as CAR T-cell therapy. Additionally, the algorithm can have technical applications, such as designing new tags for protein purification efforts to facilitate biochemical and structural studies of molecules.
“The algorithm that we developed will become a critical component of our future research to discover new peptides and protein sequences,” Babu said. “Being at St. Jude, and the unique opportunities to learn about each other’s research interests, have allowed us to identify many different biological systems and contexts where this algorithm could be applied.”
The researchers are now applying this algorithm to design peptides and proteins that resemble molecules already present natively in cells. These designed molecules could allow them to activate, inhibit or perturb proteins in biological processes relevant to health and disease. Their current focus is on designing peptides that could modulate key proteins such as G protein–coupled receptors (which are major drug targets), with potential for therapeutic applications. Because the algorithm can be trained on new sequences, it allows users to tap into data that are being continuously generated.
“In the end we may have published a paper on the approach, but it is really the beginning of an exciting time ahead, with enormous potential for biological discoveries,” Babu said.