
ProteinPaint: Making Data Beautiful

By Carole Weaver, PhD
Photography by Peter Barta


An elegant web tool developed at St. Jude makes it easy for any scientist to explore cancer genome data.


Jinghui Zhang, PhD, Computational Biology chair (at left), confers with research scientist Xin Zhou, PhD.

Scientists love data. But even the bravest can feel daunted when faced with billions of pieces of it. And what good is big data if nobody uses it?

This question is becoming increasingly important in childhood cancer research. As technologies like genome sequencing move into the clinic, avalanches of data are emerging about DNA changes that occur in childhood cancers. Now, the challenge is to get scientists excited about sifting through all that data to make new discoveries and advance cures.

Jinghui Zhang, PhD, has a simple solution: make it easy, and make it fun.

“How do you make using the data an enjoyable experience, rather than having to fight and struggle with the tools to make them work?” she asks. As chair of St. Jude Computational Biology, Zhang is an expert at analyzing big data to make big discoveries about childhood cancers.

With this goal in mind, she and her research team set about revolutionizing how scientists everywhere access and explore pediatric cancer data. The result is an elegant new Web application called ProteinPaint.

A luxury vehicle—with an incredible engine

ProteinPaint is like a sleek luxury car, beautiful to look at and a pleasure to drive. With a couple of clicks on clean, simple visuals, a scientist can be drawn into deep data about a particular childhood cancer and its most common genetic alterations.

“You don’t need to learn anything first—you can go directly and use it, and the interface is intuitive,” Zhang notes. “During the exploration, users can gain knowledge about the complexity by themselves—it is a visual-based navigation process that is intuitive to human nature. That’s what we tried to capture.”

But it’s not just the sleek exterior that makes the tool stand out. It’s what’s under the hood.

ProteinPaint is powered by the largest pediatric cancer database in the world, a data portal developed at St. Jude called PeCan. Through this incredible engine, ProteinPaint delivers information on nearly 27,500 genetic alterations from more than 1,000 pediatric cancer patients.

Adult cancer data is available too, and can be compared with pediatric data with a single click. Researchers can also upload and explore their own data sets using the tool.

A global resource for discovery

With ProteinPaint, Zhang hopes to empower more researchers to take the critical next steps: Use genomic data to make more accurate diagnoses. Learn how different DNA changes contribute to cancer. Develop precision therapies tailored for the genetic makeup of a patient’s cancer.

“We want this to be the definitive resource for genomic information for the pediatric cancer community,” Zhang says. “Collectively we can better understand, using our combined knowledge, what contributes to cancer.” 

Try it out:


Birth of a Research Tool

Several years ago, ProteinPaint was just a gleam in the eye of a St. Jude researcher. Matt Parker, PhD, a former postdoc in Jinghui Zhang’s lab, designed an early version of the application to share data from the St. Jude – Washington University Pediatric Cancer Genome Project. While popular, the tool was limited in the amount of data it could show at one time.

Then, in early 2015, Xin Zhou, PhD, joined St. Jude. Within two weeks, he had cranked through six new versions of ProteinPaint to create a winning design. “Each time I got an updated version, Jinghui and I would look at it together and discuss whether this new approach could do something more valuable,” Zhou says.

“When we came up with this design, we had a very happy moment,” he adds.

Further refinements moved quickly. The application was soon published in the prominent journal Nature Genetics and released for global use by the scientific community.

“It’s unusual to have the creativity coupled with quick productivity,” Zhang observes. “It wasn’t because it was a simple task; it’s because Xin is so great.”

Zhou is too modest to use such superlatives, but he’s not afraid to think big.

“Some have called ProteinPaint a premium experience to look at cancer datasets,” he says. “This is a nice way to describe it.” His new effort is to expand ProteinPaint to display new types of data on each cancer and culprit gene, increasing its value as a discovery tool.

Even now, a team of data wranglers is busy behind the scenes, making continual updates. Data for nearly 200 new genetic alterations, called gene fusion events, have been added to the database since its first release, with more coming through the pipeline.

“There’s so much that can be done,” Zhou says, a hint of excitement in his voice. “And the data’s already there.”

St. Jude video

A web tool developed at St. Jude, ProteinPaint helps scientists explore big data from cancer genome sequencing projects and make new discoveries. To learn more about the application, view the video for a first-hand look at its capabilities.