If link downloads do not work, copy and paste into a new tab or window.
Performance is best in the Firefox browser.


A software program for mass spectrometry-based metabolite identification

Introduction

JUMPm is a program for untargeted metabolite identification based on liquid chromatography and tandem mass spectrometry. The algorithm determines chemical formulas from either unlabeled or stable-isotope labeled metabolome data, and derives possible structures by predictive fragmentation during database search. JUMPm uses a target-decoy strategy based on the octet rule to estimate the false discovery rate (FDR). The user specifies a target FDR, and JUMPm filters the identifications to meet it. FDR is a critical measure of confidence that researchers can use when analyzing their data.
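To make the FDR filtering step concrete, the following is a minimal sketch of a generic target-decoy cutoff. It is not JUMPm's implementation: the function name `fdr_filter`, the `(score, is_decoy)` input format, and the estimate FDR = decoys / targets are assumptions for illustration, and the octet-rule-based decoy generation itself is not modeled.

```python
from typing import List, Tuple

Hit = Tuple[float, bool]  # (score, is_decoy); a higher score means a better match


def fdr_filter(hits: List[Hit], target_fdr: float) -> List[Hit]:
    """Keep the largest top-scoring set of hits whose estimated FDR <= target_fdr.

    The FDR at each cutoff is estimated as (#decoys) / (#targets) among the
    hits accepted so far. Only target hits are returned.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    best_cut = 0
    targets = decoys = 0
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest cutoff that still satisfies the target FDR.
        if targets > 0 and decoys / targets <= target_fdr:
            best_cut = i
    return [h for h in ranked[:best_cut] if not h[1]]
```

A stricter target FDR moves the cutoff toward higher scores and returns fewer identifications; a looser one accepts more hits at the cost of more expected false positives.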

The program is written in Perl, Java and R. It is designed for high-performance parallel computing systems. More detailed information can be found in the README file in the compressed source code package.

Download JUMPm

JUMPm is open-source and free for academic and non-profit use. 

Two sample datasets with MISSILE labeling can be downloaded: negative data and positive data.

  1. A pre-built formula database
    The mass-formula database is used to determine the chemical formulas of observed peaks.
  2. A pre-built composite database
    Includes structures from PubChem, HMDB and YMDB. The structure database is used for MS2 metabolite identification once a formula is determined.

Additional datasets:

Reverse phase + mode:

HILIC - mode:

File Preparation

JUMPm takes one or more .mzXML files as input. It assumes that each .mzXML file was converted from a .raw file using ReAdW or MSConvert (without compression of peak information with 32-bit precision encoding/decoding). MS2 scans must be preserved during conversion because JUMPm performs structure identification based on MS2 peak information. If desired, JUMPm can search MS1 data only, in which case it identifies metabolite formulas but not structures.
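Before a run, it can be useful to confirm that a converted file still contains MS2 scans. The sketch below is not part of JUMPm; it assumes the standard mzXML layout in which each `scan` element carries an `msLevel` attribute, and the function name `count_ms_levels` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_ms_levels(source) -> Counter:
    """Count scans per msLevel in an mzXML file (path or binary file object)."""
    counts = Counter()
    for _, elem in ET.iterparse(source, events=("end",)):
        # Tags may carry an XML namespace ("{uri}scan"); compare the local name.
        if elem.tag.rsplit("}", 1)[-1] == "scan":
            counts[elem.get("msLevel", "unknown")] += 1
            elem.clear()  # free memory; large files hold many peaks
    return counts


if __name__ == "__main__":
    # Tiny made-up document for demonstration only (not real instrument data).
    demo = (
        b'<mzXML xmlns="http://example.org/ns"><msRun>'
        b'<scan num="1" msLevel="1"><scan num="2" msLevel="2"/></scan>'
        b"</msRun></mzXML>"
    )
    print(dict(count_ms_levels(io.BytesIO(demo))))
```

If the count for level "2" is zero, the file supports only the MS1 (formula-only) search described above.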

Usage

JUMPm can be run with a single command that takes two inputs: 1) the .mzXML file(s) to be analyzed and 2) a parameter file containing all settings for identification, including the formula and structure databases to be searched.

  • jumpm -p <JUMPm parameter file> <.mzXML file(s)>

For example, to analyze "Yeast_data_set_1.mzXML", first move to the directory where the mzXML file is located, then edit the jumpm.params file to set the database and search parameters. To run JUMPm, type the following command in the command prompt/terminal:

jumpm -p jumpm.params Yeast_data_set_1.mzXML

*Tip: the parameter file (jumpm.params) is most easily edited in WordPad rather than Notepad.
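When many files need to be processed, the command above can be assembled by a script. The following sketch is a hypothetical wrapper, not something shipped with JUMPm: it assumes `jumpm` is on the PATH and that the parameter file sits in the same directory as the .mzXML files. The helper names `build_jumpm_command` and `run_jumpm` are illustrative.

```python
import subprocess
from pathlib import Path
from typing import List


def build_jumpm_command(params_file: str, mzxml_files: List[str]) -> List[str]:
    """Build the argument list for: jumpm -p <parameter file> <.mzXML file(s)>."""
    if not mzxml_files:
        raise ValueError("at least one .mzXML file is required")
    return ["jumpm", "-p", params_file, *mzxml_files]


def run_jumpm(directory: str, params_file: str = "jumpm.params"):
    """Run JUMPm on every .mzXML file in `directory` (assumes jumpm is on PATH)."""
    files = sorted(p.name for p in Path(directory).glob("*.mzXML"))
    cmd = build_jumpm_command(params_file, files)
    # cwd mirrors the instruction to move into the data directory first.
    return subprocess.run(cmd, cwd=directory, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.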

Installation

Your computer system must meet these minimum requirements to run JUMPm.

System Requirements
Hardware

To run on a cluster system

  • SGE or LSF job management system
  • 32 GB memory on each node

To run on a single server

  • 32 GB memory
  • 2 GHz CPU processors with a minimum of 4 cores
Software

WINE must be installed to analyze .raw files; .mzXML files can be processed without it.

The main program of JUMPm is written in Perl (v5.8 or above). The following modules are required:

Perl modules:
Parallel::ForkManager
Class::Std
Statistics::R
Statistics::Basic
Statistics::Descriptive
Set::Partition
Regexp::Common
Number::Format

R v3.1.0 or higher

The mzR package is required.

Java v1.7

More detailed information about installation can be found in the README file.
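The software requirements above can be checked before installation with a short script. This is a sketch under stated assumptions, not an official installer check: it relies on the common idiom `perl -M<Module> -e 1`, which exits with a non-zero status when a module cannot be loaded, and the helper names are hypothetical. It does not verify versions or the R mzR package.

```python
import shutil
import subprocess
from typing import Callable, List, Sequence

# Perl modules listed in the requirements above.
PERL_MODULES = [
    "Parallel::ForkManager", "Class::Std", "Statistics::R",
    "Statistics::Basic", "Statistics::Descriptive", "Set::Partition",
    "Regexp::Common", "Number::Format",
]


def perl_module_command(module: str) -> List[str]:
    """Command that succeeds only if Perl can load `module`."""
    return ["perl", f"-M{module}", "-e", "1"]


def default_runner(cmd: Sequence[str]) -> bool:
    """Return True if the command runs and exits with status 0."""
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:  # e.g. perl itself is not installed
        return False
    return result.returncode == 0


def missing_perl_modules(
    modules: Sequence[str] = PERL_MODULES,
    runner: Callable[[Sequence[str]], bool] = default_runner,
) -> List[str]:
    """List the modules that Perl could not load."""
    return [m for m in modules if not runner(perl_module_command(m))]


def missing_executables(names: Sequence[str] = ("perl", "Rscript", "java")) -> List[str]:
    """List required executables that are not found on PATH."""
    return [n for n in names if shutil.which(n) is None]
```

The `runner` parameter exists only so the check can be exercised without Perl installed; in normal use both functions are called with no arguments.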

Support

Please contact Junmin Peng for software support.

Citation

JUMPm: A Tool for Large-Scale Identification of Metabolites in Untargeted Metabolomics. Wang, X. et al. Metabolites 2020, 10(5), 190.

Following the publication, Dr. Ralf Tautenhahn, Product Manager Metabolomics Software at Thermo Fisher Scientific, approached us and explained some of the advanced features of Compound Discoverer 3.1, along with its database capabilities. We recognize that Compound Discoverer 3.1 has 275 data sources via the ChemSpider node, including BioCyc, HMDB, and KEGG as the defaults for database search. In addition, the search results are largely dependent on the parameter settings. We have therefore re-examined different settings during database search and report the results here.

For this purpose, we used the same synthetic standard metabolites (HILIC) sample (triplicate runs) described in the paper.

  • Search parameters used in the paper along with results

    Intensity threshold = default value of 10e6
    Mass tolerance (for retention time alignment) = 10 PPM
    Mass tolerance (for all other functions) = 5 PPM (default)
    Ions (compound detection) = [M-H]-1
    Preferred ions (group compounds) = [M-H]-1
    Remaining parameters = default

    Identified compounds = 510
    Total formulas = 428, non-redundant formulas = 395 (reported in Figure 4C)
    Total structures = 167, non-redundant structures = 134 (reported in Figure 4C)

  • New search parameters along with results

    Intensity threshold = 10e5 (reduced from the default)
    Mass tolerance (for retention time alignment) = 15 PPM
    Mass tolerance (for all other functions) = 15 PPM
    Ions (compound detection) = [2M+ACN+H]+1; [2M+ACN+Na]+1; [2M+FA-H]-1; [2M+H]+1; [2M+K]+1; [2M+Na]+1; [2M+NH4]+1; [2M-H]-1; [2M-H+HAc]-1; [M+2H]+2; [M+ACN+2H]+2; [M+ACN+H]+1; [M+ACN+Na]+1; [M+Cl]-1; [M+DMSO+H]+1; [M+FA-H]-1; [M+H]+1; [M+H+K]+2; [M+H+MeOH]+1; [M+H+Na]+2; [M+H+NH4]+2; [M+H-H2O]+1; [M+H-NH3]+1; [M+K]+1; [M+Na]+1; [M+NH4]+1; [M-2H]-2; [M-2H+K]-1; [M-H]-1; [M-H+HAc]-1; [M-H+TFA]-1; [M-H-H2O]-1 (default)
    Preferred ions (group compounds) = [M+H]+1; [M-H]-1 (default)
    Remaining parameters = default

    Identified compounds = 1136
    Total formulas = 1113, non-redundant formulas = 983
    Total structures = 413, non-redundant structures = 342
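Both parameter sets above express mass tolerance in parts per million (PPM). As a brief illustration of what widening the tolerance from 5 to 15 PPM means in absolute terms, the sketch below converts a PPM tolerance into an m/z window. The function names and the example m/z of 200.0 are illustrative assumptions, not values taken from the searches.

```python
from typing import Tuple


def ppm_window(mz: float, ppm: float) -> Tuple[float, float]:
    """Return the (low, high) m/z bounds for a tolerance given in ppm."""
    delta = mz * ppm / 1e6
    return (mz - delta, mz + delta)


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed peak relative to theory, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


if __name__ == "__main__":
    for tolerance in (5, 15):
        low, high = ppm_window(200.0, tolerance)
        print(f"{tolerance} ppm at m/z 200.0: {low:.4f} to {high:.4f}")
```

Because the window scales with m/z, tripling the PPM tolerance triples the absolute window at every mass, which admits more candidate formulas per peak.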

We thank Dr. Ralf Tautenhahn for his insightful advice on Compound Discoverer 3.1.