About On-line Tools
This is a listing of freely available, on-line proteomics tools. It is designed for users who simply want to know what tools they can use to help solve their problems. The tools are broken down into arbitrary categories based on what users have been asking for. Please let us know if you have any suggestions for this page.

    Tool Categories

    Data Curation

    INTERACT
    http://tools.proteomecenter.org/Interact.php
    INTERACT was developed to address the need to curate large datasets (from tens to hundreds of LC-MS/MS runs covering multiple tens of thousands of MS/MS spectra). INTERACT allows a user to quickly interrogate such large datasets with total flexibility including filtering, sorting, grouping, and highlighting of the data.

    Out2Summary
    http://tools.proteomecenter.org/out2summary.php
    Convert Sequest and TurboSequest *.out files into a single HTML-Summary file ready for use with INTERACT.

    Trans-Proteomic Pipeline
    http://tools.proteomecenter.org/TPP.php
    The TPP includes all steps of the ISB MS/MS analysis pipeline after results of database search: Peptide validation, Peptide quantitation, Protein identification, and Protein quantitation. XML-based, the TPP is easily extensible to additional search engines and analysis modules.

    Scaffold - Free Viewer
    http://www.proteomesoftware.com/Scaffold/Scaffold_viewer.htm
    View .sfd files created with the Scaffold MS/MS analysis platform from Proteome Software.

    PrestOMIC
    http://code.google.com/p/prestomic/wiki/PrestOMIC
    An open-source suite of tools for storing data in a PostgreSQL database and for presenting the data in a user-friendly format via a browser.

    Human ProteinpediA
    http://www.humanproteinpedia.org
    Human ProteinpediA is a community portal for sharing and integration of human protein data. It allows research laboratories to contribute and maintain protein annotations. The Human Protein Reference Database (HPRD) integrates data deposited in Human ProteinpediA with the existing literature-curated information in the context of an individual protein. All public data contributed to Human ProteinpediA can be queried, viewed and downloaded.

    Microbial Protein-Protein Interaction Database - MiPPI
    http://mippi.ornl.gov
    A Research Program for Identification and Characterization of Protein Complexes

    Integrated Post-Genomic Data Resource
    http://www.proteomicsresource.org/
    The Biodefense Proteomics Catalog is a searchable directory of data and reagent information generated by the Proteomics Research Centers and available to the research community. A search yields information about organism studies, experimental data, research protocols, research reagents, identified proteins, publications and proteomes. It includes visualization and analysis tools.

    Mascot File Parsing and Quantification
    http://mfpaq.sourceforge.net

    MFPaQ (Mascot File Parsing and Quantification) is software developed at the IPBS (Institut de Pharmacologie et de Biologie Structurale, Toulouse, France) proteomics platform for parsing, validating, and quantifying proteomics data. It allows fast and user-friendly verification of Mascot result files, as well as quantification of data from isotopic labeling experiments using either the ICAT or SILAC method.

    This new tool provides a convenient interface to retrieve Mascot protein lists, sort them according to Mascot scoring or to user-defined criteria based on the number, the score and the rank of identified peptides, and to validate the results. The software extracts quantitative data from raw files obtained by nanoLC-MS/MS, calculates peptide ratios, and generates a non-redundant list of proteins identified in a multi-search experiment with their calculated averaged and normalized ratio.
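    The ratio calculation described above can be illustrated with a minimal sketch. It is not MFPaQ's code: the heavy/light peptide ratio, the arithmetic mean per protein, and normalization by the median protein ratio are all assumptions chosen for illustration, and the function name and input layout are hypothetical.

```python
from statistics import mean, median


def protein_ratios(peptide_quant):
    """Average peptide ratios per protein, then normalize by the median.

    peptide_quant maps a protein name to a list of
    (light_intensity, heavy_intensity) pairs, one pair per peptide.
    Assumption: ratio = heavy / light; normalization divides every
    protein ratio by the median protein ratio of the experiment.
    """
    raw = {}
    for protein, pairs in peptide_quant.items():
        # Skip peptides whose light signal is zero to avoid division by zero.
        ratios = [heavy / light for light, heavy in pairs if light > 0]
        if ratios:
            raw[protein] = mean(ratios)
    if not raw:
        return {}
    center = median(raw.values())
    return {protein: ratio / center for protein, ratio in raw.items()}
```

    For example, three proteins with raw averaged ratios of 4, 2 and 1 would be reported as 2, 1 and 0.5 after median normalization.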

    It is based on three modules, the Mascot File Parser (MFP) Module, the quantification module and a third module designed for differential analysis, in which validated protein lists are compared. The input of the MFP module is a list of mascot .dat files and the input of the quantification module is a list of .wiff files generated by Analyst QS on a Qstar instrument (coming soon: version of the quantification module compatible with other mass spectrometers).

    The next section provides details about the downloading and installation processes.

    Suggested Citation: David Bouyssié, Anne Gonzalez de Peredo, Emmanuelle Mouton, Renaud Albigot, Lucie Roussel, Nathalie Ortega, Corinne Cayrol, Odile Burlet-Schiltz, Jean-Philippe Girard, and Bernard Monsarrat. "MFPaQ, a new software to parse, validate, and quantify proteomic data generated by ICAT and SILAC mass spectrometric analyses: application to the proteomic study of membrane proteins from primary human endothelial cells." Molecular and Cellular Proteomics, May 2007; doi:10.1074/mcp.T600069-MCP200

    Quant
    http://www.biomedcentral.com/1471-2105/8/214

    Peptides can be quantified by mass spectrometry using the iTRAQ reagent. This technology yields information about the relative abundance of single peptides. A method for the calculation of reliable quantification information is required in order to obtain biologically relevant data at the protein expression level.

    Data Extraction Tools

    RAP, JRAP, RAMP
    http://tools.proteomecenter.org/rap.php
    mzXML data parsers.

    T2Extractor 2.0
    http://www.proteomecommons.org/archive/1114637208624/
    A JDBC-based tool that can access the ABI 4700's database and can extract T2D files as well as peak lists.

    File Format Manipulation

    Peak List Conversion Utility (Java Web Start)
    http://www.proteomecommons.org/current/531/ConvertPeakList.jnlp
    The ProteomeCommons.org IO Framework's tool for converting peak list and spectrum files between different formats. The tool can also merge multiple peak lists into a single concatenated peak list. The tool uses Java Web Start and runs locally on your computer.

    Suggested Citation: Jayson A. Falkner, Jarret W. Falkner, and Philip C. Andrews, "ProteomeCommons.org IO Framework: reading and writing multiple proteomics data formats", Bioinformatics, doi:10.1093/bioinformatics/btl573

    mzXML2Other
    http://www.proteomecommons.org/current/522/
    Converter from mzXML to Sequest dta, Mascot generic and Micromass pkl formats. It is based on the Xerces-C parser.

    ReAdW
    http://tools.proteomecenter.org/ReAdW.php
    ThermoFinnigan Xcalibur format to mzXML converter.

    MassWolf
    http://tools.proteomecenter.org/MassWolf.php
    Micromass MassLynx format to mzXML converter.

    mzStar
    http://tools.proteomecenter.org/mzStar.php
    SCIEX/ABI Analyst format to mzXML converter.

    Prototype C-Sharp Converter
    http://tools.proteomecenter.org/PrototypeCSharpConverter.php
    Driven in large part by recent rapid advances in proteomics, the need for a vendor-independent means of accurate and robust representation and exchange for mass spectrometry data has become apparent. Two major formats have emerged: mzXML, developed at the Institute for Systems Biology (ISB) and highly integrated into the Trans-Proteomic Pipeline (TPP) software tool chain, and mzData, developed by the HUPO Proteomics Standards Initiative (PSI) MS working group. Both the proteomics research community and instrument vendors would clearly benefit from a single standard. Recently, the PSI-MS group, the ISB, and instrument vendors collaborated to produce a draft specification for a unified data format, tentatively titled dataXML, with the intention of combining the best features of the mzXML and mzData formats. Here, we present work towards an open-source reference implementation for converters from raw data to both the mzXML and dataXML formats, which could be extended to other formats as well.

    MS Search Engines

    Mascot Peptide Mass Fingerprint
    http://www.matrixscience.com/cgi/search_form.pl%3fFORMVER=2%26SEARCH=PMF
    A free on-line version of Matrix Science's Mascot MS fingerprint search engine. You can select from a variety of different sets of known protein sequences to search against.

    Aldente
    http://www.expasy.org/tools/aldente/
    Aldente is a tool to identify proteins from PMF data. This new, fast and powerful PMF tool uses the Hough transform to determine the mass spectrometer deviation, to realign the experimental masses and to exclude outliers. Some unique advantages and features: It extensively uses the Swiss-Prot annotations (PTM, alternative splicing, etc.) and it is completely interconnected with other ExPASy proteomics tools, offering the functionality of protein characterization as part of the identification procedure. The scores may be tailored by fully customizable parameters. Besides the usual chemical amino acid modifications, it also considers any user-defined modifications, such as alkylation products on cysteine residues, with the possibility to define their contribution to the score. For more information about Aldente see the documentation: http://www.expasy.org/tools/aldente/help.html

    PROCLAME
    http://proclame.unc.edu/
    A Web-based application that uses whole-protein masses determined by mass spectrometry to identify putative co- and posttranslational proteolytic cleavages and chemical modifications. The protein cleavage and modification engine (PROCLAME) requires as input an intact mass measurement and a precursor identification based on peptide mass fingerprinting or tandem mass spectrometry.

    Genome-based Peptide Fingerprint Scanning (GFS)
    http://gfs.unc.edu/
    Genome-based fingerprint scanning (GFS) is a recently reported method developed by the Giddings lab (Giddings et al., 2003) that maps peptide mass fingerprint data directly to raw genomic sequence, enabling rapid, low-cost identification of proteins in genomes for which annotation is lacking. The program takes as input an experimentally obtained peptide mass fingerprint, scans a genome sequence of interest, and outputs the most likely regions of the genome from which the mass fingerprint is derived. The software first generates a theoretical mass list by translating the genome of interest in 6 reading frames (3 each on the forward and reverse strands) and digesting the resulting proteins in silico according to cleavage rules associated with the specified protease (trypsin in this case). The algorithm then finds matches (within a given mass tolerance) between these theoretical masses and the input experimental masses.

    Human ProteinpediA
    http://www.humanproteinpedia.org
    Human ProteinpediA is a community portal for sharing and integration of human protein data. It allows research laboratories to contribute and maintain protein annotations. The Human Protein Reference Database (HPRD) integrates data deposited in Human ProteinpediA with the existing literature-curated information in the context of an individual protein. All public data contributed to Human ProteinpediA can be queried, viewed and downloaded.

    MSMS Search Engines

    TheGPM (Human)
    http://human.thegpm.org
    A web interface to the XTandem search engine that will search your MSMS data against known human protein sequences.

    TheGPM (Prokaryotes)
    http://bacteria.thegpm.org
    A web interface to the XTandem search engine that will search your MSMS data against known prokaryotic protein sequences.

    TheGPM (Plant UniGene)
    http://plant.thegpm.org
    A web interface to the XTandem search engine that will search your MSMS data against known plant protein sequences.

    TheGPM
    http://www.thegpm.org
    A web interface to the XTandem search engine that will search your MSMS data against known protein sequences.

    Suggested Citation: TANDEM: matching proteins with mass spectra, Robertson Craig and Ronald C. Beavis, Bioinformatics, 2004, 20, 1466-7.

    TheGPM (Xenopus)
    http://xenopus.thegpm.org
    A web interface to the XTandem search engine that will search your MSMS data against known Xenopus protein sequences.

    TheGPM (T. brucei)
    http://tb.thegpm.org
    A web interface to the XTandem search engine that will search your MSMS data against T. brucei protein sequences.

    Mascot MS/MS Ion Search
    http://www.matrixscience.com/cgi/search_form.pl%3fFORMVER=2%26SEARCH=MIS
    A free on-line version of Matrix Science's Mascot search engine. You can choose to search against a number of different known protein data sets.

    The Open Mass Spectrometry Search Algorithm (OMSSA)
    http://pubchem.ncbi.nlm.nih.gov/omssa/index.htm
    A free, public-domain MSMS search engine sponsored by the NCBI.

    Phenyx Web Interface (PWI)
    http://phenyx.vital-it.ch/pwi/login/login.jsp
    A free on-line version of GeneBio's software platform for the identification and characterization of proteins and peptides from MS data.

    PeptideProphet
    http://tools.proteomecenter.org/PeptideProphet.php
    A tool for validating peptide identifications made by tandem mass spectrometry (MS/MS) and database searching; probabilities are assigned to the peptide identifications made by programs like SEQUEST or MASCOT.

    ProteinProphet
    http://tools.proteomecenter.org/ProteinProphet.php
    A statistical model for validation of peptide identifications at the protein level.

    InsPecT: high-throughput identification of peptide mass spectra
    http://peptide.ucsd.edu/inspect.html
    InsPecT performs high-throughput identification of peptide mass spectra with an emphasis on efficiently and confidently identifying modified peptides. Modifications include in vivo post-translational modifications such as phosphorylation, as well as in vitro chemical damage. We are able to search and score a broad range of modifications in a single search, or even identify unanticipated changes such as point mutations.

    VEMS 3.0
    http://yass.sdu.dk/
    VEMS (Virtual Expert Mass Spectrometrist) is a program for integrated proteome analysis. VEMS offers processing of raw data, MSMS database searches with many variable modifications, correlation of spectra for finding more hits, validation functions, data mining functions, quantitative time studies both with and without stable isotope labelling, clustering of quantitative results and export of data in Proteios XML standard format. The functions are included in one user-friendly executable file. The VEMS application also works as an interface to other programs such as BLAST, Lutefisk and a sequence analysis tool.

    Sequit! 4.0 Online
    http://www.proteomefactory.com/sub/s505-Sequit.htm
    Sequit! 4.0 is a de novo peptide sequencing tool that uses MSMS fragment spectra for peptide and protein identification. It is available online as well as in a commercial local version that runs under Windows 2000/XP. The local version allows batch analyses of MSMS spectra and automatic submission to BLAST search.

    Pep_Probe
    http://bart.scripps.edu/public/search/pep_probe/search.jsp
    An MSMS search engine from the Yates lab, described in the following publications: Sadygov, R.; Wohlschlegel, J.; Park, S. K.; Xu, T.; Yates, J. R., III. "Central Limit Theorem as an Approximation for Intensity-Based Scoring Function." Anal. Chem. 2006, 78(1), 89-95. Sadygov, R. G.; Yates, J. R., III. "A Hypergeometric Probability Model for Protein Identification and Validation Using Tandem Mass Spectral Data and Protein Sequence Databases." Anal. Chem. 2003, 75(15), 3792-3798.

    Popitam
    http://www.expasy.org/tools/popitam/
    Popitam, a new tool available from the ExPASy proteomics website, identifies and characterizes peptides with unexpected modifications (e.g. mutations or post-translational modifications) from peptide fragment fingerprinting (MS/MS) data.

    Scaffold from Proteome Software
    http://www.proteomesoftware.com/Proteome_software_prod_Scaffold.html
    Scaffold is a post-MS/MS-search tool, but it has X! Tandem, an open MS/MS search tool, embedded in it. With Scaffold you can easily see your data from multiple points of view, compare samples easily, and with our free viewer, share and publish your results with ease.

    PEAKS Online
    http://www.bioinformaticssolutions.com:8080/peaksonline/
    PEAKS confidently identifies proteins with the aid of a protein sequence database. The PEAKS approach to protein database searching is unique; it differs from conventional approaches in that it uses de novo sequences to help out in the protein identification process. This approach is shown to have equivalent result quality as compared to the leading traditional approach, but with fewer false positives. Most find it useful for dealing with non-standard samples.

    Genome Annotating Proteomic Pipeline (GAPP)
    http://www.gapp.info/
    GAPP - a totally automated publicly available software pipeline for the identification of peptides and proteins from human proteomic tandem mass spectrometry data. Feel free to browse the results repository or register to automatically process your own data.

    Insilicos Proteomics Pipeline (IPP)
    http://www.insilicos.com/IPP.html
    IPP is a proteomics data analysis platform that contains tools for protein identification and quantification developed from the Institute for Systems Biology's (ISB) Trans-Proteomic Pipeline (TPP). IPP includes PeptideProphet, ProteinProphet, ASAPRatio, XPRESS and Libra in a software package optimized for performance, ease of use and quick results.

    Suggested Citation: Insilicos LLC, Seattle, WA

    PANORAMICS
    http://pubs.acs.org/cgi-bin/abstract.cgi/ancham/2007/79/i10/abs/ac070202e.html
    A peptide identification tool that uses a probability model for determining the likelihood that peptides are correctly assigned to proteins. This model derives consistent probability estimates for assembled proteins. The probability scores make it easier to confidently identify proteins in complex samples and to accurately estimate false-positive rates.

    Suggested Citation: Jian Feng, Daniel Q. Naiman, and Bret Cooper, "Probability Model for Assessing Proteins Assembled from Peptide Sequences Inferred from Tandem Mass Spectrometry Data", Anal. Chem., 79 (10), 3901-3911, 2007. doi:10.1021/ac070202e

    Mass Spectrometry of Polymers

    GNU polyxmass project
    http://www.polyxmass.org/
    This is a feature-rich software suite for predicting and analyzing mass spec data on any polymer sequence of any polymer type.

    Other

    Wildcat Toolbox: Perl scripts for mass spectral data analysis and sorting
    http://proteomics.arl.arizona.edu/perl.html
    A collection of perl scripts for manipulating fasta files and mass spectral data, particularly DTA files.

    GPMDB-US
    http://gpmdb-us.thegpm.org
    This is a US mirror of the GPMDB server. The GPMDB server is a data repository for the GPM servers. Protein identifications and MSMS to peptide matches can be compared to previously identified data.

    GPMDB-based Include List Generator (Click to Run)
    http://www.proteomecommons.org/current/559/IncludeListGenerator.jnlp
    This is a tool originally developed for aiding in MALDI TOF/TOF data post-processing. For weakly identified proteins, e.g. one-hit wonders, we wanted a way to generate an include list of all other peptides that should be identified for the weakly identified protein. The masses for those peptides would then be made into an include list, and the MALDI plate would be reanalyzed looking for those specific peptides. The include list is nothing more than a plain text file with the masses of the expected peptides. The trick to this tool is having a good way to find other peptides that should be seen. To do this we mine information about what peptides others have observed directly from the GPMDB. Additionally, the tool allows you to restrict peptides based on given mass ranges or theoretical tryptic peptides. The tool also allows you to modify peptides using any potential modification or any arbitrary mass shift.
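    As a rough illustration of the include-list idea, the sketch below turns a set of peptide masses into a plain text file. It is not the tool's code: the one-mass-per-line layout, the four-decimal formatting, the choice to filter on the shifted mass, and all names are assumptions, and the masses in the usage note are placeholders rather than GPMDB observations.

```python
def build_include_list(peptide_masses, mass_range, mass_shift=0.0):
    """Return sorted, de-duplicated mass strings for an include list.

    peptide_masses maps a peptide label to its mass (Da).
    mass_shift is added to every mass (an arbitrary modification).
    Only shifted masses inside mass_range (low, high) are kept.
    """
    low, high = mass_range
    shifted = {mass + mass_shift for mass in peptide_masses.values()}
    kept = sorted(m for m in shifted if low <= m <= high)
    return [f"{m:.4f}" for m in kept]


def write_include_list(path, lines):
    """Write one mass per line to a plain text file."""
    with open(path, "w", encoding="ascii") as handle:
        for line in lines:
            handle.write(line + "\n")
```

    For instance, calling build_include_list with placeholder masses of 1000.5, 2500.25 and 900.0, a range of (950, 3000) and a shift of 15.9949 keeps the first two peptides and drops the third.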

    Make2D-DB II
    http://www.expasy.org/ch2d/make2ddb/
    Make2D-DB II is a tool to create, convert, interconnect and keep up-to-date 2-DE databases. It has been designed to ensure high consistency of data. It allows dynamic interconnection between any number of similar remote databases and offers many other features, including automatic data updates related to external data, dynamic cross-references to similar databases, an intuitive search engine and data visualization combined with exports in various formats.

    Illinois Bio-Grid Mass Spec Toolkit
    http://gridweb.cti.depaul.edu/twiki/bin/view/IBG/MassSpecToolkit
    This project entails developing solutions for analysis of mass spectrometry data. This includes analysis tools such as SpectralMatch and HDXRates, but also a set of C libraries that give IO and data structure support to developers. Most of the current code is written in C, but we do have some functionality written in Java which correlates to our IBG Desktop project.

    MassSieve: A New Tool for Mass Spectrometry-Based Proteomics
    http://www.proteomecommons.org/dev/masssieve/

    The success of peptide sequence assignment algorithms such as OMSSA and Mascot for mass spectrometry has led to the need for a tool to evaluate the results. DBParser is such a software tool, previously developed by the Laboratory of Neurotoxicology (LNT) for this purpose. Its value for parsimonious analysis of proteins associated with experiments has led to its use for analyzing larger datasets than initially anticipated (hundreds of data files with millions of spectra). MassSieve builds on this experience and is designed as open-source protein assignment software that can be scaled to apply parsimony principles to very large experiments without dataset size limitations. In addition, it allows a more interactive view of the results.
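    The parsimony principle mentioned above, reporting a small set of proteins that explains all identified peptides, can be illustrated with a generic greedy set-cover sketch. This is a textbook approximation written for illustration, not the algorithm used by MassSieve or DBParser, and the function name and input layout are hypothetical.

```python
def parsimonious_proteins(protein_to_peptides):
    """Greedily pick proteins until every peptide is explained.

    protein_to_peptides maps a protein name to the set of peptides
    assigned to it. At each step the protein covering the most
    still-unexplained peptides is chosen; ties go to the name that
    sorts first, so the result is deterministic.
    """
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        best = max(
            sorted(protein_to_peptides),
            key=lambda name: len(protein_to_peptides[name] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen
```

    With proteins P1 = {a, b, c}, P2 = {b, c} and P3 = {d}, the sketch keeps P1 and P3 and drops P2, whose peptides are already explained by P1.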

    Adipose Proteome Database
    http://proteome.biochem.mpg.de/adipo
    In our 3T3-L1 adipocyte proteome database, we provide protein and peptide lists for the adipocyte with their subcellular localization information. Using the database, it is easy to search proteins or peptides by IPI accession number, protein name, description, subcellular location and peptide sequence, and also to run batch searches.

    Organellar Map Database
    http://proteome.biochem.mpg.de/ormd/
    In our Organellar Map Database, we provide protein and peptide lists with organellar map information. Using the database, it is easy to search proteins or peptides by IPI accession number, UniProt accession number, protein name, description, location and peptide sequence, and also to run batch searches.

    Body Fluid Database - Seminal
    http://proteome.biochem.mpg.de/seminal/
    In our bodily fluid protein database, we provide protein and peptide lists for cerebrospinal fluid (CSF), urine, tear and seminal plasma. All of our data were obtained with a state-of-the-art mass spectrometer, a linear ion trap-Fourier transform instrument (LTQ-FT) or a linear ion trap-Orbitrap (LTQ-Orbitrap), with extremely high mass accuracy. You can search proteins or peptides by protein ID (IPI), protein name and peptide sequence across four bodily fluids.

    Body Fluid Database - Tear
    http://proteome.biochem.mpg.de/tear/
    In our bodily fluid protein database, we provide protein and peptide lists for cerebrospinal fluid (CSF), urine, tear and seminal plasma. All of our data were obtained with a state-of-the-art mass spectrometer, a linear ion trap-Fourier transform instrument (LTQ-FT) or a linear ion trap-Orbitrap (LTQ-Orbitrap), with extremely high mass accuracy. You can search proteins or peptides by protein ID (IPI), protein name and peptide sequence across four bodily fluids.

    Body Fluid Database - Urinary
    http://proteome.biochem.mpg.de/urine/
    In our bodily fluid protein database, we provide protein and peptide lists for cerebrospinal fluid (CSF), urine, tear and seminal plasma. All of our data were obtained with a state-of-the-art mass spectrometer, a linear ion trap-Fourier transform instrument (LTQ-FT) or a linear ion trap-Orbitrap (LTQ-Orbitrap), with extremely high mass accuracy. You can search proteins or peptides by protein ID (IPI), protein name and peptide sequence across four bodily fluids.

    Red Blood Cell Database
    http://proteome.biochem.mpg.de/rbc/
    Our Human Red Blood Cell Database (hRBCD) is divided into membrane and soluble proteins, providing information ranging from identification of specific isoforms to the class and metabolic status of identified proteins. It also provides information on the biochemical characteristics of the membrane proteins and related statistical peptide information. hRBCD can be used to confirm the presence of a protein in the red blood cell and to obtain further information on specific proteins and their biochemical behavior. A range of search possibilities is offered to get the most information out of the database: IPI accession number, protein name, description and class. Batch searches can also be undertaken.

    GlycoPro from BioPharmaSoft
    http://www.biopharmasoft.com/

    A software tool, GlycoPro, is now available for automatic, high-throughput glycopeptide MS/MS spectra interpretation. It searches thousands of MS/MS spectra from a glycoprotein digest against an N-glycan structure library with over 140,000 entries, scores each hit, and displays a graphical drawing of the N-glycan fragment for each of the matched MS/MS peaks. All is done in an hour or two on a personal computer. Included in the N-glycan structure library/database are acetylated, phosphorylated and sulfated N-glycans (based on biosynthetic rules). Also available is an even larger library with over 500,000 entries. It may represent a breakthrough in glycomics. It is a practical tool for researchers interpreting MS/MS spectra manually (for N-glycan structure determination). For more info: www.BioPharmaSoft.com

    GlycoPro takes an MS2 data folder name and the average MH+ of the peptide as its input and automatically finds any N-glycans attached to the peptide. The MS2 spectra data (in DTA format) can be all those obtained from an LC-MS/MS run of a tryptic digest of one glycoprotein or a mixture of a few glycoproteins, under the same experimental conditions as in a typical proteomics experiment using mass spectrometry.

    ProteoWizard
    http://proteowizard.sourceforge.net

    ProteoWizard is a modular and extensible set of open-source, cross-platform tools and software libraries for proteomics data analysis.

    The libraries enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access, and performs standard chemistry and LCMS dataset computations.

    The software is available for download now, under the Apache open source license.

    Comments or Questions? Please contact the site's administrators.