Proteomics data formats

Data formats

Raw data level

  1. Thermo Xcalibur or Waters MassLynx .RAW files
    The two formats differ but are easy to tell apart: an Xcalibur .RAW is a single file, whereas a MassLynx .RAW is a directory. Each is the original data file written by Xcalibur or MassLynx, respectively, during data acquisition. On the Microsoft Windows platform these files can be accessed programmatically through an OLE DLL. Normally this works only from a C++ implementation; however, the PeakML library (due to be released as open source; currently available on request from r.a.scheltema@rug.nl) provides a 1-to-1 mapping for accessing the data from Java implementations. It is advisable to keep using these original formats, because they contain a large amount of information that is not mapped to an open file format such as mzML.
  2. AB Sciex Analyst .wiff files (also written by some older Agilent instruments)
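
The file-versus-directory rule for .RAW data described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `guess_raw_vendor` and the returned labels are hypothetical, and the check looks only at the file system, not at the contents of the data.

```python
from pathlib import Path


def guess_raw_vendor(path):
    """Guess which vendor software wrote a .raw acquisition.

    Hypothetical helper: it relies only on the rule that an Xcalibur
    .RAW is a single file and a MassLynx .RAW is a directory.
    """
    p = Path(path)
    if p.suffix.lower() != ".raw":
        return "unknown"          # not a .raw name at all
    if p.is_dir():
        return "Waters MassLynx"  # acquisition stored as a directory
    if p.is_file():
        return "Thermo Xcalibur"  # acquisition stored as a single file
    return "unknown"              # path does not exist
```

A pipeline could call such a check before deciding which vendor library to load; it does not verify that the data are actually readable.
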
[Figure: PeakList format and software]

Peak list level

  1. .mgf
  2. .dta
  3. .pkl
  4. mzML[1] (see also mzsquash, a compression tool for mzML files, or Fast Infoset)
  5. mzXML[1][2]
    • RAMP is an mzXML C/C++ parser
    • jmzML is a JAXB-based Java implementation of the full mzML 1.1 standard format
    • JRAP is the port of RAP to Java, which includes SAX2 and StAX parsers
  6. APML – XML interchange format, used in Corra.
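
The parsers listed above read the XML-based formats as a stream rather than loading the whole file at once. The idea can be sketched with Python's standard library; this is an assumption-laden illustration, not how RAMP, jmzML or JRAP are implemented. The function name `count_elements` is hypothetical, and the element names passed to it (`spectrum` for mzML, `scan` for mzXML) are the caller's choice.

```python
import xml.etree.ElementTree as ET


def count_elements(source, local_name):
    """Count elements with the given local name while streaming the file.

    `source` is a file name or a binary file object. Namespaces are
    ignored, so "{http://psi.hupo.org/ms/mzml}spectrum" matches "spectrum".
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Tags look like "{namespace}name"; compare the local part only.
        if elem.tag.rsplit("}", 1)[-1] == local_name:
            count += 1
            # Drop the finished element's content to limit memory use.
            elem.clear()
    return count
```

For example, `count_elements("run.mzML", "spectrum")` would report the number of spectra in a hypothetical file `run.mzML` without holding every spectrum in memory.
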

Peptide identification level

  1. pepXML[2]
  2. protXML[2]
  3. mzIdentML[1]
  4. PeakML
    Developed by Richard Scheltema in response to the need to store intermediate data (extracted mass traces, matched sets of these, parameters, etc.) so that a modular pipeline can be set up.
  5. NetCDF (obsolete)
    NetCDF was developed as a general-purpose format and is therefore a very poor fit for mass spec data: it leaves out much useful information about your mass spec run. Do not use it.
  6. analysisXML
  7. prideXML?
  8. Mascot .dat
  9. Mascot HTML
  10. Mascot CSV
  11. Mascot pepXML
  12. TandemXML
  13. OMSSA .omx
  14. OMSSA CSV
  15. InsPecT CSV

Reference List

  1. Taken from ProteoWizard Formats
  2. Taken from Proteomecenter Formats Overview