Proteomics data formats
Revision as of 23:33, 8 March 2010 by Dmitry Katsubo
Raw data level
- Thermo Xcalibur or Waters MassLynx .RAW files
The two formats are different but can be told apart on disk: an Xcalibur .RAW is a single file, whereas a MassLynx .RAW is a directory. These are the original data files written by Xcalibur or MassLynx, respectively, during data acquisition. On the Microsoft Windows platform they can be accessed programmatically through an OLE DLL. Normally this works only from a C++ implementation. However, the PeakML library (due to be released as open source; currently available on request from firstname.lastname@example.org) provides a one-to-one mapping for accessing the data from Java. Working from these original formats is advisable, because they contain a large amount of information that is not mapped to open file formats such as mzML.
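Because the two vendors use the same extension, a tool has to check the on-disk layout before choosing a reader. The following is a minimal sketch of that file-versus-directory check. The helper name `raw_vendor` is hypothetical. The function inspects only the filesystem and never reads or validates the contents, so any non-vendor file that happens to be named .raw would be misclassified.

```python
import os
import tempfile


def raw_vendor(path: str) -> str:
    """Guess which vendor wrote a .RAW acquisition (hypothetical helper).

    Heuristic only: Thermo Xcalibur writes a single .RAW file, while
    Waters MassLynx writes a .RAW directory. File contents are not read.
    """
    # normpath drops a trailing separator, e.g. "run01.raw/" -> "run01.raw"
    if not os.path.normpath(path).lower().endswith(".raw"):
        raise ValueError(f"not a .RAW path: {path}")
    if os.path.isdir(path):
        return "Waters MassLynx"
    if os.path.isfile(path):
        return "Thermo Xcalibur"
    raise FileNotFoundError(path)


if __name__ == "__main__":
    # Build one empty directory and one empty file to show both branches.
    with tempfile.TemporaryDirectory() as tmp:
        waters = os.path.join(tmp, "run01.raw")
        os.mkdir(waters)
        thermo = os.path.join(tmp, "run02.RAW")
        open(thermo, "wb").close()
        print(raw_vendor(waters))  # Waters MassLynx
        print(raw_vendor(thermo))  # Thermo Xcalibur
```

A real reader would still need the vendor library (or PeakML) to open the data. The check above only decides which one to try.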
- AB SCIEX (Applied Biosystems/MDS Sciex) .wiff files
Peak list level
- mzML (see also mzsquash, a compression tool for mzML files, and Fast Infoset)
- mzXML
- APML – XML interchange format, used in Corra.
- PeakML
PeakML was developed by Richard Scheltema in response to the need to store intermediate data (extracted mass traces, matched sets of these, parameters, etc.) so that a modular pipeline could be set up.
- NetCDF (obsolete)
NetCDF was designed as a general-purpose format and is therefore a very poor fit for mass spectrometry data: much of the useful information about a run is lost when it is stored this way. Do not use it.
Peptide identification level
- Mascot .dat
- Mascot HTML
- Mascot CSV
- Mascot pepXML
- OMSSA .omx
- OMSSA CSV
- InsPecT CSV
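Of the identification outputs listed above, pepXML (the XML format of the Trans-Proteomic Pipeline, exported here by Mascot) is the easiest to process with generic tools. The sketch below assumes the usual pepXML nesting of `spectrum_query` > `search_result` > `search_hit`, with `hit_rank` and `peptide` attributes on each hit. The function name `top_hits` is hypothetical. The code streams the file, so large result sets do not have to fit in memory, and it ignores namespaces so that files with or without the pepXML namespace are both handled.

```python
import io
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...}spectrum_query'."""
    return tag.rsplit("}", 1)[-1]


def top_hits(source) -> dict[str, str]:
    """Map each spectrum_query's 'spectrum' attribute to its rank-1 peptide.

    `source` is a filename or a binary file object (hypothetical helper).
    """
    hits: dict[str, str] = {}
    # 'end' events fire once an element and all its children are parsed.
    for _, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "spectrum_query":
            continue
        spectrum = elem.get("spectrum")
        for child in elem.iter():
            if _local(child.tag) == "search_hit" and child.get("hit_rank") == "1":
                hits[spectrum] = child.get("peptide")
                break
        elem.clear()  # release the finished query to keep memory flat
    return hits


if __name__ == "__main__":
    # A simplified, hand-written fragment (not real Mascot output).
    sample = b"""<msms_pipeline_analysis
        xmlns="http://regis-web.systemsbiology.net/pepXML">
      <msms_run_summary base_name="run01">
        <spectrum_query spectrum="run01.100.100.2" assumed_charge="2">
          <search_result>
            <search_hit hit_rank="1" peptide="PEPTIDEK" protein="PROT1"/>
            <search_hit hit_rank="2" peptide="PEPTLDEK" protein="PROT2"/>
          </search_result>
        </spectrum_query>
      </msms_run_summary>
    </msms_pipeline_analysis>"""
    print(top_hits(io.BytesIO(sample)))  # {'run01.100.100.2': 'PEPTIDEK'}
```

The sample omits several attributes that the pepXML schema requires, so it is suitable for illustration only.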