Proteomics:Data Format Meeting on 2010-02-19

From BioAssist

Revision as of 22:08, 8 March 2010

Meeting detail

Date: 19 February 2010
Location: University of Groningen, Analytical Biochemistry, Antonius Deusinglaan 1, Groningen.

Participants

  • Richard Scheltema (RUG)
  • Morris Swertz (RUG)
  • Péter Horvatovich (RUG)

Summary

What is the definition of a peak list? For Richard, a peak list should contain the extracted ion chromatograms of the peaks, i.e. the raw data underlying each peak. He currently uses centroided data and would like to add profile data. He keeps an index table to provide quick access to the extracted ion chromatograms of the peaks.
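Richard's peak list layout (peaks plus their extracted ion chromatograms, with an index table for quick access) can be sketched as follows. This is a minimal illustration assuming each XIC is a list of (retention time, intensity) pairs kept in one flat store; the names `Peak`, `PeakList` and `xic` are hypothetical and do not describe Richard's actual file format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# One XIC point: (retention time, intensity). Units are an assumption.
XicPoint = Tuple[float, float]


@dataclass
class Peak:
    """A detected peak; field names are illustrative only."""
    peak_id: int
    mz: float
    rt: float


class PeakList:
    """Peaks plus their XICs, stored back to back in one flat list.

    The index table maps peak_id -> (offset, length) into the flat XIC
    storage, so one peak's chromatogram is fetched with a single slice
    instead of scanning the data of all other peaks.
    """

    def __init__(self) -> None:
        self.peaks: Dict[int, Peak] = {}
        self._xic_data: List[XicPoint] = []
        self._index: Dict[int, Tuple[int, int]] = {}

    def add(self, peak: Peak, xic: List[XicPoint]) -> None:
        if peak.peak_id in self._index:
            raise ValueError(f"duplicate peak_id {peak.peak_id}")
        self.peaks[peak.peak_id] = peak
        # Record where this peak's XIC starts and how many points it has.
        self._index[peak.peak_id] = (len(self._xic_data), len(xic))
        self._xic_data.extend(xic)

    def xic(self, peak_id: int) -> List[XicPoint]:
        offset, length = self._index[peak_id]
        return self._xic_data[offset:offset + length]


if __name__ == "__main__":
    pl = PeakList()
    pl.add(Peak(1, 500.25, 1200.0), [(1195.0, 10.0), (1200.0, 80.0), (1205.0, 12.0)])
    pl.add(Peak(2, 622.31, 1810.5), [(1805.0, 5.0), (1810.5, 40.0)])
    print(pl.xic(2))  # [(1805.0, 5.0), (1810.5, 40.0)]
```

The same offset/length idea carries over to an on-disk file, where the index table would hold byte offsets instead of list positions.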

Richard suggested using Corra[1] as a starting point, as it is open source and contains nearly all the components the platform needs. Corra integrates two open-source quantitative data processing tools, SpecArray and SuperHirn, followed by Bioconductor-based statistical analysis modules implemented in R. Corra provides a user-friendly web interface and runs the integrated modules on a local cluster. George and Ishtiaq should investigate the Corra project and its source code to see whether parts or modules can be reused in DAF, e.g. for workflow execution or for running and monitoring jobs on local clusters.

We agreed to use APML as the format for peak lists and aligned peak matrices, and to convert the output of all other tools to it. We should develop the APML format further so that it covers all the properties we need, preferably in collaboration with the format's authors. We should also convert the output formats of all DAF-integrated tools into APML. For that, we need to delegate the development work to a task force able to write parsers and converters between the output formats of the integrated tools and APML. This task force will also be responsible for investigating how APML can be extended and developed further, e.g. by adding the extracted ion chromatograms of peaks as used by Richard.
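A converter of the kind this task force would write can be sketched in Python with the standard `csv` and `xml.etree.ElementTree` modules. The input columns (`peak_id`, `mz`, `rt`, `intensity`) and the output element names are hypothetical placeholders; they are not the real APML schema, which a production converter would have to follow.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical tab-separated output of an integrated tool: one peak per row.
TOOL_OUTPUT = """peak_id\tmz\trt\tintensity
1\t500.2500\t1200.0\t8.1e5
2\t622.3100\t1810.5\t3.4e5
"""


def tool_tsv_to_peaklist_xml(tsv_text: str) -> str:
    """Convert tab-separated peaks into a simplified, APML-like XML string.

    Element and attribute names are placeholders, not the APML schema.
    """
    root = ET.Element("peak_list")
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        peak = ET.SubElement(root, "peak", id=row["peak_id"])
        # Parse to float so malformed numbers fail here, not downstream.
        for field in ("mz", "rt", "intensity"):
            ET.SubElement(peak, field).text = repr(float(row[field]))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(tool_tsv_to_peaklist_xml(TOOL_OUTPUT))
```

Keeping one small function per source format, all emitting the same target structure, makes it easy to add a converter for each newly integrated tool.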

We have designed the first part of the generic proteomics workflow with concrete tools (see Figure 1). This workflow will only process raw MS/MS data acquired in data-dependent acquisition mode, with the goal of providing a list of peptide/protein quantities in different samples. This will be the first pipeline; it can later be extended to a pipeline where the MS/MS data are stored in files separate from the single-stage MS data used for quantification.
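The quantitative peak matrix this workflow aims to produce (features in rows, samples in columns) can be illustrated with a naive greedy alignment. This is a sketch of a generic technique under assumed tolerances (`mz_tol_ppm`, `rt_tol`), not the alignment algorithm used in TAPP.

```python
from typing import Dict, List, Optional, Tuple

# (m/z, retention time, intensity) for one peak in one sample.
SamplePeak = Tuple[float, float, float]


def align_peaks(
    samples: Dict[str, List[SamplePeak]],
    mz_tol_ppm: float = 10.0,
    rt_tol: float = 30.0,
) -> Tuple[List[str], List[List[Optional[float]]]]:
    """Greedily group peaks from several samples into a feature x sample matrix.

    A peak joins the first existing feature whose reference m/z is within
    mz_tol_ppm and whose reference retention time is within rt_tol, provided
    that feature has no value yet for this sample; otherwise it starts a new
    feature. Missing values are None.
    """
    names = sorted(samples)
    features: List[Tuple[float, float]] = []  # reference (mz, rt) per row
    matrix: List[List[Optional[float]]] = []
    for col, name in enumerate(names):
        for mz, rt, intensity in samples[name]:
            for row, (ref_mz, ref_rt) in enumerate(features):
                if (abs(mz - ref_mz) / ref_mz * 1e6 <= mz_tol_ppm
                        and abs(rt - ref_rt) <= rt_tol
                        and matrix[row][col] is None):
                    matrix[row][col] = intensity
                    break
            else:
                # No compatible feature found: open a new row.
                features.append((mz, rt))
                new_row: List[Optional[float]] = [None] * len(names)
                new_row[col] = intensity
                matrix.append(new_row)
    return names, matrix
```

Real aligners usually correct retention-time drift between runs before grouping; the greedy, first-match rule here is only meant to show the shape of the output matrix.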

For this purpose we will use the TAPP workflow, which consists of 4 modules, for quantification, and the open-source OMSSA as the identification program. We need to write a module that integrates the identification results with the quantification information. Twan's programmer has experience with this and has developed a similar module for integrating MSE data. A task force is required to write a module that parses the output of the TAPP workflow into APML. Péter has to put the tool descriptions, input and output files, and parameters for the 4 modules of the TAPP workflow on the FTP server, without providing the tools themselves (under the agreement with IBM). Twan will make a more detailed scheme of the first and second workflows that we intend to implement.
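The module integrating identifications with quantification could, in its simplest form, match each identification to a quantified MS1 feature with the same charge whose m/z lies within a ppm tolerance and whose retention time lies within a window, picking the candidate closest in retention time. This is a hypothetical sketch: the record fields and default tolerances are assumptions, and it is not the module Twan's programmer built.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Identification:
    """A peptide identification, e.g. parsed from OMSSA output (fields assumed)."""
    peptide: str
    precursor_mz: float
    charge: int
    rt: float


@dataclass
class Feature:
    """A quantified MS1 feature, e.g. parsed from TAPP output (fields assumed)."""
    feature_id: int
    mz: float
    charge: int
    rt: float
    intensity: float


def match_identifications(
    ids: List[Identification],
    features: List[Feature],
    mz_tol_ppm: float = 10.0,
    rt_tol: float = 60.0,
) -> List[Optional[int]]:
    """Return, per identification, the id of the best matching feature or None."""
    result: List[Optional[int]] = []
    for ident in ids:
        best: Optional[int] = None
        best_dt: Optional[float] = None
        for f in features:
            if f.charge != ident.charge:
                continue
            ppm = abs(f.mz - ident.precursor_mz) / f.mz * 1e6
            dt = abs(f.rt - ident.rt)
            # Keep the in-tolerance candidate closest in retention time.
            if ppm <= mz_tol_ppm and dt <= rt_tol and (best_dt is None or dt < best_dt):
                best, best_dt = f.feature_id, dt
        result.append(best)
    return result
```

Returning `None` for unmatched identifications keeps them visible, so they can be reported rather than silently dropped from the annotated peak matrix.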


Figure 1: Schematic representation of the modules of the first workflow, which produces an annotated quantitative peak matrix in APML from MS/MS data acquired in data-dependent mode.

Task list

  • Twan: provide a detailed scheme of the first quantitative and qualitative workflow with dedicated programs
  • Péter: provide the input, output and parameters, using a test file, for the 4 modules of the TAPP pipeline
  • Twan and Joos have experience in matching peptide and protein identifications to MS1 quantitative data. They will develop a module for matching OMSSA results with the TAPP quantitative output

Reference List

  1. Brusniak, M. Y.; Bodenmiller, B.; Campbell, D.; Cooke, K.; Eddes, J.; Garbutt, A.; Lau, H.; Letarte, S.; Mueller, L. N.; Sharma, V.; Vitek, O.; Zhang, N.; Aebersold, R.; Watts, J. D. Corra: Computational framework and tools for LC-MS discovery and targeted mass spectrometry-based proteomics. BMC Bioinformatics 2008, 9, 542.