Proteomics Practical Corra Day (1)
Agenda for the first Practical Corra Day of the proteomics task force
SURF Foundation, Utrecht
Address and route: http://www.surffoundation.nl/en/Pages/Contact.aspx
Date and Time
July 2nd, 2010. 10:00-15:30. Lunch included
Participants
- Rob Hooft (NBIC - Bioassist), confirmed
- Pieter Neerincx (UU), confirmed
- Peter Horvatovich (RUG, proteomics expert), confirmed
- Twan America (WUR), confirmed
- Perry Moerland (AMC), confirmed
- Morris Swertz (RUG, programming and database expert), confirmed
- Thang Pham (VUmc), confirmed
- Berend Hoekman (RUG, proteomics and programming expert), confirmed by self-addition
- Carsten Byrman (AMC), confirmed
- Isthiaq Ahmad (RUG, programming expert), confirmed
- George Byelas
|10:00||Welcome and Purpose|
|10:05||Agenda and Ground Rules|
Practical introduction, form groups (each group with >=1 programmer and >=1 field expert)
Study Corra code in groups, identify areas we would want to enhance
Discussion of findings in the groups
Make a list of questions for Mi-Youn Brusniak
- User login
- Installation procedure and dependencies
- File upload, DAF relation?
- Possibility to integrate with MOLGENIS tools?
- Discussion of the APML format
- How suitable is the APML format for integration with other tools?
- How suitable is the main structure of APML for a generic proteomics pipeline?
- How can a large number of attributes be managed (e.g. by classifying them as obligatory, often used, or application specific)?
- How should data be included in APML (e.g. direct incorporation of data, links to external files, or a mixed structure)?
- How are APML and Corra related to openBIS, developed at ETH Zürich, and to the Trans-Proteomic Pipeline at the Seattle Proteome Center?
(This was also sent to Mi-Youn Brusniak)
We held our Corra and APML evaluation day last Friday. We concluded that the Corra framework and APML both partially meet our needs. For Corra we see hurdles that make it unwise for us to use the framework as it is now. APML, on the other hand, we will start using for our data interchange, in combination with the developing proteomics standard formats.
Our opinion is that the framework is biased toward the data processing pipeline currently implemented (SpecArray and SuperHirn), which can make it difficult to integrate new tools. We also had considerable difficulty installing Corra, because apparently not all dependencies are described in the installation manual. As a result, we did not manage to run the R statistics part. Another inconvenience is that it appears to be impossible to specify the parameters for the full workflow at once, so that user intervention is required after every step. Rather than diving deeply into the Corra framework to solve these issues now, we have decided to start implementing our tools in Galaxy first. We can benefit from the Galaxy experience of some people in our group, and of our sister group, which is implementing genomics pipelines.
In the discussion many good things about APML were raised. The other tools we would like to use may have other attributes, but it appears these could either be converted to an APML equivalent (such as the sigma of a peak in the LC dimension) or be added. APML has the structure of a generic proteomics pipeline, containing quantification and compound annotation information. It lacks the support of the PSI, but is much further along in its definition. For that reason we have decided to use APML as a starting point and to modify it in the future by adding new attributes and/or (not yet decided) converting certain types to HUPO PSI standards such as identXML and quantXML (discussion is currently ongoing at EBI).