Proteomics:Programmers Meeting on 2010-05-21
Location and time
- Location: Hogeschool Domstad, Koningsbergerstraat 9, 3531 AJ Utrecht.
- Time: 9:00-10:30
Attendees
- Rob Hooft (NBIC)
- Carsten Byrman (AMC)
- Ishtiaq Ahmad (RUG)
- Peter Horvatovich (RUG), via Skype.
- Pieter Neerincx (UU)
- Don de Lange (EMC)
- Twan America and Joost de Groot (WUR)
The Corra server is installed on the virtual server in Groningen. This required more resources than originally allocated. Two quantitation tools are part of the installation; it is not clear how easy it will be to add our own.
- Installation is tedious, many dependencies
- Rudimentary user management [reinvention of the wheel]
- [Ed: Added later] Insecure and user-unfriendly file upload
We will need to find out how modular the code is, and whether we can replace some dedicated parts with standard libraries.
Twan will prepare a smaller test data set. The current run is 11 GB.
Carsten looked at the Corra R scripts. There are a few. Two APML parsers are available: a DOM and a SAX parser. The DOM parser is not really practical, due to the size of the data.
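To illustrate why a SAX-style (streaming) parser copes with large files where a DOM parser does not, here is a minimal sketch in Python rather than in the R used by the Corra scripts. The `feature` element and its `id`, `mz` and `rt` attributes are hypothetical placeholders, not the actual APML schema, and `iter_features` is an illustrative helper name.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for an APML-like file; the real schema differs.
SAMPLE = b"""<root>
  <feature id="1" mz="500.25" rt="12.5"/>
  <feature id="2" mz="612.80" rt="13.1"/>
  <feature id="3" mz="733.40" rt="15.9"/>
</root>"""


def iter_features(source, tag="feature"):
    """Yield the attributes of each <tag> element without keeping the full tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield dict(elem.attrib)
            root.clear()  # discard processed children so memory use stays bounded


features = list(iter_features(io.BytesIO(SAMPLE)))
print(len(features), features[0]["mz"])
```

A DOM-style parse would instead load every element into memory before any processing starts, which is what becomes impractical for data sets of several gigabytes.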
Pieter will set up the additional disk on the Corra server.
Pieter will make sure the R setup works inside Corra.
Pieter will get the test data set into Corra.
APML is redundant and verbose, but in theory it will fit our needs. Practice to be tested.
Carsten has already reprogrammed the web scripts to take APML input; this works fine. Output graphs cannot be put in APML, and the web services currently have no way to pass the graphs on in a pipeline. This could be done using bitmap files. We need to figure out a way to embed the data underlying the graphs. Another part of the discussion focused on interactive graphics. There are no interactive graphics yet, but if they were developed, they would require a rather "thick client" capable of sending results (selected parameters) back to the server for storage in the APML data.
Corra also includes some tab-delimited files. Some data may not need to be stored in APML?
Alternatives to APML?
The PSI has approved mzML and mzIdentML. Unfortunately, no vendor supports the latter yet. They are now discussing standardization of mzQuantML. As far as we know, there is no mz*ML format for alignment yet.
Supporting all the mz*ML formats would mean a lot more work for us. Instead, we may start talks with the PSI to push APML as a standard. It may be possible to replace APML sections with mz*ML pieces.
We will make an extended list on the wiki of all the tools we want to use, with their input and output formats. This will help us establish which converters we need. Peter has sent out a spreadsheet to collect the data.
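As a rough sketch of how such a list could be used to work out which converters are missing: the tool names, formats and pipeline order below are made up for illustration and do not come from the spreadsheet.

```python
# Hypothetical tools and formats, for illustration only.
TOOLS = {
    "peak_picker": {"input": "mzML", "output": "APML"},
    "aligner": {"input": "APML", "output": "APML"},
    "stats_scripts": {"input": "tab-delimited", "output": "tab-delimited"},
}

# Hypothetical order in which the tools would be chained.
PIPELINE = ["peak_picker", "aligner", "stats_scripts"]


def converters_needed(tools, pipeline):
    """Return (from_format, to_format) pairs where adjacent tools do not match."""
    needed = []
    for upstream, downstream in zip(pipeline, pipeline[1:]):
        src = tools[upstream]["output"]
        dst = tools[downstream]["input"]
        if src != dst and (src, dst) not in needed:
            needed.append((src, dst))
    return needed


print(converters_needed(TOOLS, PIPELINE))
```

Each pair in the result is one converter to write or find; an empty list would mean the chosen tools already chain together without conversion.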
We would like to sit down with Mi-Youn Brusniak to discuss this.
Rob will try to set up a meeting during Mi-Youn's visit to Europe. [Ed: This failed; the travel itinerary was finalized without a visit to NL.]
- Previous actions completed, new actions above.
- Pieter will ask Bas to plan a meeting after our meetup with Brusniak.
- Absent this meeting.
- Revived ProteoPipeline, a Java library for stream-based proteomics data processing. The streaming works only for actions that are taken on a spectrum-by-spectrum basis. The data library is unique and valuable.
- Wim will talk to Bas about publication of this work
- Wim will get it into gForge
- A future action is to make this connect to the existing identification tools
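The restriction to spectrum-by-spectrum actions can be illustrated with a small sketch. It is written in Python, not Java, and does not use the ProteoPipeline API; all function names and data below are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) pairs


def read_spectra() -> Iterator[Spectrum]:
    """Stand-in for a file reader that yields one spectrum at a time."""
    yield [(100.0, 5.0), (200.0, 50.0)]
    yield [(150.0, 2.0), (300.0, 80.0)]


def drop_low_peaks(spectra: Iterable[Spectrum], threshold: float) -> Iterator[Spectrum]:
    """Per-spectrum step: needs only the current spectrum, so it can stream."""
    for spectrum in spectra:
        yield [(mz, i) for mz, i in spectrum if i >= threshold]


def scale_to_global_max(spectra: Iterable[Spectrum]) -> List[Spectrum]:
    """Cross-spectrum step: needs the maximum over all spectra,
    so every spectrum must be seen (here: held in memory) before any output."""
    all_spectra = list(spectra)
    global_max = max(i for s in all_spectra for _, i in s)
    return [[(mz, i / global_max) for mz, i in s] for s in all_spectra]


filtered = list(drop_low_peaks(read_spectra(), threshold=10.0))
scaled = scale_to_global_max(read_spectra())
print(filtered)
print(scaled)
```

The first function can emit each spectrum as soon as it is read, while the second cannot produce anything until the whole run has been seen, which is the kind of action that falls outside a pure streaming design.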
- Web server enhancement has been almost completed.
- Login based on an OAuth/OpenID combination. This allows logging in with myExperiment credentials. This was received with applause, as the technology may solve a lot of problems with web services that require a login.
A number of the other action points were finished. These remain:
- Ishtiaq will:
- Make sure the application runs reliably on the grid UI
- Write a module to get the grid results back to the user
- Solve the registration service limitations
- Later: different disk quotas depending on the user
- Add DAF documentation to the Trac for Molgenis
Contact with Mi-Youn Brusniak was made.
- George is testing grid/cloud computing tools.
- Peter will test these
APML work is done, see above.
Existing actions still pending:
- Carsten will publish his workflow, but not as is. He will first split it up into smaller modules, and then put each on BioCatalogue/myExperiment.
Don de Lange
Don had a look at APML as well; it looks OK.
General action points
- Everyone will test APML in practice on the server.