Proteomics:Programmers Meeting on 2010-10-15
Location and time
- Location: Hojel City Center, building D (5th floor), Graadt van Roggenweg 340, 3531 AH Utrecht.
- Time: 9:00-10:30
Attendees
- Berend Hoekman (RUG)
- Carsten Byrman (AMC)
- Don de Lange (EMC)
- George Byelas (RUG)
- Henk van den Toorn (UU)
- Joost de Groot (WUR)
- Péter Horvatovich (RUG)
- Pieter Neerincx (UU)
- Rob Hooft (NBIC)
- Wim Spee (UU)
- Freek de Bruijn (NBIC)
Agenda
- What are we building, what will we need to show and where are the holes
- Parking lot
Rob: We are building tools and the glue to connect them. The PIs in our task force are very involved and approachable, which is a big advantage compared to other groups. Another difference is that in other task forces, there is a clear separation between research and tool development. In the research groups that collaborate in the proteomics task force, this distinction is not made. Since BioAssist should focus on tool and glue development only, this is something we need to keep in mind.
I want to talk about the following questions during this meeting:
- What are we building?
- What will we need to show?
- Where are the holes?
What are we building, what will we need to show and where are the holes
Péter: Our goal is to provide a complete, easy-to-use pipeline for wet lab scientists. DAF is being used in Groningen and is well suited for these users. Galaxy is not usable for this purpose, since its user interface is too complex. George is also working on an alternative system that provides a simple web interface.
Carsten: Why do we want to use Galaxy when the usability is too low?
Rob: Galaxy is also intended to be used among developers. Different paths through the pipeline are possible. There is a standard path with the default components. Groups specialized in different parts of the pipeline can replace specific components and create alternative paths.
Péter: It is also possible to convert Galaxy workflows to other, more user-friendly alternatives. We are working on conversions from Galaxy to Molgenis, DAF and the system George is working on. This will start simple, and we can add more complex features later (like identification using a collection of tools).
Pieter: How will this work with proprietary systems like the one from IBM? Another issue is that labeled quantification is still missing.
Péter: Which methods are you using in Utrecht? Maybe the difference from label-free quantification is not so large?
Rob: Utrecht will not be in the first version of the pipeline, but will build compatible tools and join later. Some label-free tools can already be used.
Péter: Another bottleneck is the development of tools.
George: Addition of a tool to a system is no problem. Testing the complete pipeline after this costs a lot of time and also requires usable test data.
Pieter: Isn't Galaxy user-friendly enough already? Some functionality, like adding loops to a workflow, is missing, but Galaxy is certainly usable.
Rob: We can develop a simplified front end for Galaxy.
Pieter: Another area with a lot of room for user interface improvement is the parameters that need to be specified for tools. Probably only 10% of these parameters are really useful, but we do not take the time to fix this. It is also possible to extract more information from the input data, for example the machine name.
Rob: Some Galaxy fixes are needed: limit the “options” exposed in the GUI/CLI and “guess” the rest by determining parameters automatically.
Wim: When can bio-informaticians test with the Galaxy server?
Rob: The NBIC Galaxy server is available and will probably be extendable with BigGrid (SARA servers in grids/clusters). Another option is to use a virtual machine with Galaxy pre-installed, which makes it possible for research groups to get up and running with Galaxy fast and use their own servers. Pieter and David van Enckevort (NBIC) are almost done making this possible.
Péter: DAF could also be used in Galaxy as a grid back-end.
Pieter: Besides computing power, Galaxy also requires a lot of data storage. The philosophy of Galaxy is to store everything forever. We could limit storage to between six months and one year, throw intermediate results away and use deduplication to avoid storing the same data multiple times.
Don: I am planning to work one day a week in Utrecht. We are working out the details, but the desk is already there.
Carsten: It would be nice if there were room during this meeting for things other than talking. The meeting could be more informal and have a flexible agenda.
Rob: We really need this monthly meeting to discuss things as a group. Members are encouraged to arrange meetings with other members of our task force. These meetings could be monthly or more frequent. The colleagues from the metabolomics task force write software together one day a week, but they live closer together. A weekly frequency might be too much for our group, but we can look at what is possible.
Pieter: If the cooperation is very flexible, it will be difficult to schedule. Since the scientific programmers' meetings are scheduled a year in advance, it is easier to keep using them.
Pieter: We will be working on labeled quantitation for the coming months.
Henk: Maybe having one GForge project for all of our Galaxy tools is not ideal. (It will be difficult to separate them, so we will leave it as is for now.)
Berend: When will the virtual machine image with Galaxy be available? (Answer: ask David van Enckevort (NBIC), who is currently working on it.)
Joost: I am currently getting up to speed again.