From BioAssist
Revision as of 22:10, 3 February 2010 by Mswertz (Talk | contribs)

New technological developments in proteomics research have led to an explosive expansion of generated data. Data sets are getting larger and more complex as the number of samples increases, and individual spectra are growing dramatically in size as resolution improves and tandem mass spectrometry options are more fully used. This creates an urgent need for further development of proteomics-related bioinformatics to better manage and interpret the generated data.

The Netherlands Bioinformatics for Proteomics Platform (NBPP) is one of the task forces of BioAssist. The task force will build a platform that enables experts as well as non-experts to run typical proteomics analyses.

Strong collaboration between and within work package (WP) participants will be the central driving force of the NBPP, making it possible to use and integrate the different expertise of the contributing scientists. The developed algorithms will fill the current technological gaps and will be distributed to the proteomics community via the NBPP in a user-friendly way. NBPP will serve as a core analysis platform for users with little bioinformatics expertise. NBPP is open to all other national and international researchers and will provide the basis for related future developments. In that sense NBPP is an integrative project that aims to provide a common platform for proteomics-related bioinformatics developments, which is unique at the international level. By developing a fully integrated, modular bioinformatics platform for proteomics, NBPP aims to become a globally competitive leader in the technological development of proteomics-related bioinformatics.

Project embedding and structure

The Netherlands has numerous excellent proteomics, bioinformatics and systems biology laboratories, and several groups are performing high-quality research in proteomics-related bioinformatics. However, this research field is still considered to be in its early stages compared to other research domains like genomics and transcriptomics. New analytical technologies are still emerging, and the bioinformatics research effort has to keep up with these rapid developments. Technology centres like the Netherlands Proteomics Centre aim at developing novel tools and methods to obtain faster and more reliable proteomics data. The present initiative "Bioinformatics for Proteomics" intends to supplement these efforts by creating a central forum to join and coordinate all national efforts for proteomics-related bioinformatics. To this end, the Netherlands Proteomics Centre (NPC) and the Netherlands Bioinformatics Centre (NBIC) have brought together a broad national consortium of bioinformatics groups (project leader Dr. Ir. Bas van Breukelen) and intend to finance 6 scientific positions to address the major bottleneck of the NPC program, as highlighted in the evaluation of NPC by the QANU-coordinated international science assessment in March 2007. To organize the research program, a committee of experts in the field (see next section) was formed, which outlined the aim of establishing a core group of bioinformatics research laboratories in the Netherlands with sufficient critical mass and expertise to create a Netherlands Bioinformatics for Proteomics Platform (NBPP). This platform will be closely integrated with the proteomics-related bioinformatics projects of NBIC BioRange and BioAssist, and with NPC projects that are currently in progress. The main aim is the rapid and standardized deployment of powerful cutting-edge tools and functionality to the wider proteomics community.
For maximal computational capacity, the platform will use the Dutch Life Science GRID (DLSG) as its basic hardware infrastructure. The DLSG consists of one large GRID at SARA and several small GRID points located at several national universities, interconnected via the fast SURFnet research network. The DLSG points that are not yet established (e.g., at Groningen University) will be designed in collaboration with NBIC to satisfy the special needs of proteomics. DLSG will provide the required computational power and large amounts of storage capacity within the BIG GRID program. The NBIC BioAssist program will provide a generic e-science infrastructure (Virtual laboratory for e-science) within its proof-of-concept environment, using the most recent developments in grid-computing data management and storage systems (e.g. SDSC Storage Resource Broker), Taverna workflow management software and visualization techniques, which also enable the sharing of tools and data.

Committee of Experts

  1. Bas van Breukelen (UU)
  2. Péter Horvatovich (RUG)
  3. Twan America (WUR)
  4. Roeland van Ham (WUR)
  5. Perry Moerland (UvA)
  6. Jeroen Krijgsveld (UU)

Research structure

The committee invited several bioinformatics groups and experts to write initial research proposals based on their activities and expertise in bioinformatics for proteomics. The research program was defined based on the submitted proposals in order to increase synergy between the bioinformatics developments, enhance collaboration between research groups, enable higher specialization of individual researchers in defined subjects, fill gaps in expertise and development needs, and provide and develop generic tools that enhance data management. Based on these goals the committee defined 3 work packages (WPs), with one lead scientist and several contributing scientists for each work package. All WPs include the work of the programmer positions assigned to the participating laboratories within the BioAssist program and 2 new additional positions per work package (3-year postdoctoral or scientific programmer positions) financed by NGI via NPC and NBIC within the present initiative.

Work Packages

Work Package 1 (WP1)

WP1 involves all developments related to information management and infrastructure. The lead scientist is Morris Swertz (RUG) and the contributing scientists are George Byelas (RUG), Andrew Stubbs (EMC), and Henk van den Toorn (UU). This is an overarching, unifying WP that will use state-of-the-art web- and grid-based mechanisms to make the software, tools and functionalities developed in the other WPs accessible to the broadest audience of researchers in a robust and standardized way. Specifically, this WP has the following aims:

  • provide generic tools for data management, defining and using existing proteomics data standards (e.g. mzData.xml, mzXML).
  • provide tools for organizing data processing and evaluation using the Taverna workflow management system.
  • provide generic tools for easier web services implementation.
  • develop a structured approach towards the design and management of bioinformatics experiments and data structures.

This WP is based on the expertise of Morris Swertz in data and software structure management, using MOLGENIS for genomic data analysis, and on his group's collaboration with the Taverna development team, which is aimed at incorporating data and software management into the Taverna workflow environment. The WP will benefit from the expertise of Henk van den Toorn in proteomics data standards and PRIDE proteomics database development, and from the expertise of Andrew Stubbs in implementing biomarker informatics and decision support discovery platforms. This WP will implement use cases in the MOLGENIS/Taverna environment, based on existing proteomics pipelines (e.g., the LC-MS data processing pipelines developed in the groups of Morris Swertz (and Rainer Breitling) and Peter Horvatovich, and the integrated Proteomics Analysis Service intended to be developed by Andrew Stubbs).

Work Package 2 (WP2)

WP2 includes research concerning mass spectrometry based data processing. The lead researcher is Twan America (WUR) and the contributing researchers are Peter Horvatovich (RUG), Bas van Breukelen (UU) and Mandalina Drugan (UU). This work package will deal with algorithms and methods for processing raw mass spectrometric data, including peak detection, raw data filtering, alignment algorithms for multiple data types, protein and peptide quantification algorithms for label-free and stable isotope labeled LC-MS data, normalization and standardization techniques, and various evaluation tools to assess the quality of data processing modules. Enhancing protein identification and matching label-free LC-MS with MS/MS information will also be part of WP2. This WP is based on the label-free data processing workflows and modules already under development in the groups of Rainer Breitling, Peter Horvatovich (in collaboration with Frank Suits, IBM, USA) and Twan America; on LC-MS data processing frameworks for modularizing and interconnecting several freely available open source workflows (e.g. OpenMS, SuperHirn), currently under development by Rainer Breitling and Peter Horvatovich; and on the de novo MS/MS protein identification expertise of Mandalina Drugan and Bas van Breukelen. Important novel developments of this WP will be new data processing modules that supply missing capabilities (e.g., retention time normalization algorithms using machine learning approaches), evaluation and assessment of data processing modules (e.g., accuracy of time alignment and normalization) and new de novo MS/MS protein identification algorithms using machine learning techniques. This WP will use the data and software management infrastructure developed in WP1 and will provide high-quality annotated and integrated data for the knowledge discovery taking place in WP3.

Work Package 3 (WP3)

WP3 contains all research related to biological knowledge extraction, functional annotation and classification. The lead scientist is Huub Hoefsloot (UvA) and the contributing scientists are Andrew Stubbs (EMC), Theo Luider (EMC) and Antoine van Kampen (UvA). WP3 will deal with bioinformatics tools for evaluating processed mass spectrometric data for biological knowledge discovery. It will contain classification tools (e.g. for biomarker research), tools to visualize, interpret and analyze protein pathways, integration of proteomics data into protein-protein interaction networks and other types of biological networks, inference of new protein-protein interaction (sub)networks, and new approaches for proteomics data annotation. The work of this WP is based on the expertise of the group of Huub Hoefsloot in statistical interpretation and evaluation of 'omics' data, the chemometrics and proteomics-related bioinformatics expertise of the group of Antoine van Kampen, and the expertise of Andrew Stubbs in implementing biomarker informatics and decision support discovery platforms, and it will benefit from the proteomics expertise of the group of Theo Luider. This WP will provide a Statistical Prediction Module containing existing statistical methods and new proteomics-related statistical developments (e.g. time series analysis), and an integrated Proteomics Analysis Service task containing GUIs for a pathway knowledge management module, a functional analysis module for biological knowledge discovery and a pathway and network visualization part. WP3 will use tailor-made data and software management tools in the MOLGENIS/Taverna framework developed in WP1 and will extensively use processed and annotated data provided by WP2. The majority of the specific tools developed will be open source; however, the intellectual property and commercialization rights will belong to the institutions that developed the tools, whereas the intellectual property of the NBPP framework itself will belong to NGI.

Existing Platform tools

Tools integrated into the platform are listed in proteomics platform tools.