BioAssist structure

Mission statement

The BioAssist programme focuses on the use of e-bioscience approaches to provide flexible and sustainable bioinformatics support: “generic solutions to generic problems raised by Life Scientists”. As such, it is a joint effort between the life science, bioinformatics, and informatics communities. It is also an important dissemination mechanism for methods and databases developed in the BioRange research programme.

The BioAssist e-bioscience support model is a collaborative and multi-disciplinary effort to implement e-bioscience support platforms for the life sciences. Selected tools and databases are integrated using a grid-based research infrastructure. BioAssist scientific programmers will implement the platforms to help the (NGI-supported) labs make progress, both by working faster and by pushing the boundaries of science itself.

The basic philosophy of the BioAssist support platforms is that tools and databases are not provided as a (separate) “central service” but remain on the local computer systems of contributors. Contributors remain responsible for updates and maintenance of these services, which ensures that the platforms can adapt adequately and quickly to changing needs in life science domains and that end-users have transparent access to bioinformatics tools and corresponding services (e.g., storage).

BioAssist platforms are:

1. Integrated analysis of functional genomics data
2. Proteomics data management and analyses
3. Metabolomics data management and analyses
4. Biobanking
5. High-throughput sequencing
6. Systems bioinformatics

Brief description of e-science

E-bioscience is the application of e-science approaches in the biosciences and refers to scientific projects that are carried out in multi-disciplinary, distributed collaborations using generic technologies from the informatics and ICT fields. E-bioscience objectives include the seamless incorporation and integration in the biosciences (e.g., for bioinformatics or systems biology support) of data-producing research facilities (e.g., mass spectrometers), biological and medical data collections, compute power, and large data storage, as well as facilities for scientific visualization and data analysis. Sharing and re-using facilities is the essential component of e-bioscience. Web and grid technologies enable e-bioscience by providing approaches to integrate distributed services and facilities transparently.

“Code of conduct” BioAssist support platform

To be able to make an impact in the field, NBIC insists on the following code of conduct for each of the contributors to the BioAssist support platforms:

  • Make bioinformatics tools, databases, and expertise available to participants of a platform and eventually to the whole (inter)national community involved in the supported domain.
  • Use and define standards to safeguard compatibility with international developments in the genomics and (bio)informatics domains (e.g., ontologies, web-services and workflow management).
  • Enlarge consensus and coherence in the field with respect to the use and selection of computational/statistical models and data resources.
  • Provide access to research facilities and resources (incl. computing power, storage facilities, computational models, and databases).
  • Provide visualization tools when needed.

Roadmap

Each BioAssist platform has an annual plan, which is approved by the BioAssist programme committee. These annual plans contain the platform deliverables, but should not be considered restrictive for further developments. However, key developments, such as technology choices and 'long-term'/'heavy investment' developments, should be reported to the BioAssist programme committee (see communication for the appropriate procedure).

Annual plans for:

1. Integrated analysis of functional genomics data
2. Proteomics data management and analyses
3. Metabolomics data management and analyses
4. Biobanking
5. High-throughput sequencing
6. Systems bioinformatics

Technical framework

Workflow management system

Within BioAssist, local tools and databases are provided as webservices that are integrated with Taverna. Taverna is a central workflow management system for integrating webservices. The Taverna project aims to provide a language and software tools that facilitate easy use of workflow and distributed compute technology within the e-science community. Its workbench is a rich graphical user interface for building and executing workflows composed of many kinds of components.

More info on Taverna (Taverna tutorial).

Webservices

Currently, Taverna supports BioMoby or SOAP-based WSDLs (e.g. Soaplab services) as web-services. Since the BioAssist support platforms are a collaborative and multi-disciplinary effort, it is essential that users and other programmers are able to find and use the web-services for the available tools and databases within the support platform. Hence, every web-service requires detailed annotation and documentation. The web-services can be published on the NBIC website (contact Marc van Driel) and should work within Taverna (version 2.x). BioAssist workflows are published via myExperiment using the bioassist_nl group.
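
As an illustration of what a call to a SOAP-based web-service looks like on the wire, the following minimal Python sketch builds a SOAP 1.1 request envelope using only the standard library. The operation name runDemoAnalysis, the namespace http://example.org/bioassist/demo, and the parameter names are hypothetical placeholders and do not refer to an actual BioAssist service; for a real service, these values are defined by its WSDL.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope (fixed by the SOAP 1.1 specification).
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(operation, service_ns, params):
    """Return a SOAP 1.1 request envelope as a string.

    operation  -- name of the service operation (hypothetical in this sketch)
    service_ns -- XML namespace of the service (hypothetical in this sketch)
    params     -- dict mapping parameter names to values
    """
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes an unqualified child element of the call.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical operation, namespace, and parameters, for illustration only.
    print(build_soap_request(
        "runDemoAnalysis",
        "http://example.org/bioassist/demo",
        {"sequence": "ACGT", "database": "demo_db"},
    ))
```

The sketch only constructs the message; sending it and interpreting the response depend on the endpoint and bindings declared in the WSDL of the service in question.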

Life science grid

NBIC is setting up a Life Science Grid for life scientists and bioinformaticians, since grid computing can dramatically speed up computation on massive amounts of biological data. To this end, small computer clusters are placed at different institutions throughout the Netherlands. These clusters are connected to each other by a high-speed network provided by SURFnet and can be used in parallel by means of grid middleware. All BioAssist participants can freely access this infrastructure.
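
The speed-up from using clusters in parallel comes from splitting a large data set into independent parts, processing the parts at the same time, and merging the partial results. The Python sketch below shows this pattern on a single machine, with a thread pool standing in for grid worker nodes. It is an analogy under stated assumptions and not an example of grid middleware: the function names and the GC-counting task are invented for illustration, and job submission on the Life Science Grid itself works differently.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal parts."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional item.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append(items[start:end])
        start = end
    return chunks


def count_gc(sequences):
    """Work done per chunk: count G and C bases in a list of sequences."""
    return sum(s.upper().count("G") + s.upper().count("C") for s in sequences)


def total_gc(sequences, n_workers=4):
    """Split the input, process the chunks concurrently, merge the results."""
    chunks = split_into_chunks(sequences, n_workers)
    # Local stand-in for worker nodes; a real grid job would be submitted
    # through the grid middleware instead of a thread pool.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(count_gc, chunks))


if __name__ == "__main__":
    print(total_gc(["ACGT", "GGCC", "AATT"], n_workers=2))
```

The pattern pays off only when the chunks can be processed independently, which is why tasks such as per-sequence or per-sample analyses are natural candidates.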

SARA places the clusters, installs the software and middleware, and maintains the systems. In addition, the e-science support group of SARA provides (potential) users with background information, advice and tutorials about using the Life Science Grid. The same group also provides custom programming support to help scientists make their software ready for the grid. The use of workflow systems, web services, and knowledge integration and management is also among the expertise of the group.

See here for an introduction on how to use the Life Science Grid. For more information, contact Machiel Jansen.

Repository

In BioAssist, GForge is used as a software repository system. In this repository you can develop your programs and collaborate with other BioAssist programmers. The default versioning system for new projects is Subversion, but it is possible to use CVS for existing projects. GForge is the BioAssist system for:

  • File release management
  • Document management
  • News announcements
  • Surveys for users and admins
  • Issue tracking with "unlimited" numbers of categories, text fields, etc.
  • Task management

Open Source

Code produced in the BioAssist programme

Documentation

Software development in BioAssist is a collaborative effort. Documentation at both the code level and the user level is essential to make this collaboration work. The GForge repository is the central place to add this documentation.

Licenses/IP

NBIC advises using the Apache License (http://www.opensource.org/licenses/apache2.0.php). Why? The Apache License allows collaboration across (national) borders. Its lack of usage restrictions stimulates external contributions and exposure, and helps to generate momentum for the various projects. Furthermore, the Apache License allows commercial spin-offs.

NBIC does not hold IP on the various projects; IP belongs to the individual parties. We strive for, and advise, openness to collaboration and sharing of resources.

Use of commercial software

In general, NBIC is reluctant to use commercial software. NBIC only supports software that has an open architecture (e.g., access through APIs or web-services that bypass the interactive user interface, which allows interfacing with other software). The possibility to extend the software, e.g. with plugins, is also considered an advantage. The software company is expected to provide the necessary information and interfaces to use their software in this "open" manner. New use of commercial software should be evaluated by the BioAssist programme committee (see communication/procedures).

Oracle

NBIC acquired a campus license for Oracle, including training and support, which is available to participants within the NBIC network. Oracle can be used in BioAssist for existing Oracle-dependent software. Access to Oracle technology can be acquired by requesting an NBIC Oracle Project (contact Marc van Driel). Access will be granted to the Oracle Metalink site for updates, patches, and a large knowledge base.[1]

New BioAssist software should, whenever possible, use open-source databases such as PostgreSQL and MySQL.

Matlab

The Matlab compiler is available on the Life Science Grid and can be used there. Matlab core is installed at SARA with a limited number of licenses. These can be used for development purposes. However, we advise using open-source alternatives, e.g. R, to implement the algorithms developed in Matlab.

Rosetta Resolver

There is a license for Rosetta Resolver. However, this license will be discontinued in the near future, and no new software should be developed that depends on Rosetta Resolver.

Spotfire DecisionSite

The Spotfire DecisionSite software is used by ~50 users (April 2008). Spotfire supports the use of external sources and programs, but new software should not be developed solely for use within Spotfire.

Communication

NBIC

The NBIC BioAssist programme committee meets once every two months. The BioAssist core group meets every month.

Scientific programmers community

The group of scientific programmers meets every month. During these meetings, selected programmers give a short overview of their current work. The rest of the day is used for training purposes and collaborative coding.

Support platforms meetings

The platform-specific meetings are organised by the platform leaders. We encourage collaborative efforts between the various platforms as well as the core technology groups (e.g. the e-science group at SARA). If you plan a platform meeting and need input from other parties, please invite them or ask NBIC for advice.

Wiki/mailing list

  • BioAssist wiki (wiki.nbic.nl) - NBIC installed a wiki to share knowledge among the people within BioAssist, and also with collaborating groups. We encourage you to share practical (working) procedures, expertise on various software packages, etc.
  • GForge software repository (gforge.nbic.nl) - see Repository
  • Mailing list (bioassist-users@nbic.nl) - For all questions and announcements.
  • myExperiment (BioAssist_NL) - Sharing services and workflows within BioAssist
  • NBIC website (www.nbic.nl) - List of applications/webservices and general communication.

Education

Basic courses for the scientific programmers are offered during the monthly meetings. If there is a need to organise other courses, or courses together with collaborative groups, please don't hesitate to contact us. A number of courses outside the scientific programmers group are already planned:

  1. Grid tutorial for bioinformaticians
  2. E-science for managers
  3. Webservices for BioAssist programmers
  4. Java for BioAssist programmers

Procedures

Selection of BioAssist programme targets

Each BioAssist platform formulates an annual plan, which also contains:

  1. A description of the technology that will be used.
  2. A list of the target programs, algorithms, etc. on which the scientific programmers will focus.

The BioAssist programme committee will evaluate the plans and will give approval or ask for more detailed plans/modifications. Collaboration is one of the major goals of the BioAssist programme and the committee will aim to maximize synergy between the platforms and third parties.

NBIC support

NBIC offers support on different topics and at various levels of detail. You can use the following channels to find a solution to your problem:

  • Wiki (wiki.nbic.nl) - knowledge platform to find and exchange information on BioAssist topics.
  • Mailing list (bioassist-users@nbic.nl) - Ask your questions on technical and user-level topics, and discuss issues.
  • E-science - Specific advice on your problem using an e-science approach (see below)
  • Taverna - First-line support is provided by NBIC via the wiki and the bioassist-users mailing list. For additional support, see Additional Taverna support below.
  • Personal support - For anything not covered above, or for more detailed support/advice (contact Marc van Driel or Machiel Jansen)

Support for Life Science Grid

SARA/NBIC e-science support provides support for using the grid infrastructure. Support covers access to the Life Science Grid, grid use and programming, building web services that utilize grid resources, workflows, mass storage, grid metadata systems, etc.

  • Users can use the bioassist-users@nbic.nl mailing list
  • SARA can be asked for individual support

Additional Taverna support

All in-depth questions on Taverna can be asked via: