Managing life science information

PhD course Managing life science information

Target audience: Bioinformatics PhD students
Lecturers: Ammar Benabdelkader, Peter Boncz, Andrew Gibson, Frank van Harmelen, Iwan Herman, M. Scott Marshall, Barend Mons, Marco Roos, Morris Swertz
Guest lectures: Carole Goble, Katy Wolstencroft
Coordinators: M. Scott Marshall, Marco Roos
Date: 25-29 May 2009
Location: Informatics Institute, F0.09, Science Park Amsterdam, the Netherlands
For participants without their own laptop with wifi we have limited hands-on facilities.
Lecture points: 3 EC (3 weeks)


Considering the complexity of biological systems, it is not surprising that the management of life science information is one of the most challenging aspects of bioinformatics. For example, (medical) biologists have compiled over 17 million papers, and well over a thousand databases are known. However, a large number of information resources end up in an already formidable 'data graveyard'. Following this course can help you prevent your data or your information management system from suffering the same fate.

Target audience

If you would like to learn how to perform powerful and flexible data and information management for your bioinformatics application, how to work with data in distributed databases or through Web Services and workflows, or how a Web2.0 approach can help you reach out to users and leverage their contributions, then this course is for you. We assume a basic understanding of (relational) databases and programming.

Course description

This course introduces modern techniques for the management of life science data and knowledge for bioinformatics applications. Students will gain insight into Semantic Web languages and tools, federated databases, (Taverna) workflows, and Web2.0. After following this course students should be able to start creating their first applications based on these technologies or make more informed design decisions for their current application.



  • Day 1 and 2 - Knowledge-based information management
    • Where you will
      • learn about how the Semantic Web languages and tools can be used to manage biological data 'intelligently'
      • acquire some hands-on experience with these languages and tools
      • know what OWL and RDF mean and why they exist
      • learn about community-based science
  • Day 3 - Database workhorses
    • Where you will
      • learn about how to use relational databases for managing heterogeneous and distributed data
      • learn how laboratory information can be realistically managed, using MOLGENIS as an example
      • get hands-on experience with PostgreSQL and MOLGENIS
  • Day 4 - Taverna and web services for collaborative data integration
    • Where you will
      • get a full tutorial on applying Taverna to implement data integration pipelines
      • get hands-on experience with Taverna
  • Day 5 - Hands-on Semantic Data integration
    • Where you will deploy what you have learned on your own application or on an example case, with experts present
  • Day 5 + two weeks - Practical assignment
    • Where you will work on a case to put your new information management skills into practice (more details follow)
    • We will celebrate your skills during a mini-symposium and drinks at the very end of the course
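
As a rough illustration of the idea behind RDF mentioned for days 1 and 2, the sketch below stores statements as subject-predicate-object triples and matches them against a pattern with wildcards, loosely in the way a SPARQL query binds variables. This is a minimal sketch in plain Python and not part of the course material: the `ex:` identifiers, the `TRIPLES` list and the `match` helper are hypothetical names made up for illustration, and real RDF tools use full URIs and dedicated libraries.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical example statements; real RDF identifies resources with full URIs.
TRIPLES = [
    ("ex:BRCA1", "rdf:type", "ex:Gene"),
    ("ex:BRCA1", "ex:associatedWith", "ex:BreastCancer"),
    ("ex:TP53", "rdf:type", "ex:Gene"),
    ("ex:TP53", "ex:associatedWith", "ex:LiFraumeniSyndrome"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None is a wildcard, like a SPARQL variable."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# "Which subjects are typed as genes?"
genes = sorted(subject for subject, _, _ in match(TRIPLES, p="rdf:type", o="ex:Gene"))
print(genes)
```

Because every statement has the same three-part shape, data from different sources can be merged by simply concatenating triple lists, which is one reason the format is attractive for data integration.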


Digital support

For collecting and sharing results of the hands-on sessions, students are requested to sign up to and join the BioWiseInformationManagement2009 group. Students and lecturers are members of the Google group BioWiseInformationManagement2009 for sharing documents, in particular for the two-week assignment following the week of lectures. (Follow the links to sign up and view the groups.)

Recommended Software

We would appreciate it if you could bring your own laptop. The software that we will use in the course includes:

Protégé 4
Taverna 1.7.1 (NB: look out for Taverna 2.1 beta on the same site; it is close to being released)
SWObjects (enables federation of SPARQL queries, also to SQL)

Please consider installing these software tools. If applicable, also bring an example of a database you are working on so you can use it in the hands-on sessions (e.g. wrap your MySQL database using MOLGENIS). Don't worry if you have any trouble installing; we will help, especially with software that is critical for the course.

Example Data, Applications, and Lab Practicals

Practice with OWL (provided by Jochem Liem):

Protege Guide (.doc)
Ontology assignment (.doc)

e-DBI exercise (provided by Ammar Benabdelkader):

e-DBI Lab

Semantic Web Data Integration (UCSC ENCODE Application):

Original web page description of SWEDI
Web page accompanying journal article about SWEDI
Relevant Data for SWEDI

Practice with MOLGENIS (provided by Morris Swertz):

MOLGENIS practical guide (.pdf)
Code for Address Book exercises (optional, it is also in the pdf)
Code for Biomaterial exercises (optional, it is also in the pdf)

Practical assignment

On Friday 12 June we will discuss potential solutions for the information management needs of the five cases below. All participants and lecturers are invited to join the discussion. Each case is addressed by a team, which writes a 1-3 page document. The document should satisfy a non-expert manager (e.g. Jeroen Schoemaker's manager at Friesland Foods) and a technical manager (e.g. Scott Marshall). Managers have minimal time, so the document should be minimal (bullet lists where possible). The team leader presents the team's case at the Friday meeting. In addition to the document, experience reports on try-outs of certain technologies for selected sub-tasks are highly appreciated.

Minimal topics for the document

  • Short problem statement
  • Incentive (motivation to address this problem)
  • Approach (theory behind)
    • Methods and technologies used (and how they address the problem)
  • Expected duration/cost (in time)
  • Prospective contributions of team members and contributions from external activities
  • Deliverables and milestones
  • Decisions managers will have to make
  • Coherence with the other case studies
  • Team member contributions to the document (include 'hands-on' try-outs of prototypes)

Note: proper execution of this assignment requires communication and sharing of documents within and between the teams. Please use any number of the sharing environments at your disposal (wiki, myExperiment, Google). We may ask about your choice and your experience. We will take into account that you have little time for the assignment.

Case studies

Genomics of Asthma and Allergy
Team: Jules Kerssemakers (team leader), Pascal Pfiffner, Ruud Schoemaker
Managing L1CAM mutation information for UMCG
Team: Bas Vroling (team leader), Vikram Mitra; Team Morris (support): Joeri van de Velde, Joris Lops
Protein-protein interaction motifs
Team: Erwin Datema (team leader), Julia Dimitrieva
Managing life science information of microscopy images
Team: Ernest van Ophuizen (team leader)
Gene-disease discovery
Team: Gerard Schaafsma (team leader), Herman van Haagen, Ronald van Eijk

Not committed to a study:

Phrasad Galuja
Wim Spee
Jan Bot

If you are a PhD student, please join one of the teams above. If you are not, please consider supporting one of the teams. Joining/supporting the smallest team will be highly appreciated.

Recommended Reading

The Semantic Web for the Working Ontologist (book)
The Semantic Web Primer (book)
A Journey to Semantic Web Query Federation in Life Sciences (accepted, BMC Bioinformatics)
Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges, by Lincoln Stein (Nature Reviews)
Automation of in-silico data analysis processes through workflow management systems, by Paolo Romano (Briefings in Bioinformatics)
Beyond standardization: dynamic software infrastructures for systems biology, by Morris Swertz and Ritsert Jansen (Nature Reviews)
Calling on a million minds for community annotation in WikiProteins, by Barend Mons et al. (BMC Bioinformatics)
Pharmas Nudge Semantic Web Technology Toward Practical Drug Discovery Applications (GenomeWeb article)
Concept Web Alliance Hits Ground Running in Bid to Harness Semantic Web for Life Sciences (GenomeWeb article)
SPARQL by example, by Lee Feigenbaum (Cambridge Semantics)

Related Links

The W3C Semantic Web Health Care and Life Sciences Interest Group

Adaptive Information Disclosure (subproject of Virtual Lab for e-Science)

Adaptive Information Disclosure
AIDA Search demo films
AIDA Search demo

Browsing / Querying Bio Knowledge Bases

Linked Open Data faceted browser (with SPARQL generator)
Linked Life Data

Misc Useful Links

BioPortal (NCBO)
Shared Names

Commercial Applications



Please fill out this online survey to give feedback to the coordinators and lecturers:

Please also fill out the official NBIC survey sent to you by post.

We thank you very much for your help.

More information and registration

Please visit or contact M. Scott Marshall or Marco Roos