GSCFMinutes20091015

From BioAssist

This webpage is no longer maintained and is only here for archival purposes. Please refer to https://trac.nbic.nl/gscf and http://dbnp.org for up-to-date information on this project.


GSCF project start meeting

Meeting details

Date: Thursday October 15, 2009

Present: Tjeerd Abma, Adem Bilican, Kees van Bochove, Jildau Bouwman, Prasad Gajula, Robert Kerkhoven, Jeroen Wesbeek

Meeting goal

To start the collaboration on GSCF and decide on the technical and management formats to do so

Meeting summary

We discussed the collaboration at three levels: the content level (requirements / database design of both projects), the organizational level (work planning, management, iterations) and the technical level (programming language, integration with current code).

GSCF contents

We will build a web interface where a user can interactively specify a study design. We will also start working on a query function (within GSCF) that is able to query this metadata. Clean data should be stored in specialized modules (e.g. metabolomics, transcriptomics, clinical chemistry; see technical implementation). Whether the query function for clean data should also be put into GSCF is to be determined.

For the data model, we will start with the current NMC schema; see GSCF Data Model. The dbNP programmers had a few questions about this schema:

  • What about integrating Controlled Vocabularies, for example to specify Species? (we will make this a feature request)
  • Isn't there a level missing between Source and Sample? You can take multiple samples from one plant or mouse, and you should store when you took which sample from which plant/mouse/human. Answer: this can be done in the current model with Sample and Source; maybe add a 'time taken' field to Sample?
  • To continue this, where do we store the knowledge that we have 20 'identical' mice? Answer: make 20 records in the Source table, all having the same Species and Strain. Maybe there should be a level in between called 'Origin'.

We will start the project by implementing the current NMC schema, then handle additional feature requests from there.
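As an illustration, the Source/Sample questions above could translate into Grails domain classes roughly like the following. This is a hypothetical sketch, not the actual NMC schema: the class names, the 'timeTaken' field, and the relations are illustrative only.

```groovy
// Hypothetical sketch of the Source/Sample relation discussed above;
// class and property names are illustrative, not the actual NMC schema.
class Source {
    String species   // could later be backed by a controlled vocabulary
    String strain    // 20 'identical' mice = 20 Source records, same species/strain
    static hasMany = [samples: Sample]
}

class Sample {
    Date timeTaken   // the proposed 'time taken' field
    static belongsTo = [source: Source]
}
```

Grails would persist these classes via Hibernate, so the table layout follows directly from the class definitions.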

GSCF organization

This wiki and Subversion on NBIC Gforge are our main collaboration tools. We will work via a time-bound iteration schedule: 3 weeks of programming followed by 1 week of testing. Features and tasks will be administered via Gforge as well (under Tracker and Tasks). Kees will be responsible for the iteration planning (Release Manager).

GSCF technical implementation

We will program in Grails. The database model will be written out in Grails domain classes, which Grails is able to persist (via Hibernate) in a database. We will have to deal with the fact that whenever the domain classes change after data has already been persisted, the existing database contents must be migrated. So we have to write (and test!) SQL update scripts that bring existing data up to the new version with each release.
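Such an update script could look like the sketch below. The table and column names are hypothetical (the real names will follow from the domain classes); the point is that each release ships a script that alters the schema and migrates existing rows.

```sql
-- Hypothetical update script for a release that adds a 'time_taken'
-- column to the sample table; names are illustrative only.
ALTER TABLE sample ADD COLUMN time_taken TIMESTAMP NULL;

-- If an existing column can supply a sensible value, backfill it, e.g.:
-- UPDATE sample SET time_taken = date_created WHERE time_taken IS NULL;
```

Testing these scripts against a copy of the production data before each release is what prevents the migration problems mentioned above.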

Furthermore, with regard to the specialized clean data modules mentioned under contents, we will implement them as plugin modules for the General Study Capture Framework. We made this decision on the basis of the following options and their respective (dis)advantages:

  • Study capture tool as code sharing
    • + very flexible approach: each group can adapt the code to whatever it is requested to do
    • - it is very easy to branch / deviate from each other
  • Study capture tool as a general framework in which additional domain-specific assay data is stored with help of domain-specific plugins
    • + whenever one group adds a new platform (plugin), by design it can also immediately be used by the other group
    • + we are forced to think very clearly about the relations between metadata and clean data
    • - it is a (big) extra effort to make requirements for a plugin interface

So we go for the second option. This implies that we have to think of a technical approach to implement this framework-plugin relation. Three options come to mind:

  • Grails code plugin which is directly integrated into the web interface code (plugin specification like a Java interface class?)
    • + tight integration, which means that it will be easy to add features / slay bugs
    • + high speed
    • - tight integration, which makes it less sharable / universal
  • Integration via web services (plugin specification as WSDL)
    • + would make our tool easily extensible by the rest of the world
    • - when e.g. searching clean data, this solution will probably be too slow
  • Simple metadata sharing by generating ISA-Tab output from one system and importing it into the other
    • + you make use of an already-set standard
    • - no good possibility to access multiple types of clean data, which is the point of the whole project

Considering the huge number of bio web services that are out there, which we can probably link to or even crawl, we can do this with either solution. We can always write our own 'proxy' plugins that connect to other services.

The first option is probably the best when it comes to flexibility and speed. Because NMC already has code for what will become the clean metabolomics module, this is a good first test case for integrating the study capture framework with plugins.
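Following the "Java interface class" idea noted above, a first cut of such a plugin specification might look like the sketch below. The interface name, methods, and signatures are hypothetical; the actual specification is still to be drafted (see the to-do list).

```groovy
// Hypothetical first cut of a clean data plugin interface; the name,
// methods, and signatures are illustrative only, pending the real spec.
interface CleanDataModule {
    // Identifies the platform this plugin handles, e.g. 'metabolomics'.
    String getModuleName()

    // Stores clean data for a sample known to the study capture framework.
    void storeCleanData(String sampleId, Object data)

    // Queries clean data; the framework's query function would call this
    // on every registered plugin and merge the results.
    List queryCleanData(Map searchCriteria)
}
```

Each domain-specific module (metabolomics, transcriptomics, clinical chemistry) would then implement this interface, which forces the clear separation between metadata and clean data mentioned as an advantage of the framework approach.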

To-do list

From 3 to 5 November, there is a Spring course on Grails in Amsterdam. The NMC team is planning to go; dbNP is interested as well.

The following tasks came up and will be put into the feature tracker / task list:

  • Create first version of data model as domain classes in Grails (Prasad, Tjeerd, Michael)
  • Write first draft plugin interface specification (Jeroen, Tjeerd, Kees)
  • Write first version of query interface and think about query requirements for plugin specification (Adem, Kees)