DbNPMinutes20091013

From BioAssist

This webpage is no longer maintained and only here for archive purposes. Please refer to https://trac.nbic.nl/gscf and http://dbnp.org for up to date information on this project.


Project outline meeting

Meeting details

Date: Tuesday October 13, 2009

Present: Adem Bilican, Kees van Bochove, Jahn-Takeshi Saito, Martijn van Iersel

Meeting goal

To prepare for the upcoming Thursday meeting with NMC on a joint study capture tool

Meeting summary

Kees gave an update on the current situation and informed Martijn and Jahn about the details of both the dbNP and the NMC projects. Martijn contributed many valuable ideas on how to set up the collaboration in an open source way, giving examples from the PathVisio project. We compared the ISATAB (from Robert's ISAWEB) and NMC database schemas. We drafted a first rough feature list (see dbNPFeatures), which can serve as the basis for a feature tracker / task list.

Comparison of ISATAB and NMC data schemas

Robert recreated the ISATAB structure in database form in the ISAWEB project. This is a valuable effort, since we plan to offer an export option to ISATAB. The NMC team has also already set up a database schema. If we are going to collaborate with them, we will clearly have to choose a direction: use the ISAWEB database structure, use the NMC database structure, or design something in between. We all agreed that the NMC database structure would provide a good starting point, since it is clearer, more concise and better normalized. Also, there are some inherent problems with the way study design is described in ISATAB. One example is a cross-over design, where the same subjects are used in different factor combinations (here, time and treatment), whereas ISATAB by design implies that every combination of factors gives rise to different subjects/samples. Another advantage of this approach is that it is easier to implement in the existing NMC code.
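To make the cross-over problem concrete, here is a minimal Java sketch, assuming a hypothetical model in which subjects are stored separately from their factor-level assignments. The class and field names are illustrative only and are not taken from the NMC or ISAWEB schemas. Two subjects in an AB/BA cross-over cover four (period, treatment) combinations, whereas a model that equates each factor combination with its own subjects would suggest four separate subject groups.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: subjects are kept apart from their factor-level
// assignments, so one subject may appear in several factor combinations.
public class CrossOverDesign {

    /** One assignment: a subject receiving a treatment in a given period. */
    record Assignment(String subjectId, String period, String treatment) {}

    /** Builds a two-period AB/BA cross-over for the given subject IDs. */
    static List<Assignment> twoPeriodCrossOver(List<String> subjectIds) {
        List<Assignment> result = new ArrayList<>();
        for (int i = 0; i < subjectIds.size(); i++) {
            String id = subjectIds.get(i);
            // Even positions get A then B, odd positions get B then A.
            boolean aFirst = i % 2 == 0;
            result.add(new Assignment(id, "period1", aFirst ? "A" : "B"));
            result.add(new Assignment(id, "period2", aFirst ? "B" : "A"));
        }
        return result;
    }

    /** Counts the distinct subjects used in the design. */
    static int distinctSubjects(List<Assignment> design) {
        Set<String> ids = new HashSet<>();
        for (Assignment a : design) {
            ids.add(a.subjectId());
        }
        return ids.size();
    }

    /** Counts the distinct (period, treatment) combinations in the design. */
    static int distinctFactorCombinations(List<Assignment> design) {
        Set<String> combos = new HashSet<>();
        for (Assignment a : design) {
            combos.add(a.period() + "/" + a.treatment());
        }
        return combos.size();
    }

    public static void main(String[] args) {
        List<Assignment> design = twoPeriodCrossOver(List.of("s1", "s2"));
        System.out.println("subjects: " + distinctSubjects(design));
        System.out.println("factor combinations: " + distinctFactorCombinations(design));
    }
}
```

Running main reports 2 subjects but 4 factor combinations, which is exactly the situation a one-subject-group-per-combination design cannot express.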

The ISAWEB structure that Robert made was generated using POJOs and Hibernate mappings. Once we have the NMC code, we will have to see how they arrived at their schema (also Hibernate?). One further note about the practical value of an export feature to ISATAB: its use would currently be limited, so it will not be a top priority.
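Should the ISATAB export be picked up later, it could start as simply as writing tab-delimited rows. The sketch below assumes a hypothetical SampleRow type and uses only a small, simplified subset of ISA-Tab study-file column headers; the real column set and ordering would have to be checked against the ISA-Tab specification.

```java
import java.util.List;

// Hypothetical sketch of exporting sample rows as an ISA-Tab-like study table.
public class IsaTabExport {

    /** Hypothetical row type; values are assumed to contain no tabs or newlines. */
    record SampleRow(String sourceName, String organism, String sampleName, String treatment) {}

    // Simplified subset of ISA-Tab study-file headers (an assumption, not the full spec).
    static final List<String> HEADERS = List.of(
            "Source Name", "Characteristics[Organism]", "Sample Name", "Factor Value[treatment]");

    /** Renders a header line followed by one tab-separated line per sample. */
    static String toTsv(List<SampleRow> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.join("\t", HEADERS)).append('\n');
        for (SampleRow r : rows) {
            sb.append(String.join("\t", r.sourceName(), r.organism(), r.sampleName(), r.treatment()))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toTsv(List.of(
                new SampleRow("subject1", "Homo sapiens", "subject1.plasma.t0", "placebo"))));
    }
}
```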

Regarding the NMC schema, we have some questions. One is whether they are considering using controlled vocabularies, for example to choose the species in Source. We are also curious how the templating works (does it contain assay information? protocols? etc.). Furthermore, it is not clear how the data generated by the timeline tool can be persisted in the database. Finally, we should try to import some real data into the database. This could even be data scraped from public sources such as GEO, ArrayExpress/Atlas or the UCSC Genome Browser (public transcriptomics databases). The NuGO PPS data would also be a good candidate for this.
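As an illustration of what a controlled vocabulary for Source species could look like, the sketch below accepts only a fixed set of terms and maps each to its NCBI Taxonomy identifier. The hard-coded three-species list and the SpeciesVocabulary class are assumptions for illustration; in practice the terms would more likely be loaded from an ontology or terminology service.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: restricting the species of a Source to a controlled vocabulary.
public class SpeciesVocabulary {

    // Illustrative subset of NCBI Taxonomy identifiers (hard-coded for this sketch only).
    static final Map<String, Integer> NCBI_TAXON_IDS = Map.of(
            "Homo sapiens", 9606,
            "Mus musculus", 10090,
            "Rattus norvegicus", 10116);

    /** Returns the taxonomy ID if the (non-null) name is an accepted term, otherwise empty. */
    static Optional<Integer> lookup(String speciesName) {
        return Optional.ofNullable(NCBI_TAXON_IDS.get(speciesName.trim()));
    }

    public static void main(String[] args) {
        System.out.println(lookup("Mus musculus")); // Optional[10090]
        System.out.println(lookup("mouse"));        // Optional.empty
    }
}
```

Free-text entries such as "mouse" are rejected, so the same species cannot end up in the database under several spellings.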

Project collaboration setup

Martijn gave a lot of input on this. In the end, we came up with the following approach, which we will propose at next Thursday's meeting. Obviously, we will use a shared SVN repository for the code. On top of that, it would be good to have a bug tracker as well as a feature tracker. The feature tracker can be used to submit features and also to plan iteration milestones (where a milestone is the subset of features that has to be completed to reach it). We could use Trac for this, but the Gforge tools on the NBIC site will probably do as well.
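A minimal sketch of the feature tracker idea, assuming hypothetical Feature and Milestone types: a milestone is just a subset of feature IDs, it counts as reached once all of those features are done, and open features can be listed in priority order (a lower number meaning a higher priority, which is an arbitrary choice here). Trac and Gforge have their own data models; this only illustrates the concept.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a feature tracker where a milestone is a subset of features.
public class FeatureTracker {

    enum Status { OPEN, DONE }

    /** A tracked feature; lower priority numbers are treated as more important. */
    record Feature(String id, int priority, Status status) {}

    /** A milestone is the set of feature IDs that must be done to reach it. */
    record Milestone(String name, Set<String> featureIds) {}

    /** True when every feature in the milestone exists and is DONE. */
    static boolean isReached(Milestone m, List<Feature> features) {
        for (String id : m.featureIds()) {
            boolean done = features.stream()
                    .anyMatch(f -> f.id().equals(id) && f.status() == Status.DONE);
            if (!done) {
                return false; // feature missing or still open
            }
        }
        return true;
    }

    /** Open features, most important first. */
    static List<Feature> openByPriority(List<Feature> features) {
        return features.stream()
                .filter(f -> f.status() == Status.OPEN)
                .sorted(Comparator.comparingInt(Feature::priority))
                .toList();
    }
}
```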

In parallel, we could use an Ubuntu-like iteration schedule. For PathVisio, it works well to spend 3 weeks on development, then feature-freeze the code and use 1 week for manual and unit testing. If there are database changes, we can use a tool to generate update SQL scripts that convert old versions of the database to the new one. Ideally, we should also write unit tests to make sure these scripts keep working. This becomes especially important when different versions of the database are running and users depend on them.
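The update-script approach usually boils down to a numbered chain of migrations, each upgrading the schema by one version. The sketch below is hypothetical and does not model any particular migration tool: it selects the scripts needed to go from the version a database is running to a target version, and fails loudly if a step is missing. That last property is the kind of thing a unit test on the scripts could check. The table and column names in the usage comment are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of versioned schema migrations: the script registered for
// version n upgrades the database from version n - 1 to version n.
public class SchemaMigrations {

    private final Map<Integer, String> scripts = new HashMap<>();

    /** Registers the SQL that upgrades the schema from (version - 1) to version. */
    void register(int version, String sql) {
        scripts.put(version, sql);
    }

    /** Returns the scripts needed to go from current to target, in order. */
    List<String> plan(int current, int target) {
        if (target < current) {
            throw new IllegalArgumentException("downgrades are not supported in this sketch");
        }
        List<String> result = new ArrayList<>();
        for (int v = current + 1; v <= target; v++) {
            String sql = scripts.get(v);
            if (sql == null) {
                throw new IllegalStateException("missing migration for version " + v);
            }
            result.add(sql);
        }
        return result;
    }

    public static void main(String[] args) {
        SchemaMigrations m = new SchemaMigrations();
        // Invented example scripts, for illustration only.
        m.register(2, "ALTER TABLE subject ADD COLUMN species VARCHAR(255)");
        m.register(3, "CREATE TABLE protocol (id INT PRIMARY KEY)");
        m.plan(1, 3).forEach(System.out::println);
    }
}
```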

We asked whether it would be a good idea to plan the iterations by features alone, but Martijn said that in practice the time needed for a feature is often difficult to estimate. A fixed time schedule gives the project a rhythm, which especially helps to make sure that enough time is devoted to testing. The feature list can, however, be used as a priority list; that is why features should be entered in the feature tracker and their priorities managed there.
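The fixed rhythm can be written down as a simple calendar. This sketch assumes the PathVisio-style split of 3 weeks of development followed by 1 week of feature freeze and testing; the IterationSchedule class and the start date in main are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a fixed-length iteration calendar.
public class IterationSchedule {

    static final int DEV_WEEKS = 3;  // development
    static final int TEST_WEEKS = 1; // feature freeze: manual and unit testing

    /** Start of development, start of feature freeze, and release date. */
    record Iteration(int number, LocalDate devStart, LocalDate featureFreeze, LocalDate release) {}

    static List<Iteration> plan(LocalDate firstStart, int count) {
        List<Iteration> result = new ArrayList<>();
        LocalDate start = firstStart;
        for (int i = 1; i <= count; i++) {
            LocalDate freeze = start.plusWeeks(DEV_WEEKS);
            LocalDate release = freeze.plusWeeks(TEST_WEEKS);
            result.add(new Iteration(i, start, freeze, release));
            start = release; // the next iteration starts on the release date
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical start date, for illustration only.
        for (Iteration it : plan(LocalDate.of(2009, 10, 19), 3)) {
            System.out.println(it);
        }
    }
}
```

Because the dates follow from the calendar rather than from feature estimates, the testing week cannot be squeezed out by features that take longer than expected.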

To facilitate testing, it would be a good idea to set up a test server. We could also write test protocols that can be followed to test the whole system, and extend them each iteration (put them on the wiki). Testing then becomes something anyone with spare time can do: just go to the program, perform the steps in the test protocol, and report any problems and/or bugs to the bug tracker. Of course, the developers and end users (biologists) also have to do manual testing. We could ask the biologists to enter the studies they use in their work, taken straight from the published papers, so that once we also have a query facility, they would get immediate use out of it.

(See for example the WikiPathways test protocol Mvaniersel 15:20, 14 October 2009 (CEST))

We will probably need a release manager who announces releases and test periods and does the actual checkout of a release on the test server. In this role, he also needs to inform the testers and keep track of their progress. Kees has already indicated that he is willing to volunteer for this, but only if the NMC programmers also support it.

(A good book that deals with the technical infrastructure for open source projects is Producing Open Source Software, readable online. Mvaniersel 15:20, 14 October 2009 (CEST))

Feature list

We tried to make a rough feature list, based on a proposal I made earlier after a meeting with Jildau and Jeroen about the NMC study capture tool. Many of the issues mentioned above came up while doing that.

One important point is that we should try to find features that are really useful to biologists: not only the possibility to enter data, but also to get it out. So we should put effort into a solid query interface from the start. Jildau is maintaining a list of queries that biologists want to ask dbNP; I will ask her to put them on the wiki. It might be a good idea to structure and prioritize those questions (simplest ones first), so that our feature list / iteration planning can aim at making it possible to answer them via the study capture query tool.
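One possible way to structure the biologists' questions is to express each one as a set of attribute/value criteria over study metadata and to order them by the number of criteria, simplest first. The sketch below is purely illustrative: the attribute names (species, diet) and the Study and Query types are assumptions, not part of any existing dbNP or NMC design.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: biologists' questions as structured queries, simplest first.
public class StudyQueries {

    /** A study with free-form metadata attributes (hypothetical). */
    record Study(String title, Map<String, String> attributes) {}

    /** A question expressed as attribute = value criteria that must all match. */
    record Query(String question, Map<String, String> criteria) {
        int complexity() {
            return criteria.size();
        }
    }

    /** Returns the studies that satisfy every criterion of the query. */
    static List<Study> run(Query q, List<Study> studies) {
        List<Study> hits = new ArrayList<>();
        for (Study s : studies) {
            boolean all = q.criteria().entrySet().stream()
                    .allMatch(e -> e.getValue().equals(s.attributes().get(e.getKey())));
            if (all) {
                hits.add(s);
            }
        }
        return hits;
    }

    /** Orders queries so that those with the fewest criteria come first. */
    static List<Query> prioritize(List<Query> queries) {
        List<Query> sorted = new ArrayList<>(queries);
        sorted.sort(Comparator.comparingInt(Query::complexity));
        return sorted;
    }
}
```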

The feature list is in a separate topic, dbNPFeatures, to make it more accessible.

Target platform / language

Since we want to build a web interface, we need some kind of web-enabled language. NMC is using Grails and we are fine with continuing with that. It is possible to use Java / Hibernate in Grails. That way, we also don't need to make an immediate choice for a specific database (PostgreSQL/MySQL), since Hibernate supports most common databases.
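To illustrate why Hibernate lets us postpone the database choice: in a plain Hibernate setup, the database-specific parts are mostly a few connection properties (dialect, JDBC driver, URL). The sketch below only builds those properties; the host, default ports and database name are assumptions, and in a Grails application the equivalent settings would live in its data source configuration rather than in Java code. The driver class names are the ones commonly used around 2009 (newer MySQL Connector/J versions use com.mysql.cj.jdbc.Driver).

```java
import java.util.Properties;

// Hypothetical sketch: switching databases is a configuration change, not a code change.
public class DatabaseConfig {

    enum Vendor { POSTGRESQL, MYSQL }

    /** Builds the vendor-specific Hibernate connection properties (default ports assumed). */
    static Properties hibernateProperties(Vendor vendor, String host, String dbName) {
        Properties p = new Properties();
        switch (vendor) {
            case POSTGRESQL -> {
                p.setProperty("hibernate.dialect", "org.hibernate.dialect.PostgreSQLDialect");
                p.setProperty("hibernate.connection.driver_class", "org.postgresql.Driver");
                p.setProperty("hibernate.connection.url", "jdbc:postgresql://" + host + ":5432/" + dbName);
            }
            case MYSQL -> {
                p.setProperty("hibernate.dialect", "org.hibernate.dialect.MySQLDialect");
                p.setProperty("hibernate.connection.driver_class", "com.mysql.jdbc.Driver");
                p.setProperty("hibernate.connection.url", "jdbc:mysql://" + host + ":3306/" + dbName);
            }
        }
        return p;
    }

    public static void main(String[] args) {
        // "dbnp" is a hypothetical database name.
        hibernateProperties(Vendor.POSTGRESQL, "localhost", "dbnp").list(System.out);
    }
}
```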