
From BioAssist

Revision as of 14:15, 7 April 2011

Dutch Variant Database (DVD)

Discussions on 2011-3-10

Participants

  • Ies Nijman (Hubrecht)
  • Marloes Hoogstraat (Hubrecht)
  • Robert Wagner (UMCG)
  • Christian Gilissen (UMCN)
  • Jane Hehir (UMCN)
  • Jeroen Laros (LUMC)
  • David van Enckevort (NBIC)
  • Leon Mei (NBIC)
  • Victor Guryev (Hubrecht)

Action Points

  • Ies, Christian, Jeroen and Robert will check the data privacy policy with their groups and estimate the data size.
  • Jeroen and Victor will summarize their discussion and produce the first version of the data model. (Done)
  • Robert, David, Victor, Leon and Jeroen will discuss the Molgenis and LOVD prototypes and draw up a list of pros and cons. (Done)

Background of local variant DBs

UMCN

  • For exome sequencing
  • 3,000,000 variants stored
  • No cancer samples
  • Raw sequence data are included in this database
  • QC scripts at the MySQL level detect certain errors.

LUMC

  • For storing variants associated with specific genes
  • Supports database federation, so it can handle an extremely large number of variants.
  • Variants are checked automatically for errors by a service called Mutalyzer.

UMC

  • For exome sequencing and cancer sample sequencing
  • It currently holds 84 samples, with SNPs and small indels detected by BWA.
  • Users of our local DB often want more details than we expected, so we are starting to store much more information about variants to support their requests.

  • Advanced search queries are supported, e.g. "give me the variants present in a child but not in the parents".

UMCG

  • We use Molgenis because LOVD cannot present sufficient phenotype information.
  • Supports an embedded genome browser.
  • Uses the Gen2Phen data model for storing phenotype information.

Discussions on DVD

  • Shall we store only a summary of the variants found in a study, or more details about each variant? I.e., would it be a project-oriented or a variant-oriented DB? How much data do we expect?

AP: Ies, Christian, Jeroen and Robert will check the data privacy policy with their groups and estimate the data size.

  • How to support data privacy and security? (Jeroen)
  • We should support advanced queries such as (1) variants located only in exons or only in introns, and (2) variants present in a child but not in the parents. (Christian, Ies)

  • We need to agree on an explicit genome build, since hg19 comes in several versions. (Ies)

  • Which format should we use to describe variants? HGVS? We should think carefully about how to store the variant location for efficient searching, for example by storing the chromosome, start position and end position in separate columns.

  • How should we define the variant ID? As a UUID? (Leon)
  • We should identify the real contact person for the variants (Jeroen).
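The location and parent/child queries raised above could be served by a layout along these lines. This is a minimal sketch using Python's built-in sqlite3 module, not the DVD data model: the table and column names (variant, sample_variant, start_pos, end_pos) and the helper functions are hypothetical. It stores the chromosome, start and end in separate indexed columns, gives each variant a UUID, and expresses "variants in a child but not in the parents" as a set difference (SQL EXCEPT).

```python
import sqlite3
import uuid

# Hypothetical schema for illustration only; names are not taken from the
# DVD data model. Coordinates assume one agreed genome build.
SCHEMA = """
CREATE TABLE variant (
    id         TEXT PRIMARY KEY,   -- UUID generated on insert
    chromosome TEXT    NOT NULL,
    start_pos  INTEGER NOT NULL,   -- 1-based, inclusive
    end_pos    INTEGER NOT NULL,   -- 1-based, inclusive
    ref        TEXT    NOT NULL,
    alt        TEXT    NOT NULL,
    UNIQUE (chromosome, start_pos, end_pos, ref, alt)
);
-- Composite index so region queries can narrow by chromosome and start.
CREATE INDEX variant_location ON variant (chromosome, start_pos, end_pos);
CREATE TABLE sample_variant (
    sample_id  TEXT NOT NULL,
    variant_id TEXT NOT NULL REFERENCES variant (id),
    PRIMARY KEY (sample_id, variant_id)
);
"""


def create_db(path=":memory:"):
    """Open a database and create the hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_variant(conn, chromosome, start, end, ref, alt):
    """Store a variant under a freshly generated UUID and return that ID."""
    variant_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO variant (id, chromosome, start_pos, end_pos, ref, alt) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (variant_id, chromosome, start, end, ref, alt),
    )
    return variant_id


def add_observation(conn, sample_id, variant_id):
    """Record that a sample carries a variant."""
    conn.execute(
        "INSERT INTO sample_variant (sample_id, variant_id) VALUES (?, ?)",
        (sample_id, variant_id),
    )


def variants_in_region(conn, chromosome, start, end):
    """IDs of variants overlapping [start, end] on one chromosome."""
    rows = conn.execute(
        "SELECT id FROM variant "
        "WHERE chromosome = ? AND start_pos <= ? AND end_pos >= ?",
        (chromosome, end, start),
    )
    return {row[0] for row in rows}


def child_only_variants(conn, child, mother, father):
    """IDs of variants seen in the child but in neither parent."""
    rows = conn.execute(
        "SELECT variant_id FROM sample_variant WHERE sample_id = ? "
        "EXCEPT "
        "SELECT variant_id FROM sample_variant WHERE sample_id IN (?, ?)",
        (child, mother, father),
    )
    return {row[0] for row in rows}
```

The overlap test (start_pos <= query end AND end_pos >= query start) is the standard interval-overlap condition. On very large tables a plain B-tree index only narrows the search by chromosome and start position, so a binning scheme such as the one used by the UCSC Genome Browser may be worth considering.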

Data model for DVD

  • AP: Jeroen and Victor will summarize their discussion and produce the first version of the data model.

Prototyping in Molgenis

  • Robert implemented a prototype using Molgenis.
  • We should draw up a list of pros and cons of the Molgenis prototype.

Prototyping in LOVD

  • Although we did not go into detail on this today, Leon thinks it is still worth exploring the LOVD option a bit further. AP: Leon and Jeroen will discuss this and draw up a list of pros and cons of an LOVD prototype.

Discussions on 2011-3-30

Participants

  • Jeroen Laros (LUMC)
  • Ivo Fokkema (LUMC)
  • David van Enckevort (NBIC)
  • Leon Mei (NBIC)

Minutes

  • LUMC proposes supporting only two types of queries. At the stakeholder meeting, we need to verify this with the other groups and agree on an explicit set of queries that DVD should support.
    • whether a variant is common or not, with a predefined upper threshold?
    • who the submitter is for a particular variant?
  • LUMC proposes a system without a GUI that only partners can access. Users cannot browse the database, and we have full control over the APIs and authentication. At the stakeholder meeting, we need to define explicit security requirements.
  • The updated data model can be found at: http://vm7.target.rug.nl/dvd/generated-doc/objectmodel.html
  • A running prototype using Molgenis based on this data model can be found at: http://vm7.target.rug.nl/dvd
  • Jeroen (probably with support from Ivo) and Robert (probably with support from Morris) will make sure the DVD data model is compatible with the Gen2Phen and dbSNP models.
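The two proposed query types could return deliberately minimal answers, which would fit the restricted, partner-only access proposed above. The sketch below is hypothetical throughout: the single submission table, the helper names and the choice of denominator (all samples in the database) are assumptions, not decisions from the meeting. is_common answers only yes or no, treating a variant as common when the fraction of samples carrying it reaches a predefined threshold, and never exposes the frequency itself.

```python
import sqlite3

# Hypothetical minimal table: one row per (sample, variant) submission.
SCHEMA = """
CREATE TABLE submission (
    chromosome TEXT    NOT NULL,
    start_pos  INTEGER NOT NULL,
    end_pos    INTEGER NOT NULL,
    ref        TEXT    NOT NULL,
    alt        TEXT    NOT NULL,
    sample_id  TEXT    NOT NULL,
    submitter  TEXT    NOT NULL   -- contact person or group
);
"""

# Fixed SQL fragment identifying one variant; values are always bound as
# parameters, never formatted into the string.
VARIANT_KEY = ("chromosome = ? AND start_pos = ? AND end_pos = ? "
               "AND ref = ? AND alt = ?")


def create_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_submission(conn, variant, sample_id, submitter):
    """variant is a (chromosome, start, end, ref, alt) tuple."""
    conn.execute("INSERT INTO submission VALUES (?, ?, ?, ?, ?, ?, ?)",
                 (*variant, sample_id, submitter))


def is_common(conn, variant, threshold):
    """True if the carrier fraction is >= threshold; the fraction itself
    is not returned. Denominator (assumption): all samples in the DB."""
    (total,) = conn.execute(
        "SELECT COUNT(DISTINCT sample_id) FROM submission").fetchone()
    if total == 0:
        return False
    (carriers,) = conn.execute(
        f"SELECT COUNT(DISTINCT sample_id) FROM submission WHERE {VARIANT_KEY}",
        variant).fetchone()
    return carriers / total >= threshold


def submitters_of(conn, variant):
    """Sorted list of distinct submitters of one variant."""
    rows = conn.execute(
        f"SELECT DISTINCT submitter FROM submission WHERE {VARIANT_KEY} "
        "ORDER BY submitter", variant)
    return [row[0] for row in rows]
```

Whether "common" should be relative to all samples, to samples sequenced with a comparable protocol, or to an absolute count is one of the points the stakeholder meeting would need to settle.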

Questions for Stakeholder meeting

  • Do we need to group users together?
  • Do we need to support logging of the queries?
  • Do we need to support references to/from the DVD in publications?
  • Do we need to support exporting to dbSNP, etc.?

Pros/Cons of Molgenis/LOVD/From-scratch

Compared features: LOVD vs. Molgenis vs. from-scratch

  • API
    • LOVD: v2 supports GET only; full REST since v3. How would data be submitted? --Leon 11:00, 4 April 2011 (CEST)
    • Molgenis: REST, SOAP, RDF; tab-delimited file as input; tab-delimited and XLS export
    • From-scratch: any
  • Required development effort
    • LOVD: 1~3 weeks
    • Molgenis: to be checked by David
    • From-scratch: >2 weeks (and times by Pi?)
  • Security
    • LOVD: supports full AAA, but only via the GUI
    • Molgenis: supports full AAA, also on the API
    • From-scratch: any (e.g. Apache security)
  • Performance
    • LOVD: a single instance can store up to 1 million records; supports a federated setup, so it can be scaled up further
    • Molgenis: serves millions of records without a problem
    • From-scratch: max
  • Storage engine
    • LOVD: MySQL
    • Molgenis: MySQL (Java Persistence API in progress)
    • From-scratch: any
  • Extensibility (e.g. of the data model)
    • LOVD: the data model can be changed easily; data migration is automatic (manual intervention is required for large tables)
    • Molgenis: the data model can be changed easily; data migration is manual
    • From-scratch: manual
  • Wasted features
    • LOVD: a lot!
    • Molgenis: plugin-based design, so less penalty here
    • From-scratch: none

Stakeholder meeting on 2011-x-xx

Please indicate your availability via http://www.doodle.com/2pqyu3kecxdk8x7t