Dutch Variant Database

From BioAssist

Dutch Variant Database (DVD)

More documents and code can be found at https://trac.nbic.nl/dvd/

Discussions on 2011-3-10

Participants

  • Ies Nijman (Hubrecht)
  • Marloes Hoogstraat (Hubrecht)
  • Robert Wagner (UMCG)
  • Christian Gilissen (UMCN)
  • Jane Hehir (UMCN)
  • Jeroen Laros (LUMC)
  • David van Enckevort (NBIC)
  • Leon Mei (NBIC)
  • Victor Guryev (Hubrecht)

Action Points

  • Ies, Christian, Jeroen and Robert will check with their groups on the data privacy policy and estimate the data size.
  • Jeroen and Victor will summarize their discussion and produce the first version of the data model. (Done)
  • Robert, David, Victor, Leon and Jeroen will discuss the Molgenis and LOVD prototyping and come up with a list of pros and cons. (Done)

Background of local variant DB

UMCN

  • For exome sequencing
  • 3,000,000 variants stored
  • No cancer samples
  • Raw sequence data are included in this database
  • Some QC scripts at the MySQL level detect certain errors.

LUMC

  • For storing variants associated with specific genes
  • Supports database federation, and can thus handle an extremely large number of variants.
  • Automatic checking for variant errors by a service called Mutalyzer

UMC

  • For exome sequencing and cancer sample sequencing
  • It has 84 samples now, with SNPs and small indels detected by BWA.
  • Users of our local DB often want more detail than we expected, so we are starting to store much more information about each variant to support their requests.

  • Advanced search queries are supported, e.g. give me the variants in a child that are not present in the parents.
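
The child-versus-parents query mentioned above can be thought of as a set difference. The following is only a minimal sketch, not the way the local database implements it: the function name, the tuple layout (chromosome, position, reference allele, alternative allele) and the example variants are all made up for illustration.

```python
# Hypothetical illustration: each variant is keyed as
# (chromosome, position, reference allele, alternative allele).

def child_only_variants(child, mother, father):
    """Return the variants seen in the child but in neither parent."""
    return set(child) - set(mother) - set(father)


# Made-up example data for one family.
child = {("chr1", 12345, "A", "G"), ("chr2", 500, "C", "T"), ("chrX", 42, "G", "GA")}
mother = {("chr1", 12345, "A", "G")}
father = {("chrX", 42, "G", "GA")}

candidates = child_only_variants(child, mother, father)
```

In a relational database the same idea would typically be written as a `NOT IN` or `NOT EXISTS` subquery over the parents' samples.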

UMCG

  • We use Molgenis because LOVD cannot present sufficient phenotype information.
  • Supports an embedded genome browser.
  • Uses the Gen2Phen data model for storing phenotype information.

Discussions on DVD

  • Shall we store only a summary of the variants found in a study, or do we store more details about each variant? I.e., would it be a project-oriented or a variant-oriented DB? How much data do we expect?

AP: Ies, Christian, Jeroen and Robert will check with their groups on the data privacy policy and estimate the data size.

  • How to support data privacy and security? (Jeroen)
  • We should support advanced queries such as (1) variants that lie only in exons or only in introns, and (2) variants in a child that are not present in the parents. (Christian, Ies)

  • We need to agree on an explicit genome build. hg19 has several varieties. (Ies)

  • Which format should we use for describing a variant? HGVS? We should think carefully about how to store the variant location for efficient search, for example by storing start position, end position and chromosome in separate columns.

  • How should we define the variant ID? A UUID? (Leon)
  • We should identify the real contact person for the variants. (Jeroen)
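
To make the questions about variant location and variant ID more concrete, here is a minimal sketch using Python's built-in sqlite3 and uuid modules. It is not the DVD data model: the table name, column names and index are assumptions, and the HGVS strings are made-up examples. It stores genome build, chromosome, start and end in separate columns, uses a random UUID as the variant ID, and answers a region query (for instance the coordinates of an exon) with an interval overlap test.

```python
import sqlite3
import uuid

# Hypothetical schema: location fields live in separate, indexed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE variant (
        id           TEXT PRIMARY KEY,   -- random UUID, stored as text
        genome_build TEXT NOT NULL,      -- explicit build, e.g. 'hg19'
        chromosome   TEXT NOT NULL,
        start_pos    INTEGER NOT NULL,
        end_pos      INTEGER NOT NULL,
        hgvs         TEXT                -- human-readable description
    )
    """
)
conn.execute(
    "CREATE INDEX idx_variant_location "
    "ON variant (genome_build, chromosome, start_pos, end_pos)"
)


def add_variant(conn, build, chromosome, start, end, hgvs):
    """Insert one variant and return its newly generated UUID."""
    variant_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO variant VALUES (?, ?, ?, ?, ?, ?)",
        (variant_id, build, chromosome, start, end, hgvs),
    )
    return variant_id


def variants_in_region(conn, build, chromosome, region_start, region_end):
    """Return HGVS descriptions of variants overlapping the given region."""
    rows = conn.execute(
        "SELECT hgvs FROM variant "
        "WHERE genome_build = ? AND chromosome = ? "
        "AND start_pos <= ? AND end_pos >= ? "
        "ORDER BY start_pos",
        (build, chromosome, region_end, region_start),
    )
    return [row[0] for row in rows]


# Made-up example variants.
ids = [
    add_variant(conn, "hg19", "1", 1500, 1500, "NC_000001.10:g.1500A>G"),
    add_variant(conn, "hg19", "1", 2000, 2004, "NC_000001.10:g.2000_2004del"),
    add_variant(conn, "hg19", "2", 1500, 1500, "NC_000002.11:g.1500C>T"),
]
hits = variants_in_region(conn, "hg19", "1", 1000, 2001)
```

Keeping the build in its own column means variants reported against different builds are never compared by accident, which is one way to address the genome build concern raised above.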

Data model for DVD

  • AP: Jeroen and Victor will summarize their discussion and produce the first version of the data model.

Prototyping in Molgenis

  • Robert implemented a prototype using Molgenis.
  • We should compile a list of pros and cons of the Molgenis prototype.

Prototyping in LOVD

  • Although we didn't go into details on this today, Leon thinks it is still worth exploring the LOVD possibility a bit further. AP: Leon and Jeroen will discuss this and come up with a list of pros and cons of the LOVD prototype.

Discussions on 2011-3-30

Participants

  • Jeroen Laros (LUMC)
  • Ivo Fokkema (LUMC)
  • David van Enckevort (NBIC)
  • Leon Mei (NBIC)

Minutes

  • LUMC proposes supporting only two types of queries. For the stakeholder meeting: we need to verify this with the other groups and come up with an explicit set of queries we want to support in DVD.
    • Is a variant common or not, given a predefined upper threshold?
    • Who is the submitter of a particular variant?
  • LUMC proposes a system without a GUI that only partners can access. Users cannot browse the database, and we have full control over the APIs and authentication. For the stakeholder meeting: we need to come up with explicit security requirements.
  • The updated data model can be found at: http://vm7.target.rug.nl/dvd/generated-doc/objectmodel.html
  • A running prototype using Molgenis based on this data model can be found at: http://vm7.target.rug.nl/dvd
  • Jeroen (probably with support from Ivo) and Robert (probably with support from Morris) will make sure the DVD data model is compatible with the Gen2Phen and dbSNP models.
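
The two proposed query types can be sketched as follows. This is only an illustration under assumptions, not the DVD API: the function names, the dictionary layout, the 5% default threshold and the choice to call a variant common when its carrier fraction is at or above the threshold are all hypothetical.

```python
# Hypothetical in-memory stand-in for the database.
# observations maps (variant, submitter) -> number of samples carrying it.
# cohort_sizes maps submitter -> total number of samples screened.

def is_common(observations, cohort_sizes, variant, threshold=0.05):
    """Return True if the carrier fraction reaches the predefined threshold."""
    carriers = sum(n for (v, _), n in observations.items() if v == variant)
    total = sum(cohort_sizes.values())
    return total > 0 and carriers / total >= threshold


def submitters_of(observations, variant):
    """Return the submitters that reported the given variant."""
    return sorted(s for (v, s) in observations if v == variant)


# Made-up example data.
cohort_sizes = {"group_a": 100, "group_b": 100}
observations = {
    ("variant_1", "group_a"): 30,
    ("variant_1", "group_b"): 10,
    ("variant_2", "group_a"): 1,
}

common_1 = is_common(observations, cohort_sizes, "variant_1")
common_2 = is_common(observations, cohort_sizes, "variant_2")
who = submitters_of(observations, "variant_1")
```

Because the first query returns only a yes/no answer, it reveals less about individual samples than a browsable database would, which fits the proposal to expose the system through a controlled API only.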

Questions for Stakeholder meeting

  • Do we need to group users together?
  • Do we need to support logging of the queries?
  • Do we need to support references to/from the DVD for publications?
  • Do we need to support exporting to dbSNP, etc.?

Pros/Cons of Molgenis/LOVD/From-scratch

Compared features for LOVD, Molgenis and From-scratch:

  • API
    • LOVD: v2: only GET; since v3: full REST. (How to submit data? --Leon 11:00, 4 April 2011 (CEST))
    • Molgenis: REST, SOAP, RDF; tab-delimited file as input; tab-delimited and XLS export.
    • From-scratch: Any
  • Required development effort
    • LOVD: 1~3 weeks
    • Molgenis: 1 week
    • From-scratch: >2 weeks (and times by Pi?)
  • Security
    • LOVD: Supports full AAA, however only via GUI
    • Molgenis: Supports AAA also on API, logging needs some TLC
    • From-scratch: Any (e.g. Apache security)
  • Performance
    • LOVD: A single instance can store up to 1 million records. Supports a federated setup, so it can be scaled up further.
    • Molgenis: Serves millions of records without a problem
    • From-scratch: max
  • Storage engine
    • LOVD: MySQL
    • Molgenis: MySQL (Java Persistence API in progress)
    • From-scratch: Any
  • Extensibility (e.g. on data model)
    • LOVD: Data model can be changed easily. Data migration is automatic (for large tables, manual intervention is required).
    • Molgenis: Data model can be changed easily. Data migration is manual.
    • From-scratch: Manual
  • Wasted features
    • LOVD: A lot!
    • Molgenis: Plugin-based design, so less penalty here.
    • From-scratch: None

Stakeholder meeting on 2011-9-26

This is a meeting with various stakeholders to define concrete next steps by examining the DVD prototype.

Location

SURFoundation, office 5a

Hojel City Center, gebouw D (5th floor)

Graadt van Roggenweg 340

3531 AH Utrecht

Agenda

Time slot    Responsible            Subject
14:00-14:10  Leon Mei               Welcome
14:10-14:50  Jeroen Laros           DVD introduction and demo
14:50-15:05  Christian Gilissen     DVD potential applications
15:05-15:15  All                    Coffee break
15:15-16:00  Jeroen Laros/Leon Mei  Discussion and planning
  • Supported queries (is retrieving data currently too complicated a procedure? Possible improvements?)
  • Additional applications of DVD (e.g. checking for presence of deleterious mutations)
  • Policy
  • Hosting

Participants

  • Ies Nijman (Hubrecht/UMCU)
  • Jeroen Laros (LUMC)
  • Johan den Dunnen (LUMC)
  • Christian Gilissen (UMCN)
  • Joep Ligt (UMCN)
  • Morris Swertz (UMCG)
  • Robert Wagner (UMCG)
  • Slavik Koval (EMC)
  • Wilfred van Ijcken (EMC)
  • Rutger Brouwer (EMC)
  • Aldo Jongejan (AMC)
  • Fred van Ruissen (AMC)
  • Najim Ameziane (VUMC)
  • David van Enckevort (NBIC)
  • Leon Mei (NBIC)
  • Tom Visser (SARA)