Biobanking

From BioAssist
Revision as of 11:29, 3 February 2010 by Mswertz (Talk | contribs)

Biobanking is one of the platforms of BioAssist. Biobanks encompass collections of basic materials (blood, DNA samples) and/or their descriptions (patients, diseases, phenotypes) and subsequent data (genotypes, microarray gene expressions). These are essential for the study of complex, multifactorial diseases in which genetic, environmental and lifestyle factors together contribute to the development of diseases such as cancer, diabetes, heart disease and inflammatory diseases. Current estimates state that the Dutch biobanks together contain materials and data on about 400,000 individuals, not counting PSI and LifeLines (ref: list of biobanks). To enable research, these materials and data need to be made findable, accessible and analyzable via suitable infrastructure.

NBIC has joined hands with the recently funded BBMRI-NL consortium to prioritize, coordinate and implement the bioinformatics resources needed. The remit of BBMRI is to establish a European infrastructure for biobanks. BBMRI-NL is the Dutch hub of this initiative, encompassing the UMCs, String of Pearls, NKI, RIVM, University of Utrecht and the Free University Amsterdam (VU). The NBIC-BBMRI collaboration project is currently in the planning phase. This work is coordinated with strong national partner initiatives like String of Pearls (PSI) and LifeLines, with infrastructure initiatives like SARA and TARGET, and with international initiatives such as P3G, Gen2Phen and BBMRI-EU. In this first phase, a list of practical objectives is being discussed to produce suitable infrastructure for existing Dutch biobanks, such as:

* Improve findability of materials and data via a Dutch material and data catalogue listing all available biobanks and metadata on their contents. This is done in collaboration with the exemplar BBMRI-EU and P3G catalogs.
* Improve accessibility of data via standardized data access interfaces that local biobanks can adopt to enable data sharing. This work will be strongly connected to, although not dependent on, related efforts at the European Bioinformatics Institute, GEN2PHEN, Parelsnoer, LifeLines, P3G, NPC, NMC, etc.
* Ease data exchange via harmonization of data formats for genetic and phenotypic data, as well as sample annotations, to overcome barriers to scientific collaboration. This is done in strong collaboration with international efforts such as FuGE, Gen2Phen, XGAP, MAGE-TAB, etc.
* Ease data integration via harmonization of data annotation mapping systems for phenotypic data and sample annotations. This is done in strong collaboration with the Concept Web Alliance, P3G and various ontology projects.
* Establish a shared GWAS tool platform with services to QC, impute and analyze SNP array data against phenotypic information, together with data management capability for this kind of data.
* Establish a Dutch Genotype Archive for GWA Control Cohorts that associated Dutch biobanks can use as a source of control sets for GWA studies.
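
As an illustration of the QC step named in the GWAS tool platform objective, the following is a minimal sketch of a per-SNP filter on call rate and minor allele frequency. It is not part of any planned platform component: the function names, the genotype coding (allele counts 0, 1, 2 with `None` for a missing call) and the threshold defaults are all assumptions made for this example.

```python
from typing import Dict, List, Optional

# Assumed coding: each call is the count (0, 1, 2) of one chosen allele;
# None marks a missing genotype call.
Genotypes = Dict[str, List[Optional[int]]]


def call_rate(calls: List[Optional[int]]) -> float:
    """Fraction of samples with a non-missing genotype call."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls: List[Optional[int]]) -> float:
    """Frequency of the less common allele among the observed calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1.0 - freq)


def qc_filter(
    genotypes: Genotypes,
    min_call_rate: float = 0.95,  # illustrative threshold, not a recommendation
    min_maf: float = 0.01,        # illustrative threshold, not a recommendation
) -> List[str]:
    """Return the IDs of SNPs that pass both thresholds."""
    return [
        snp
        for snp, calls in genotypes.items()
        if call_rate(calls) >= min_call_rate
        and minor_allele_frequency(calls) >= min_maf
    ]


# Hypothetical toy data: three SNPs genotyped on ten samples.
example = {
    "snp_a": [0, 1, 2, 1, 0, 1, 0, 2, 1, 0],           # complete, polymorphic
    "snp_b": [0, None, None, 1, 0, None, 0, 0, 1, 0],  # 70% call rate
    "snp_c": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],           # monomorphic
}
print(qc_filter(example))  # ['snp_a']
```

In this toy data only `snp_a` passes: `snp_b` fails on call rate and `snp_c` fails on minor allele frequency. Real pipelines would typically add further checks, which are outside the scope of this sketch.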

For each of these themes, ‘software-in-a-box’ implementations will be provided alongside public installations, so that interested biobanks can simply download the software and use it locally. The software will also be developed as open source to ensure that the global community can join in via partner projects. Pilots of such projects are underway in collaboration with the European Bioinformatics Institute and Gen2Phen, such as a phenotype database-in-a-box [1] and a microarray gene expression database-in-a-box [2], each using a novel software development strategy [3] implemented with the MOLGENIS system [4].

1. Pheno-OM – phenotype database and exchange format. http://wwwdev.ebi.ac.uk/microarray-srv/pheno/
2. MAGETAB-OM – microarray database using the MAGE-TAB format. http://wwwdev.ebi.ac.uk/microarray-srv/magetab
3. Swertz MA, Jansen RC (2007) Beyond standardization: dynamic software infrastructures for systems genetics. Nature Reviews Genetics 8(3).
4. MOLGENIS. http://www.molgenis.org


Notes:

This page collects resources for the people working on this platform.

Genetic association analyses are often compromised by missing genotypic data. However, several imputation methods are available to mitigate this problem. Below is an overview of some of the published imputation methods.


| Category | Source | Description | Performance experience |
|---|---|---|---|
| Imputation | MACH | MACH 1.0 is a Markov Chain based haplotyper. It can resolve long haplotypes or infer missing genotypes in samples of unrelated individuals. (pubmed) | |
| Imputation | fastPHASE | The program fastPHASE implements methods for estimating haplotypes and missing genotypes from population SNP genotype data. (pubmed) | |
| Imputation | IMPUTE | IMPUTE is a program for imputing unobserved genotypes in genome-wide case-control studies based on a set of known haplotypes (like the HapMap Phase II haplotypes). (pubmed) | |
| Imputation | PLINK | PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. (pubmed) | |
| Imputation | Beagle | BEAGLE is a state of the art software package for analysis of large-scale genetic data sets with hundreds of thousands of markers genotyped on thousands of samples. (pubmed) | |
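
One possible way to collect "performance experience" for an imputation method is to mask genotypes that are actually known, impute them, and measure how often the imputed value matches the truth. The sketch below shows that idea on toy data. It is an assumption-laden illustration: the imputer here simply fills in the most common observed genotype, which is a naive stand-in and not the method used by MACH, fastPHASE, IMPUTE, PLINK or BEAGLE, and all function and variable names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence


def impute_most_common(calls: Sequence[Optional[int]]) -> List[int]:
    """Fill missing calls (None) with the most common observed genotype.

    Naive placeholder for illustration only; the tools listed above use
    haplotype-based models instead.
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        raise ValueError("no observed genotypes to impute from")
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if c is None else c for c in calls]


def concordance(
    truth: Sequence[int],
    imputed: Sequence[int],
    masked_positions: Sequence[int],
) -> float:
    """Fraction of masked positions where the imputed genotype equals the truth."""
    if not masked_positions:
        raise ValueError("no masked positions to evaluate")
    hits = sum(truth[i] == imputed[i] for i in masked_positions)
    return hits / len(masked_positions)


# Hypothetical genotypes for one SNP across eight samples (allele counts 0/1/2).
truth = [0, 0, 1, 0, 2, 0, 1, 0]
masked_positions = [1, 2, 5]  # positions hidden from the imputer

masked = [None if i in masked_positions else g for i, g in enumerate(truth)]
imputed = impute_most_common(masked)

print(imputed)                                        # [0, 0, 0, 0, 2, 0, 1, 0]
print(concordance(truth, imputed, masked_positions))  # 2 of 3 masked calls recovered
```

To compare real tools, `impute_most_common` would be replaced by a call to the tool under evaluation, with the same masking and concordance steps kept unchanged.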