dbNP Clean Transcriptomics Database
The Clean Transcriptomics Database is a module of the Nutritional Phenotype Database (dbNP). Its development is funded by NuGO and carried out by Wageningen University & Research Centre. The module aims to provide centralized storage for transcriptomics data. This data can be queried via the dbNP query module, and its accompanying study metadata is stored in the dbNP study capture module. The module complies with the 'dbNP omics submodule' standard by serving clean data via the dbNP clean data layer.
The module is built by Robert Kerkhoven and Philip de Groot from WUR. Test versions are deployed on the WUR NBX. The ultimate goal is to deploy the database on all NBXes, so that each NuGO member organization can store its own transcriptomics assay data on its own NBX. Delivery of the database is planned for the first week of March 2010.
The clean transcriptomics database has two goals:
- The implementation of a uniform normalization for (initially) Affymetrix microarray data, which results in dbNP 'clean data'
- The implementation of the clean data layer of dbNP, to integrate the transcriptomics data with study capture and other omics data
The realization of these goals is described in the next two sections.
Providing clean data
Normalization is implemented by the GenePattern package 'NuGOMakeCleanData', which is installed on the GenePattern instances on the NuGO NBXes. Scientists are responsible for using it to convert their raw data (CEL files) into normalized data (a GCT file plus a CHIP file, based on the so-called CustomCDF annotation). The package automatically downloads the latest annotations for the Affymetrix arrays. Documentation for the package is available as a separate PDF file.
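To make the normalized output concrete, here is a minimal sketch of reading it back in Python. It assumes the standard GenePattern GCT 1.2 layout (a `#1.2` line, a line with row and column counts, a `Name`/`Description`/sample-names header, then one row per probe set) and a tab-delimited CHIP file with `Probe Set ID` and `Gene Symbol` columns. It is not the NuGOMakeCleanData code, and the names `ExpressionSet`, `read_gct` and `read_chip` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ExpressionSet:
    samples: list                                     # sample (array) names from the GCT header
    values: dict = field(default_factory=dict)        # probe set ID -> one value per sample
    descriptions: dict = field(default_factory=dict)  # probe set ID -> Description column


def read_gct(text):
    """Parse GenePattern GCT 1.2 text into an ExpressionSet."""
    lines = text.splitlines()
    if len(lines) < 3 or lines[0].strip() != "#1.2":
        raise ValueError("not a GCT 1.2 file: missing '#1.2' header")
    # Second line: number of data rows and number of sample columns.
    n_rows, n_cols = (int(x) for x in lines[1].split()[:2])
    samples = lines[2].split("\t")[2:]  # skip the Name and Description columns
    if len(samples) != n_cols:
        raise ValueError(f"expected {n_cols} sample columns, found {len(samples)}")
    data = ExpressionSet(samples=samples)
    for line in lines[3:]:
        if not line.strip():
            continue  # tolerate trailing blank lines
        name, desc, *fields = line.split("\t")
        if len(fields) != n_cols:
            raise ValueError(f"row {name!r} has {len(fields)} values, expected {n_cols}")
        data.values[name] = [float(v) for v in fields]
        data.descriptions[name] = desc
    if len(data.values) != n_rows:
        raise ValueError(f"expected {n_rows} unique rows, found {len(data.values)}")
    return data


def read_chip(text):
    """Map probe set ID -> gene symbol from tab-delimited CHIP text (assumed column names)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Probe Set ID"]: row["Gene Symbol"] for row in reader}
```

Checking the declared dimensions against the actual rows catches truncated or hand-edited files before they are uploaded.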
Integration with other dbNP modules
To serve the clean data, a web application is being built that lets users upload their normalized transcriptomics expression data (GCT file and CHIP file). This data is stored in a local MySQL database. The web application then lets users link the arrays in the uploaded data to assays and samples in the dbNP study capture module. Finally, the application implements the clean data layer of the dbNP query module, so that the clean data can be served in response to dbNP queries.
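The upload, storage and linking steps can be sketched as follows. This is an illustration only: Python's built-in `sqlite3` stands in for the module's MySQL database, and the table names (`microarray`, `probe`, `expression`), the function names and the assay/sample ID strings are hypothetical. They do not reflect the actual schema of the web application or the dbNP study capture API.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for the module's MySQL database.
SCHEMA = """
CREATE TABLE IF NOT EXISTS microarray (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,  -- sample column name from the uploaded GCT file
    assay_id TEXT,              -- study capture assay ID, NULL until linked
    sample_id TEXT              -- study capture sample ID, NULL until linked
);
CREATE TABLE IF NOT EXISTS probe (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,  -- probe set ID from the GCT file
    gene_symbol TEXT            -- from the CHIP file, if present
);
CREATE TABLE IF NOT EXISTS expression (
    microarray_id INTEGER NOT NULL REFERENCES microarray(id),
    probe_id INTEGER NOT NULL REFERENCES probe(id),
    value REAL NOT NULL,
    PRIMARY KEY (microarray_id, probe_id)
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_upload(conn, samples, values, chip):
    """Store one normalized matrix (probe set -> value per sample) in a single transaction."""
    with conn:  # commits on success, rolls back if e.g. an array name already exists
        array_ids = {}
        for name in samples:
            cur = conn.execute("INSERT INTO microarray (name) VALUES (?)", (name,))
            array_ids[name] = cur.lastrowid
        for probe, row in values.items():
            # Probe sets may already exist from an earlier upload of the same array type.
            conn.execute(
                "INSERT OR IGNORE INTO probe (name, gene_symbol) VALUES (?, ?)",
                (probe, chip.get(probe)),
            )
            (probe_id,) = conn.execute(
                "SELECT id FROM probe WHERE name = ?", (probe,)
            ).fetchone()
            conn.executemany(
                "INSERT INTO expression (microarray_id, probe_id, value) VALUES (?, ?, ?)",
                [(array_ids[s], probe_id, v) for s, v in zip(samples, row)],
            )


def link_array(conn, array_name, assay_id, sample_id):
    """Attach study capture assay and sample IDs to one uploaded array."""
    with conn:
        cur = conn.execute(
            "UPDATE microarray SET assay_id = ?, sample_id = ? WHERE name = ?",
            (assay_id, sample_id, array_name),
        )
    if cur.rowcount != 1:
        raise KeyError(f"unknown array: {array_name}")


def expression_for_samples(conn, sample_ids):
    """Return (sample_id, probe set, value) rows for linked samples."""
    if not sample_ids:
        return []
    marks = ",".join("?" * len(sample_ids))
    return conn.execute(
        "SELECT m.sample_id, p.name, e.value FROM expression e "
        "JOIN microarray m ON m.id = e.microarray_id "
        "JOIN probe p ON p.id = e.probe_id "
        f"WHERE m.sample_id IN ({marks}) ORDER BY p.name, m.sample_id",
        list(sample_ids),
    ).fetchall()
```

Keeping the study capture IDs as nullable columns on the array table mirrors the two-step workflow: data can be uploaded first and linked later.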
The clean data elements that the module should ultimately serve are:
'Biomarkers' whose output is a table of gene name, P-value and T-value:
- Differential biomarkers
* Differential gene expression of one or more genes, or all genes, between two groups of samples
- Paired biomarkers
* Differential gene expression of one or more genes, or all genes, between two paired groups of samples
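The page does not specify which statistical test produces the P-value and T-value. As one possible illustration, the sketch below uses a Welch two-sample t-test for differential biomarkers and a paired t-test (on per-pair differences) for paired biomarkers. The two-sided P-value comes from Student's t distribution, computed via the regularized incomplete beta function with the standard continued-fraction evaluation. All function names are hypothetical, and a production module might instead use an established statistics library or a moderated t-statistic.

```python
import math
from statistics import mean, variance


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies one even and one odd continued-fraction coefficient.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_pvalue(t, df):
    """Two-sided P-value of a t statistic with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def welch_t(x, y):
    """Welch two-sample t-test; needs >= 2 values per group and non-zero variance."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    se2 = vx + vy
    t = (mean(x) - mean(y)) / math.sqrt(se2)
    df = se2 ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))  # Welch-Satterthwaite
    return t, t_pvalue(t, df)


def paired_t(x, y):
    """Paired t-test on the differences x[i] - y[i]."""
    if len(x) != len(y):
        raise ValueError("paired groups must have equal length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / math.sqrt(variance(diffs) / n)
    return t, t_pvalue(t, n - 1)


def biomarker_table(values, group_a, group_b, genes=None, paired=False):
    """Rows of (gene, P-value, T-value); groups are sample column indices.

    For paired=True, group_a[i] and group_b[i] must be the same subject.
    """
    test = paired_t if paired else welch_t
    rows = []
    for gene in (genes if genes is not None else values):
        row = values[gene]
        t, p = test([row[i] for i in group_a], [row[i] for i in group_b])
        rows.append((gene, p, t))
    return rows
```

For example, `biomarker_table(values, [0, 1, 2], [3, 4, 5])` compares the first three samples against the last three for every gene in `values`.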
'Biomarkers' whose output is a table of gene name and an expression value (a number) for each sample:
- Quantitative biomarkers
* Absolute (normalized) expression of one or more genes, or all genes, for a group of samples
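The output shape of a quantitative biomarker can be sketched as a gene-by-sample table. This minimal example assumes the normalized values are held as a mapping from gene to a list of values in sample-column order, as in a GCT file. The function names and the tab-separated output are illustrative assumptions, not the module's actual interface.

```python
import csv
import io


def quantitative_table(values, samples, group, genes=None):
    """Build a gene-by-sample table of normalized expression values.

    values:  gene -> list of values, one per entry in `samples`
    samples: sample (column) names, in the same order as the value lists
    group:   the sample names to report, in output column order
    genes:   genes to report; None means all genes
    """
    index = {name: i for i, name in enumerate(samples)}
    missing = [s for s in group if s not in index]
    if missing:
        raise KeyError(f"unknown samples: {missing}")
    cols = [index[s] for s in group]
    header = ["gene", *group]
    rows = [[gene, *(values[gene][c] for c in cols)]
            for gene in (values if genes is None else genes)]
    return header, rows


def to_tsv(header, rows):
    """Serialize the table as tab-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()
```

Unlike the differential and paired cases, no statistics are computed here: each cell is simply the stored normalized value for one gene in one sample.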
Current development status
- The GenePattern module for normalization has been developed by Philip and has been installed on the WUR NBX. It will be deployed to the other NBXes soon.
- The web application for upload and storage of the clean transcriptomics data has been developed by Robert. It can be found at http://nbx13.nugo.org/ctd.
- Linking assay and sample IDs and implementing the biomarker layer are still to be done. This work depends on the exact specification of both the assay-sample interaction with the dbNP study capture module and the biomarker layer of the dbNP query module, which is still under development.