Metabolomics Data structures/entities
Identifying the main entities that store relevant information leads to a better understanding of the complexity of the data structure. Two systems are presented below to show what is involved; the platform that BioAssist will eventually have to support is a superset of what is shown here.
System A
The main entity would be the Project/Study Information Table. This table would hold, for example, information about the samples being measured and the apparatus used (including its settings). It should also record the statistical and mathematical processing steps applied to the data at each level, so that results can always be reproduced. A unique project identifier links all study information.
In the picture below an example is shown in which integrated peak areas are stored. Specifics of the peaks are stored in the 'peak table'. Furthermore, the 'study information table' combines all information from the different tables. The class and longitudinal information makes it easy to cross-section the data at later stages (data analysis).
Storing data in this manner (as in the example above) gives a flexible, scalable solution in which data from any kind of apparatus can be stored. Results can be reproduced by replaying the data steps listed in the 'processing table'. Object and/or peak removal can be treated as a (pre)processing step. A unique identifier for each sample, used consistently throughout the study, is crucial. Different levels of meta-information can be achieved by annotating data in different ways. The appropriate level for cross-study analysis has yet to be determined.
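The table layout described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the source names the 'study information', 'peak' and 'processing' tables, but every column name, the separate `peak_area` table, the `cross_section` helper, and all sample values below are hypothetical.

```python
import sqlite3

# Hypothetical schema: only the table roles come from the text,
# all column names are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE study_information (
    sample_id   TEXT PRIMARY KEY,  -- unique sample identifier within the study
    project_id  TEXT NOT NULL,     -- unique project identifier
    class_label TEXT,              -- class information (e.g. treatment group)
    time_point  INTEGER,           -- longitudinal information
    apparatus   TEXT               -- instrument and settings used
);
CREATE TABLE peak (
    peak_id        INTEGER PRIMARY KEY,
    retention_time REAL,
    annotation     TEXT            -- NULL while the peak is unidentified
);
CREATE TABLE peak_area (
    sample_id TEXT REFERENCES study_information(sample_id),
    peak_id   INTEGER REFERENCES peak(peak_id),
    area      REAL,                -- integrated peak area
    PRIMARY KEY (sample_id, peak_id)
);
CREATE TABLE processing (
    project_id TEXT,
    step_order INTEGER,
    step_name  TEXT,
    parameters TEXT,
    PRIMARY KEY (project_id, step_order)
);
"""


def build_example_db():
    """Create an in-memory database filled with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO study_information VALUES (?, ?, ?, ?, ?)",
        [
            ("S1", "P1", "control", 0, "GC-MS"),
            ("S2", "P1", "treated", 0, "GC-MS"),
            ("S3", "P1", "treated", 1, "GC-MS"),
        ],
    )
    conn.executemany(
        "INSERT INTO peak VALUES (?, ?, ?)",
        [(1, 5.2, None), (2, 7.9, "glucose")],
    )
    conn.executemany(
        "INSERT INTO peak_area VALUES (?, ?, ?)",
        [
            ("S1", 1, 100.0), ("S1", 2, 40.0),
            ("S2", 1, 120.0), ("S2", 2, 55.0),
            ("S3", 1, 130.0), ("S3", 2, 70.0),
        ],
    )
    conn.commit()
    return conn


def cross_section(conn, project_id, class_label, time_point):
    """Return (sample_id, peak_id, area) rows for one class at one time point."""
    query = """
        SELECT s.sample_id, a.peak_id, a.area
        FROM study_information AS s
        JOIN peak_area AS a ON a.sample_id = s.sample_id
        WHERE s.project_id = ? AND s.class_label = ? AND s.time_point = ?
        ORDER BY s.sample_id, a.peak_id
    """
    return conn.execute(query, (project_id, class_label, time_point)).fetchall()
```

Calling `cross_section(build_example_db(), "P1", "treated", 0)` selects only the peak areas of the treated samples at the first time point. Because the measurements sit in a long `peak_area` table keyed on sample and peak, data from a different apparatus adds rows rather than columns, which is one possible way to get the flexibility the text asks for.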
The proposed design should lead to:
- Multilevel (longitudinal) information extraction
- Cross-sectioning data per study
- A flexible/scalable solution
- Reproducible results
- Facilitation of interactive analysis
- Identification and annotation of objects/peaks
- Easy access of data at any level of processing
- Cross-study analysis
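The reproducibility goal in the list above relies on replaying the steps recorded in the 'processing table'. A minimal sketch of that idea follows, assuming each recorded step is a tuple of (order, step name, JSON-encoded parameters). The step names, the `STEP_REGISTRY` mapping, and the `replay` function are hypothetical and not part of the original design.

```python
import json


def remove_peaks(data, peak_ids):
    """Pre-processing step: drop the listed peaks from every sample."""
    return {
        sample: {p: a for p, a in peaks.items() if p not in peak_ids}
        for sample, peaks in data.items()
    }


def remove_samples(data, sample_ids):
    """Pre-processing step: drop the listed samples (objects)."""
    return {s: peaks for s, peaks in data.items() if s not in sample_ids}


def normalise_to_total(data):
    """Divide each peak area by the total area of its sample."""
    result = {}
    for sample, peaks in data.items():
        total = sum(peaks.values())
        result[sample] = {p: a / total for p, a in peaks.items()}
    return result


# Hypothetical mapping from step names stored in the processing table
# to the functions that implement them.
STEP_REGISTRY = {
    "remove_peaks": remove_peaks,
    "remove_samples": remove_samples,
    "normalise_to_total": normalise_to_total,
}


def replay(raw_data, processing_rows):
    """Apply the recorded steps in order; the raw data is left untouched."""
    data = raw_data
    for _, step_name, parameters in sorted(processing_rows, key=lambda r: r[0]):
        data = STEP_REGISTRY[step_name](data, **json.loads(parameters))
    return data
```

With raw data such as `{"S1": {1: 100.0, 2: 40.0, 3: 60.0}}` and the rows `[(1, "remove_peaks", '{"peak_ids": [3]}'), (2, "normalise_to_total", "{}")]`, `replay` first removes peak 3 and then normalises the remaining areas. Since every step returns a new dictionary, the raw level stays available, which fits the aim of easy access to data at any level of processing.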
System B
The following diagram shows the classes involved in a second system for capturing metabolomics data, modeled after minimal reporting standards published in the literature.
Reference documentation/standards
- Framework proposal for plant metabolomics experiments and their results
- Standardising the Reporting of Metabolite data
- Metabolomics Standards Initiative (MSI)
- Chemical Analysis Standards (PDF/MSI)
- ArMet
- Wiki overview of Minimum Information Standards
- Presentation about ArMet/MIAMET/SMRS
- MeMo database design
- MyBio wiki: MyBio is the biologist's wiki workbench; it contains a list of various LIMS vendors, which could guide us in a positive sense
- CDISC, Clinical Data Interchange Standards Consortium
- OpenClinica
- ISATAB metadata communication/exchange protocol
- MIBBI, Minimum Information for Biological and Biomedical Investigations
- Ontology for Biomedical Investigations
- Ontology for Clinical Investigations
- Clinical Terms list
- USDA National Nutrient Database for Standard Reference
- Nature, OBO Foundry ontologies (as of April 2007)
- SNOMED, International Health Terminology Standards Development Organisation
- PHIN (Public Health Information Network) Vocabulary Access and Distribution system (VADS), vocabulary repository
- National Center for Biomedical Ontology BioPortal
- Wiki page with ontology-related pages
- OBO Naming Conventions
- MSI Naming Conventions for Controlled Vocabularies and Ontologies
- Life Science IDentifier project
- Life Science IDentifier IBM tutorial
- International framework for Food Description
- CDISC Study Data Tabulation Model / Submission Data Domain Models
- EBI Ontology lookup service
- FDA Data Standards Council (DSC)
- ISO 15535:2003 General requirements for establishing anthropometric databases
- Unified Medical Language System (NIH)
- ICD-10, International Statistical Classification of Diseases and Related Health Problems
- Guidelines for the development of Controlled Vocabularies (PSIDEV)
- Reporting Structure for Biological Investigations Workgroup (RSBI)
- IUPAC nomenclature: Quantities, Units and Symbols in Physical Chemistry (Green book)
Design software
- Protege Ontology Editor
- SKOS-plugin for Protege
- Aqua Data Studio database IDE
- MySQL workbench
- fabFORCE DBDesigner
