Metabolomics Decision criteria

From BioAssist

The choice of a database / datawarehouse environment will be based on the requirements and criteria specified here. The listing below is an inventory of these criteria. Ranking will also matter in a second round, so please indicate how essential each criterion is.

The requirements are split into two categories. General requirements list general functionality, while specific requirements may add particular data and metadata types, file formats, etc.

General Requirements

  • It provides a platform for storing and making available data produced within NMC projects:
    • Project related data prior to data acquisition:
      • Biological question of the NMC project
      • Experimental design
      • Sampling information / subject information
    • Storage of raw data
    • Storage of intermediate processed data
    • Storage of final metabolite information
    • Storage of biostatistics results
  • It stores all the information needed to reproduce results
  • Has room for protocol parameters, such as:
    • Metaparameters of the workflow, which should be transported to the datawarehouse together with the processed data
    • Details about columns, MS settings, etc.
    • When the workflow changes or gains functionality: storage of the different intermediate datasets and their metaparameters
  • Has a webservice-based interface
    • Allows automated storage and querying
    • Uses controlled vocabularies (based on standards), so that data can be retrieved from multiple studies or from only part of a study dataset
    • Guarantees interaction with workflow environments and webservices
  • Available to all NMC members
    • source code
    • implementation
  • Provides authentication
    • at database/storage level to control direct user access
    • at workflow level to allow workflows to use user credentials to access the database/storage
  • Uses community-based standards for communication
  • Provides means for versioning and backing up the database content, to guard against accidental data corruption
  • Is flexible and scalable (see Extensibility)
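
The requirements on reproducibility, metaparameters and changing workflows can be illustrated with a minimal sketch. It is not a proposed design: the names `Dataset`, `Warehouse`, `store` and `lineage`, the stage labels and the `peak_width` parameter are all hypothetical, and an in-memory dictionary stands in for the real storage. The idea shown is that every processed dataset carries its own metaparameters and a pointer to the dataset it was derived from, so a changed workflow adds a new dataset instead of overwriting an old one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    """One stored dataset (hypothetical record layout)."""
    dataset_id: str
    stage: str                                # e.g. "raw", "intermediate", "final"
    workflow_version: Optional[str] = None    # None for raw data
    metaparameters: Dict[str, str] = field(default_factory=dict)
    derived_from: Optional[str] = None        # dataset_id of the parent, if any


class Warehouse:
    """In-memory stand-in for the datawarehouse (assumption, not a real API)."""

    def __init__(self) -> None:
        self._datasets: Dict[str, Dataset] = {}

    def store(self, dataset: Dataset) -> None:
        # Refuse to overwrite: a rerun with other settings must get a new id,
        # so that earlier intermediate results stay reproducible.
        if dataset.dataset_id in self._datasets:
            raise ValueError(f"dataset {dataset.dataset_id!r} already stored")
        self._datasets[dataset.dataset_id] = dataset

    def lineage(self, dataset_id: str) -> List[Dataset]:
        """Return the dataset followed by its ancestors, back to the raw data."""
        chain: List[Dataset] = []
        current = self._datasets.get(dataset_id)
        while current is not None:
            chain.append(current)
            parent = current.derived_from
            current = self._datasets.get(parent) if parent else None
        return chain


wh = Warehouse()
wh.store(Dataset("raw-001", "raw"))
wh.store(Dataset("peaks-v1", "intermediate", "1.0", {"peak_width": "5"}, "raw-001"))
wh.store(Dataset("peaks-v2", "intermediate", "1.1", {"peak_width": "8"}, "raw-001"))

for ds in wh.lineage("peaks-v2"):
    print(ds.dataset_id, ds.workflow_version, ds.metaparameters)
```

Both intermediate datasets remain available next to each other, each with the metaparameters that produced it, and `lineage` recovers the path back to the raw data.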

Specific Requirements

  • Support for different levels of ‘granularity’ of metadata: e.g. some metadata are coupled to an experiment, while other metadata are coupled to a set of peaks
  • Compatibility with ArMet and MSI minimal reporting standards and standard ontologies (MeSH, etc)
  • Interactivity (example Frans): flagging of outliers, or other samples / peaks excluded from certain steps of processing (stored as metadata?)
  • A flexible search engine that can query metadata at all levels
  • Development time required to have a working version of the datawarehouse
  • Storage of various data types, including:
    • Retention time/index
    • mass spectral data
    • NMR data
    • Multistage mass spectral (MSn, spectral trees) data
    • Unique identifier (InChI) and concentration / quantity, as these two parameters allow the next steps towards biological interpretation
  • Web interface
  • 3 "orientations":
    • sample oriented
    • analysis or process-oriented: sample-track from storage to "clean data"
    • study oriented: from study design to sample via subjects and treatments. This part is not metabolomics specific but is essential for proper evaluation. NMC can align with a number of other initiatives in metadata capture, such as tab2mage, rsbi and isa-tab. The study oriented approach will allow for a modular integration with other study-related parameters (transcriptome, proteome, imaging, phenotypic parameters, genomic information, etc.).
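
Metadata at different levels of granularity, outlier flags stored as metadata, and a search across all levels can be sketched together as below. This is only an illustration under stated assumptions: the three levels (`Study`, `Sample`, `Peak`), the `find` function and the metadata keys `organism` and `excluded_from` are hypothetical, and a real datawarehouse would use controlled vocabularies and a database query instead of a loop.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Peak:
    peak_id: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Sample:
    sample_id: str
    metadata: Dict[str, str] = field(default_factory=dict)
    peaks: List[Peak] = field(default_factory=list)


@dataclass
class Study:
    study_id: str
    metadata: Dict[str, str] = field(default_factory=dict)
    samples: List[Sample] = field(default_factory=list)


def find(study: Study, key: str, value: str) -> List[Tuple[str, str]]:
    """Return (level, identifier) for every record whose metadata has key == value."""
    hits: List[Tuple[str, str]] = []
    if study.metadata.get(key) == value:
        hits.append(("study", study.study_id))
    for sample in study.samples:
        if sample.metadata.get(key) == value:
            hits.append(("sample", sample.sample_id))
        for peak in sample.peaks:
            if peak.metadata.get(key) == value:
                hits.append(("peak", peak.peak_id))
    return hits


study = Study(
    "ST1",
    {"organism": "tomato"},
    [
        Sample("S1", peaks=[Peak("S1-P1"), Peak("S1-P2")]),
        Sample("S2", peaks=[Peak("S2-P1")]),
    ],
)

# Flag an outlier sample and a single peak; the flags are ordinary metadata.
study.samples[1].metadata["excluded_from"] = "normalization"
study.samples[0].peaks[1].metadata["excluded_from"] = "normalization"

print(find(study, "excluded_from", "normalization"))
print(find(study, "organism", "tomato"))
```

Because a flag is stored like any other metadata entry, the same query returns excluded samples and excluded peaks, and a processing step can skip them without deleting any data.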