Data processing and analysis

From BioAssist

The raw data are obtained with the analytical tools described in the previous section.

The next step is to preprocess the data to make sure that

i) all the necessary information is collected

ii) unnecessary, redundant data is removed

Various data analysis tools are used to process the raw data and produce the mass spectrum. Metabolite identification through data processing therefore often involves the sequential use of a diverse array of analysis tools and data resources in order to answer the biological question.

Processing can include signal amplification, noise reduction, chromatographic alignment and/or peak detection. There are two kinds of metabolomics studies: targeted and untargeted. In the targeted approach, a few metabolites of particular interest are analyzed, so only certain regions of the chromatograms or certain m/z values are considered. In untargeted studies, by contrast, the entire chromatogram is taken into consideration and all the peaks within it are extracted.

As mentioned before, each peak corresponds to a compound. Quite often, however, a peak is not made up of a single analyte but of a mixture of compounds. Spectra of the individual pure components can be obtained from overlapping peaks by applying spectral deconvolution methods.
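The difference between the targeted and the untargeted view of the same data can be illustrated with a minimal Python sketch. It assumes that each scan is available as a (retention time, [(m/z, intensity), ...]) pair; the function names, the m/z tolerance and the example values are invented for illustration and do not correspond to any of the tools named on this page.

```python
def extract_ion_chromatogram(scans, target_mz, tol=0.5):
    """Targeted view: intensity over time for one m/z value (within +/- tol)."""
    xic = []
    for rt, peaks in scans:
        total = sum(inten for mz, inten in peaks if abs(mz - target_mz) <= tol)
        xic.append((rt, total))
    return xic


def total_ion_chromatogram(scans):
    """Untargeted view: summed intensity of all ions in every scan."""
    return [(rt, sum(inten for _, inten in peaks)) for rt, peaks in scans]


# Hypothetical example data: three scans with a few (m/z, intensity) pairs each.
scans = [
    (10.0, [(73.0, 500.0), (147.1, 120.0)]),
    (10.1, [(73.0, 900.0), (147.0, 300.0), (205.2, 50.0)]),
    (10.2, [(73.1, 400.0), (147.0, 150.0)]),
]

xic_147 = extract_ion_chromatogram(scans, 147.0)  # only ions near m/z 147
tic = total_ion_chromatogram(scans)               # every ion in every scan
```

In a targeted study only traces such as xic_147 would be inspected, whereas an untargeted study starts from the complete signal and extracts every peak from it.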

Deconvolution tools such as AMDIS and LECO are available to produce the mass spectrum from the raw data of a single chromatogram. Metabolite identification across multiple chromatograms, by contrast, involves a number of steps and a number of tools, such as MetAlign/XCMS, METOT etc. MetAlign is used for (a) baseline correction, (b) noise detection and (c) ion-wise alignment. METOT pre-processes the MetAlign output by removing irrelevant noise information and formatting the data for the next step. MSClust is an intermediate tool in the metabolomics data processing pipeline that works on the output generated by MetAlign/METOT to remove redundancy and to identify compounds tentatively.

The general steps of data processing are:

i) Retrieval of peak information (height or area) that is characterized by mass and retention time

(GC-MS provides information on the fragments of the derivatised metabolite. It is therefore important to identify the most reliable fragment on which to quantify the metabolite. This can be judged from the intensity values: the larger the fragment, the larger the intensity it shows. The retention time of the peaks is also considered.)

ii) Alignment by mass and retention time into a tabular matrix, where the columns represent GC-MS profiles and the rows represent masses within a retention time window

(Retention times for the same analytes can drift between different runs. An important step in data preprocessing is therefore to match and align peaks that represent the same analyte in different samples.)

iii) Reconstruction of mass spectra for peak identification

(After peak detection and alignment, some of the peaks will have no matches, or only a few, in other samples. Possible reasons are that a peak is not present in a specific sample, that peak detection failed because of noisy raw data, or that inaccurate parameter settings were used for peak detection and chromatographic alignment.)

iv) Normalisation of tabular matrices by amount of sample and other parameters

(Normalisation is used to remove unwanted systematic variation in the data. Such variation can be introduced by a change in instrument response during the course of the analysis, or can arise because some samples are more dilute than others. The latter is quite common in the analysis of urine samples. The concentrations of all metabolites in a sample can be normalised to one component: either an endogenous metabolite that is estimated to be present at a relatively constant level, or a compound that has been spiked into the samples. Metabolite concentrations can also be normalised to the total concentration of all endogenous metabolites.)

v) Transformation of the data into a form suitable for non-biased statistical analysis, e.g. Excel sheets
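Steps ii) and iv) can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm used by MetAlign, XCMS or any other tool named on this page: peaks are matched greedily to the first row within fixed m/z and retention time tolerances, and normalisation divides each sample by the intensity of one reference row (for example a spiked-in compound). All function names, tolerances and values are assumptions made for the example.

```python
def align_peaks(samples, mz_tol=0.5, rt_tol=0.1):
    """Group peaks from several samples into the rows of a tabular matrix.

    samples: {sample_name: [(mz, rt, intensity), ...]}
    Returns (features, matrix): features holds one (mz, rt) label per row,
    and matrix[row][sample_name] is the intensity, or None if no peak matched.
    """
    features = []  # reference (mz, rt) of each row, taken from the first peak seen
    matrix = []    # one {sample_name: intensity} dict per row
    for name, peaks in samples.items():
        for mz, rt, intensity in peaks:
            for idx, (ref_mz, ref_rt) in enumerate(features):
                if abs(mz - ref_mz) <= mz_tol and abs(rt - ref_rt) <= rt_tol:
                    matrix[idx][name] = intensity
                    break
            else:
                # No existing row is close enough: start a new one.
                features.append((mz, rt))
                row = {s: None for s in samples}
                row[name] = intensity
                matrix.append(row)
    return features, matrix


def normalize_to_reference(matrix, ref_row):
    """Divide every intensity by the same sample's intensity in the reference row."""
    ref = matrix[ref_row]
    return [
        {s: (v / ref[s] if v is not None and ref[s] else None) for s, v in row.items()}
        for row in matrix
    ]


# Hypothetical peak lists; retention times drift slightly between the two runs.
samples = {
    "run1": [(73.0, 5.00, 1000.0), (147.0, 6.20, 400.0)],
    "run2": [(73.0, 5.04, 2000.0), (147.0, 6.26, 1000.0), (205.0, 7.10, 300.0)],
}

features, matrix = align_peaks(samples)
normalized = normalize_to_reference(matrix, ref_row=0)
```

The third row has no value for run1, which corresponds to the unmatched peaks described under step iii). Such gaps are kept as None here rather than being filled in, so that the later analysis can decide how to treat them.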

Tools for data processing and analysis





Go to NMC DSP data Processing Tool Chain