NGS Quality Control
This page summarizes common QC (quality control) steps used by the NGS taskforce member groups. Each step is labeled with a tag that reflects its importance:
- Must: an essential step. You need it!
- Recommended: good to have.
- Advanced: only for picky and advanced users ;-).
- (Recommended) Check sample quality and concentration with a lab-on-a-chip system, an agarose gel, or qPCR.
- (Advanced) DNA: check the degree of fragmentation.
- (Advanced) RNA/miRNA: check the amplified cDNA before PAGE gel purification.
- (Must) Validate the size distribution of the library.
- (Must) Use the sequencer-specific run monitoring software, e.g. the software supplied with the Illumina HiSeq 2000.
This section applies to datasets before and after data cleaning.
For FASTQ (from Illumina, Ion Torrent, 454):
- (Must) Check the sequencer-specific QC reports, e.g. the Illumina filter report, which shows how many reads passed the initial filter and how many clusters were generated. This report partly overlaps with the FastQC output.
- (Must) Produce the standard QC figures (use FastQC), e.g. the GC% distribution, the frequency of A, C, G and T per read position, k-mer content, etc.
- For human samples, G% and C% should each be around 20% and A% and T% around 30%, reflecting the roughly 41% GC content of the human genome.
- (Recommended) Verify that (a small selection of) reads come from the expected organism, e.g. with BLAST.
- The selection should contain the same number of reads from each channel.
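- (Must) Verify the sample's sex
- This helps to detect potential sample swaps.
- chrY: expect only homozygous variant calls (outside the pseudoautosomal regions), since males carry a single Y chromosome.
- mtDNA: expect only homozygous variant calls (apart from genuine heteroplasmy).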
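The per-position base composition check can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for FastQC: it assumes plain 4-line FASTQ records (no wrapped sequence lines), ignores N and other ambiguity codes, and the 5-percentage-point tolerance and the names read_fastq_sequences, base_composition, flag_positions and HUMAN_EXPECTED are hypothetical choices made for this sketch.

```python
import io
from collections import Counter
from typing import Dict, Iterable, Iterator, List, TextIO

# Hypothetical expected per-base percentages for human DNA, taken from the
# rule of thumb above (G/C about 20%, A/T about 30%).
HUMAN_EXPECTED = {"A": 30.0, "C": 20.0, "G": 20.0, "T": 30.0}


def read_fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield the sequence of each record, assuming plain 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield seq


def base_composition(seqs: Iterable[str]) -> List[Dict[str, float]]:
    """Percentage of A, C, G and T at each read position (N and other codes are ignored)."""
    counts: List[Counter] = []
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())  # first read reaching this position
            counts[i][base] += 1
    composition = []
    for c in counts:
        total = sum(c[b] for b in "ACGT")
        composition.append({b: (100.0 * c[b] / total if total else 0.0) for b in "ACGT"})
    return composition


def flag_positions(composition: List[Dict[str, float]],
                   expected: Dict[str, float],
                   tolerance: float = 5.0) -> List[int]:
    """1-based positions where any base deviates from `expected` by more than `tolerance` points."""
    return [pos for pos, freqs in enumerate(composition, start=1)
            if any(abs(freqs[b] - expected[b]) > tolerance for b in "ACGT")]


if __name__ == "__main__":
    toy = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAAGT\n+\nIIII\n")
    comp = base_composition(read_fastq_sequences(toy))
    print(flag_positions(comp, HUMAN_EXPECTED))  # prints [1, 2, 3, 4]
```

On real data only a handful of positions should be flagged; a read set where most positions deviate is worth a closer look in the FastQC report.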
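The chrY/mtDNA homozygosity check can be sketched as the fraction of heterozygous genotype calls per chromosome in a single-sample VCF. This sketch assumes the sample is in the first sample column, that chromosomes are named "chrY" and "chrM" (references using "Y" and "MT" need a different `chroms` argument), and that pseudoautosomal regions have already been masked; the function name het_fraction_by_chrom is hypothetical.

```python
from typing import Dict, Iterable, Optional, Sequence


def het_fraction_by_chrom(vcf_lines: Iterable[str],
                          chroms: Sequence[str] = ("chrY", "chrM")) -> Dict[str, Optional[float]]:
    """Fraction of heterozygous genotype calls per chromosome for the first VCF sample.

    Returns None for a chromosome without any called genotypes.
    """
    counts = {c: [0, 0] for c in chroms}  # chrom -> [heterozygous, called]
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or fields[0] not in counts:
            continue
        fmt = fields[8].split(":")
        sample = fields[9].split(":")
        if "GT" not in fmt or fmt.index("GT") >= len(sample):
            continue
        alleles = sample[fmt.index("GT")].replace("|", "/").split("/")
        if "." in alleles:
            continue  # no-call
        counts[fields[0]][1] += 1
        if len(set(alleles)) > 1:  # haploid calls such as "1" count as homozygous
            counts[fields[0]][0] += 1
    return {c: (het / called if called else None) for c, (het, called) in counts.items()}
```

For a male sample the chrY fraction should be close to zero; a noticeable fraction of heterozygous chrY calls, or many chrY calls in a sample recorded as female, is a hint of a sample swap or mismapping.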
For csFASTA (from SOLiD):
- (Recommended) Follow the common QC steps for SOLiD data presented by Sander Boymans at the NGS meeting on 17-12-2010.
@assembly (de novo assembly)
- (Must) Check the number of contigs, the total assembly length, and the N50 and N95 values.
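These assembly statistics are easy to compute from contig lengths. In this sketch Nx is taken as the largest length L such that contigs of length L or longer together cover at least x% of the total assembly length, so N50 and N95 are nx(lengths, 50) and nx(lengths, 95). The helper names are hypothetical, and the FASTA reader assumes one header line per contig.

```python
from typing import Dict, Iterable, List, Sequence


def contig_lengths_from_fasta(lines: Iterable[str]) -> List[int]:
    """Length of every sequence in a (possibly multi-line) FASTA file."""
    lengths: List[int] = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)  # close the previous contig
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths: Sequence[int], x: float) -> int:
    """Largest L such that contigs of length >= L cover at least x% of the assembly."""
    if not lengths:
        raise ValueError("no contigs given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    threshold = sum(lengths) * x / 100.0
    running = 0
    for length in sorted(lengths, reverse=True):  # longest contigs first
        running += length
        if running >= threshold:
            return length
    return min(lengths)  # not reached for x <= 100


def assembly_stats(lengths: Sequence[int]) -> Dict[str, int]:
    """Number of contigs, total length, N50 and N95."""
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "N50": nx(lengths, 50),
        "N95": nx(lengths, 95),
    }
```

For contigs of 100, 200, 300, 400 and 500 bp (1500 bp in total), the 500 and 400 bp contigs already cover 900 bp (more than 50%), so N50 is 400; covering 95% (1425 bp) requires all five contigs, so N95 is 100.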
@alignment (reference-based assembly)
- (Must) Check the total length of regions with sufficient coverage and its percentage of the total length of the targeted region.
- (Must) For paired-end runs, check the insert-size distribution per sample.
- (Must) Check the percentage of aligned reads per sample.
- (Must) Check the overall quality of the BAM files. Tools: Qualimap (http://qualimap.bioinfo.cipf.es).
- (Recommended) Check the percentage of uniquely aligned reads.
- (Recommended) Check the distribution of mismatches in aligned reads.
- (Advanced) Check the percentage of reads aligned to mitochondrial DNA (ChIP-seq).
- (Advanced) Check the percentage of 21-22-mers for SAGE.
- (Advanced) Check the percentage of reads aligned to annotated transcripts (RNA-Seq).
- (Advanced) Check the percentage of reads aligned to annotated miRNAs (miRNA profiling).
- (Must) For a population-based study, check whether the observed heterozygosity is close to the expected heterozygosity.
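The coverage check on the targeted region can be sketched from per-base depths. This assumes tab-separated "chrom, position, depth" lines with one line per target base, such as those written by `samtools depth -a -b targets.bed sample.bam`; the minimum depth of 10 used as "sufficient coverage" and the function name coverage_breadth are hypothetical choices.

```python
from typing import Iterable, Tuple


def coverage_breadth(depth_lines: Iterable[str], min_depth: int = 10) -> Tuple[int, int, float]:
    """Return (bases with depth >= min_depth, target bases, percentage covered).

    Each non-empty line is 'chrom<TAB>pos<TAB>depth' for one target base, so the
    counts are lengths in base pairs.
    """
    covered = total = 0
    for line in depth_lines:
        if not line.strip():
            continue
        depth = int(line.rstrip("\n").split("\t")[2])
        total += 1
        if depth >= min_depth:
            covered += 1
    percentage = 100.0 * covered / total if total else 0.0
    return covered, total, percentage
```

Without `-a`, samtools depth skips zero-depth positions, which would silently inflate the percentage, so the sketch relies on every target base being present in the input.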
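The heterozygosity comparison can be sketched per biallelic site: observed heterozygosity is the fraction of heterozygous individuals, and expected heterozygosity under Hardy-Weinberg equilibrium is 2pq, with p the alternate-allele frequency and q = 1 - p. Genotypes are assumed to be encoded as alternate-allele counts (0, 1, 2) with None for missing calls; the function names are hypothetical, and no small-sample correction is applied.

```python
from typing import Iterable, Optional, Sequence, Tuple


def site_heterozygosity(genotypes: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Observed and expected (2pq) heterozygosity at one biallelic site."""
    called = [g for g in genotypes if g is not None]  # drop missing calls
    if not called:
        raise ValueError("no called genotypes at this site")
    n = len(called)
    p_alt = sum(called) / (2.0 * n)  # alternate-allele frequency
    observed = sum(1 for g in called if g == 1) / n
    expected = 2.0 * p_alt * (1.0 - p_alt)
    return observed, expected


def mean_heterozygosity(sites: Iterable[Sequence[Optional[int]]]) -> Tuple[float, float]:
    """Average observed and expected heterozygosity over many sites."""
    ho_total = he_total = 0.0
    n_sites = 0
    for genotypes in sites:
        ho, he = site_heterozygosity(genotypes)
        ho_total += ho
        he_total += he
        n_sites += 1
    if n_sites == 0:
        raise ValueError("no sites given")
    return ho_total / n_sites, he_total / n_sites
```

Observed heterozygosity well above the expected value can point to sample contamination or genotyping errors; a clear deficit can point to inbreeding, population structure, or allelic dropout.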
@variant detection and annotation
- (Recommended) dbSNP hit rate (about 90% for an exome).
- (Recommended) Number of novel SNPs.
- (Must) The transition/transversion (Ti/Tv) ratio of SNPs should be around 2 for whole-genome data; exome data typically show a higher ratio, around 3. Tools: GATK VariantEval.
- (Must) If Immunochip data are available for your samples, do a concordance check. Tools: GATK VariantEval.
- (Recommended) Validate calls using genotyping chips or another sequencing platform.
- (Recommended) Haplogroup matching for population- or trio-based datasets. For example, check the haplogroups of your mtDNA data (http://www.phylotree.org/); in a trio, the child's mtDNA haplogroup should match the mother's.
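For a quick sanity check outside GATK VariantEval, the Ti/Tv ratio can be counted directly from a VCF. This sketch treats A<->G and C<->T as transitions and every other single-base change as a transversion, skips indels, MNPs and symbolic alleles, and by default counts only records whose FILTER is PASS or "."; the function name titv_ratio is hypothetical.

```python
from typing import Iterable, Tuple

# Transitions: purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(vcf_lines: Iterable[str], pass_only: bool = True) -> Tuple[int, int, float]:
    """Count transitions and transversions among SNVs and return (Ti, Tv, Ti/Tv)."""
    ti = tv = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if pass_only and fields[6] not in ("PASS", "."):
            continue  # filtered record
        ref = fields[3].upper()
        for alt in fields[4].upper().split(","):  # one count per ALT allele
            if len(ref) != 1 or len(alt) != 1 or ref == alt or ref not in "ACGT" or alt not in "ACGT":
                continue  # not a plain SNV
            if (ref, alt) in TRANSITIONS:
                ti += 1
            else:
                tv += 1
    ratio = ti / tv if tv else float("inf")
    return ti, tv, ratio
```

A ratio far below the expected value (random errors give roughly 0.5, because there are twice as many possible transversions as transitions) suggests that the call set contains many false positives.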
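The chip concordance check boils down to comparing genotypes at the sites both platforms report. This sketch assumes both call sets are already keyed the same way, e.g. by (chrom, pos), with genotypes encoded as alternate-allele counts on the same reference alleles and strand; sites missing from the sequencing calls are ignored rather than treated as homozygous reference. The function name genotype_concordance is hypothetical.

```python
from typing import Dict, Hashable, Tuple


def genotype_concordance(seq_calls: Dict[Hashable, int],
                         chip_calls: Dict[Hashable, int]) -> Tuple[int, int, float]:
    """Return (matching sites, shared sites, concordance) for two genotype dicts."""
    shared = [site for site in chip_calls if site in seq_calls]  # sites typed by both
    if not shared:
        raise ValueError("no overlapping sites")
    matches = sum(1 for site in shared if seq_calls[site] == chip_calls[site])
    return matches, len(shared), matches / len(shared)
```

A sample whose concordance is far below that of the other samples is a strong candidate for a sample swap.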
The following people have contributed to this page: