NGS Alignment Expertise

From BioAssist

This page summarizes the alignment software used in next-generation sequencing (NGS). It also lists who within NBIC has used each tool, so that you can contact them if you run into trouble with a particular tool.

Category Package Technology Description Performance experience NBIC user
Alignment MUMmerGPU MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. None
Alignment mrFAST and mrsFAST map short reads generated from Illumina in a fast and memory-efficient manner. [article][download] None
Alignment qpalma QPalma is an alignment tool for spliced reads produced by next-generation sequencing platforms such as Illumina Solexa or 454. QPalma aligns short reads to the genomic sequences in an optimal way according to its underlying algorithm and trained parameters. It creates an alignment using dynamic programming (written in C++) and returns the alignment in a PSL-like format. The algorithm computes optimal local alignments, so if no alignment is reported, it is because no alignment reached a sufficiently high alignment score. None
Alignment SOAP Hash-based SOAP (Short Oligonucleotide Alignment Program) is a program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or paired-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Author is Ruiqiang Li at the Beijing Genomics Institute. C++ for Unix. None
Alignment SOAP2 BWT-based SOAP2 is a significantly improved version of the short oligonucleotide alignment program that both reduces computer memory usage and increases alignment speed at an unprecedented rate. We used a Burrows Wheeler Transformation (BWT) compression index to substitute the seed strategy for indexing the reference sequence in the main memory. We tested it on the whole human genome and found that this new algorithm reduced memory usage from 14.7 to 5.4 GB and improved alignment speed by 20-30 times. SOAP2 is compatible with both single- and paired-end reads. Additionally, this tool now supports multiple text and compressed file formats. A consensus builder has also been developed for consensus assembly and SNP detection from alignment of short reads on a reference genome. None Frans
Alignment ELAND (Illumina) Hash-based Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome. Written by Illumina author Anthony J. Cox for the Solexa 1G machine. None Joris, Morris, Stephan
Alignment Exonerate Various forms of alignment (including Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors are Guy St C Slater and Ewan Birney from EMBL. C for POSIX. None
Alignment MUMmer MUMmer is a modular system for the rapid whole genome alignment of finished or draft sequence. Released as a package providing an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools. Version 3.0 was developed by Stefan Kurtz, Adam Phillippy, Arthur L Delcher, Michael Smoot, Martin Shumway, Corina Antonescu and Steven L Salzberg - most of whom are at The Institute for Genomic Research in Maryland, USA. POSIX OS required. None
Alignment Novocraft Tools for reference alignment of paired-end and single-end Illumina reads. Uses a Needleman-Wunsch algorithm. Available free for evaluation, educational use and for use on open not-for-profit projects. Requires Linux or Mac OS X. None
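As a rough illustration of the Needleman-Wunsch algorithm named in the Novocraft entry, the sketch below computes a global alignment score with dynamic programming. It is not Novocraft's implementation: the function name, the scoring values (match +1, mismatch -1, gap -1) and the simple linear gap penalty are all assumptions chosen to keep the example short.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions, with a linear gap penalty.
    Only two rows of the dynamic programming matrix are kept in memory.
    """
    # Row 0: aligning the empty prefix of a against each prefix of b costs gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

Because the alignment is global, every base of both sequences has to be accounted for, which is why a read that is one base shorter than its target pays for exactly one gap here.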
Alignment SeqMap Works like ELAND, but can handle 3 or more bp mismatches and also indels. Written by Hui Jiang from the Wong lab at Stanford. Builds available for most operating systems. None
Alignment SSAHA SSAHA (Sequence Search and Alignment by Hashing Algorithm) is a tool for rapidly finding near exact matches in DNA or protein databases using a hash table. Developed at the Sanger Centre by Zemin Ning, Anthony Cox and James Mullikin. C++ for Linux/Alpha. None
Alignment SXOligoSearch SXOligoSearch is a commercial platform offered by the Malaysian based Synamatix. Will align Illumina reads against a range of Refseq RNA or NCBI genome builds for a number of organisms. Web Portal. OS independent. None
Alignment Phred Phrap Consed Cross_match The phred software reads DNA sequencing trace files, calls bases, and assigns a quality value to each called base. Phrap is a program for assembling shotgun DNA sequence data. Cross_match is a general purpose utility for comparing any two DNA sequence sets using a 'banded' version of swat. Consed/Autofinish is a tool for viewing, editing, and finishing sequence assemblies created with phrap. None
Alignment Bowtie Burrows-Wheeler Transform (BWT) Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of 25 million reads per hour on a typical workstation with 2 gigabytes of memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: 1.3 GB for the human genome. It supports alignment policies equivalent to Maq and SOAP but is much faster: about 35x faster than Maq and over 350x faster than SOAP when aligning to the human genome. None Jeroen, Morris, Rutger, Frans
Alignment PECAN "..method of probabilistic consistency alignment and make it practical for the alignment of large genomic sequences. In so doing we develop a set of new technical methods, combined in a framework we term 'sequence progressive alignment', because it allows us to iteratively compute an alignment by passing over the input sequences from left to right. The result is that we massively decrease the memory consumption of the program relative to a naive implementation. The general engineering of the challenges faced in scaling such a computationally intensive process offer valuable lessons for planning related large-scale sequence analysis algorithms. We also further show the strong performance of Pecan using an extended analysis of ancient repeat alignments. Pecan is now one of the default alignment programs that has and is being used by a number of whole genome comparative genomic projects." pubmed None
Alignment PASS "PASS performs fast gapped and ungapped alignments of short DNA sequences onto a reference DNA, typically a genomic sequence. It is designed to handle a huge amount of reads such as those generated by Solexa, SOLiD or 454 technologies. The algorithm is based on a data structure that holds in RAM the index of the genomic positions of 'seed' words (typically 11-12 bases) as well as an index of the precomputed scores of short words (typically 7-8 bases) aligned against each other." pubmed None
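The hash-based entries in this table (SOAP, ELAND, SSAHA, PASS and others) share the idea of keeping an in-memory index from short "seed" words to their positions in the reference. The sketch below shows that idea only. It is not the code of any of these tools: the function names are hypothetical, the seed length of 4 is far shorter than the 11-12 bases quoted for PASS, and only exact seed matches are considered.

```python
from collections import defaultdict


def build_seed_index(reference, k):
    """Map every k-mer (seed word) in the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k):
    """Return (reference_start, seed_hits) pairs, best-supported first.

    Each seed in the read votes for the reference position at which the
    whole read would start if that seed match were part of a true alignment.
    """
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if start >= 0:
                votes[start] += 1
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTTACG"  # toy reference, made up for this example
    index = build_seed_index(reference, k=4)
    print(candidate_positions("TAGCCGAT", index, k=4))
```

In a real aligner the candidate positions would then be verified with a full (gapped or ungapped) alignment, which is the step where the tools above differ most.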
Alignment VAAL VAAL is a variant ascertainment algorithm that can be used to detect SNPs, indels, and more complex genetic variants. On bacterial data sets, it achieves very high sensitivity and near-perfect specificity. VAAL can be used to compare reads from one strain to a reference sequence from another strain. It can also be used to compare reads from two strains to each other, using a third strain to determine homology. For example, we have used VAAL to find a single mutation responsible for bacterial resistance: the output of the program was that single mutation and no others. VAAL uses an assisted assembly algorithm that borrows from ALLPATHS. pubmed None
Alignment & Assembly MOSAIK Hash-based Reference guided aligner/assembler. Written by Michael Strömberg at Boston College.
  • more an assembly tool than alignment
Alignment & Mapping GMAP GMAP (Genomic Mapping and Alignment Program) for mRNA and EST Sequences. Developed by Thomas Wu and Colin Watanabe at Genentech. C/Perl for Unix. None
Alignment & Mapping MAQ Hash-based Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly designed for Illumina-Solexa 1G Genetic Analyzer, and has preliminary functions to handle ABI SOLiD data. Written by Heng Li from the Sanger Centre.
  • really old, replaced by BWA
Jurgen, Victor, Frans
Alignment/SNP Discovery Crossbow Crossbow is a cloud-computing software tool that combines the aligner Bowtie and the SNP caller SOAPsnp. Executing in parallel using Hadoop, Crossbow analyzes data comprising 38-fold coverage of the human genome in three hours using a 320-CPU cluster rented from a cloud computing service for about $85. An article about this software is published in Genome Biology None
Alignment BWA Burrows-Wheeler Transform (BWT) A fast, lightweight tool that aligns relatively short sequences (queries) to a sequence database (target), such as the human reference genome. It implements two different algorithms, both based on Burrows-Wheeler Transform (BWT). The first algorithm is designed for short queries up to ~200bp with low error rate (<3%). It does gapped global alignment w.r.t. queries, supports paired-end reads, and is one of the fastest short read alignment algorithms to date while also visiting suboptimal hits. The second algorithm, BWA-SW, is designed for long reads with more errors. It performs heuristic Smith-Waterman-like alignment to find high-scoring local hits (and thus chimera). On low-error short queries, BWA-SW is slower and less accurate than the first algorithm, but on long queries, it is better.
  • biggest user community
  • fast
  • good support from developers
  • manual is very brief
Frans, Jeroen
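SOAP2, Bowtie and BWA are all listed above as BWT-based. As a minimal sketch of what that means, the code below builds a Burrows-Wheeler transform of a toy sequence and counts exact occurrences of a pattern with backward search. It is an illustration under simplifying assumptions, not how any of these tools is implemented: the function names are hypothetical, the transform is built by sorting all rotations (only feasible for tiny inputs), rank queries rescan the string instead of using sampled tables, and mismatches and gaps are not handled.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (tiny inputs only).

    A '$' terminator is appended; it sorts before A, C, G and T.
    """
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count exact matches of pattern using backward search on the BWT."""
    # first[c] = index of the first row (in sorted order) that starts with c.
    first = {}
    for i, c in enumerate(sorted(bwt_string)):
        first.setdefault(c, i)

    def rank(c, i):
        # Number of occurrences of c in bwt_string[:i] (naive, for clarity).
        return bwt_string[:i].count(c)

    lo, hi = 0, len(bwt_string)  # current range of matching rows, [lo, hi)
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + rank(c, lo)
        hi = first[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("ACGTACGTTAGC")  # toy sequence, made up for this example
    print(count_occurrences(transformed, "ACG"))
```

The search touches the pattern one character at a time and never scans the reference itself, which is why an index of this kind can stay small and fast even for a whole genome.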
Alignment SHRiMP2 Hash-based Indexes the reads instead of the reference genome. Uses masks to generate possible keys. Can map ABI SOLiD color space reads.
  • true color space alignment
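To show why color space reads need dedicated handling, here is a small sketch of SOLiD-style two-base encoding, in which each color describes the transition between two adjacent bases. The function names are hypothetical and the code is not taken from SHRiMP2. It assumes a non-empty sequence over A, C, G and T, with the first base kept as a letter in front of the colors.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_color_space(bases):
    """Encode bases as the first base followed by one color (0-3) per adjacent pair."""
    colors = [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(bases, bases[1:])]
    return bases[0] + "".join(str(color) for color in colors)


def decode_color_space(encoded):
    """Decode a color space string back to bases, starting from the leading base."""
    bases = [encoded[0]]
    for color in encoded[1:]:
        # Each base depends on the previous base and the color between them.
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ int(color)])
    return "".join(bases)


if __name__ == "__main__":
    print(encode_color_space("ATGGC"))   # colors for the true sequence
    print(decode_color_space("A3103"))   # decodes back to the original bases
    print(decode_color_space("A3003"))   # one wrong color changes all later bases
```

Because every decoded base depends on the one before it, a single miscalled color corrupts the rest of a naively decoded read. Aligning in color space avoids that problem, which is what the "true color space alignment" note refers to.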
Alignment Stampy BWT-based None Jeroen
Alignment & SNP/SV Discovery LifeScope
  • only for color space (developed specifically for the 5500)
  • cluster/server based system (needs at least 4 nodes to run)
  • fast
  • user friendly
  • commercial
  • manual is intense