NGS Tools

From BioAssist

Here is some additional NGS-related knowledge collected by our task force (in particular Dr. Marc van Driel). It comes from mixed sources such as PubMed, tool websites, seqanswers.com, etc.

Please feel free to add your tools and your experiences! You can make a new page about a tool if you want to describe it in some detail.


Category Package Description Performance experience
Assembly (de novo) ABySS "ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes." winner of 2010 Assemblathon
Assembly (de novo) ALLPATHS-LG Introduced in January 2011 by the Broad Institute. "It works on both small and large (mammalian size) genomes. To use it, you should first generate ~100 base Illumina reads from two libraries: one from ~180 bp fragments, and one from ~3000 bp fragments, both at about 45x coverage. Sequence from longer fragments will enable longer-range continuity." winner of 2010 Assemblathon
Next-Gen Large File Viewer BAMseek Provides easy-to-use user interface to browse files too large to be opened with conventional text editors, such as TextEdit or Word. Supports SAM/BAM, VCF (BGZF-compressed and uncompressed), and FASTQ formats. Available on all platforms with the Java Runtime Environment. None
Viewer EagleView genome viewer EagleView is an information-rich genome assembler viewer with data integration capability. EagleView can display a dozen different types of information including base qualities, machine specific trace signals, and genome feature annotations. None
Alignment MUMmerGPU MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. None
Methylation Batman Bayesian tool for methylation analysis (Batman)—for analyzing methylated DNA immunoprecipitation (MeDIP) profiles None
Base-calling Alta-Cyclic Alta-Cyclic is a novel Illumina Genome-Analyzer (Solexa) base caller. Alta-Cyclic features: longer reads, more accurate reads (compared to Solexa's default base caller), and reduced systematic bias towards a certain nucleotide in later cycles. On a GAII platform, Alta-Cyclic was able to provide a large amount of useful reads after 78 cycles. None
Enrichment/peak calling FindPeaks 3.1 Findpeaks was developed to perform analysis of ChIP-Seq experiments. It uses a naive algorithm for identifying regions of high coverage, which represent Chromatin Immunoprecipitation enrichment of sequence fragments, indicating the location of a bound protein of interest. None
Assembly (de novo) ALLPATHS De novo assembly of whole-genome shotgun microreads. None
Assembly (de novo) SHARCGS SHARCGS is a suitable tool for fully exploiting novel sequencing technologies by assembling sequence contigs de novo with high confidence and by outperforming existing assembly algorithms in terms of speed and accuracy. Authors are Dohm JC, Lottaz C, Borodina T and Himmelbauer H. from the Max-Planck-Institute for Molecular Genetics. None
Assembly (de novo) Velvet Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. Need about 20-25X coverage and paired reads. Developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI). None
Assembly (de novo) EDENA De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. Made by Hernandez D et al. None
Assembly SSAKE The Short Sequence Assembly by K-mer search and 3' read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for perfect 3'-most k-mers using a DNA prefix tree. SSAKE is designed to help leverage the information from short sequences reads by stringently clustering them into contigs that can be used to characterize novel sequencing targets. Authors are René Warren, Granger Sutton, Steven Jones and Robert Holt from the Canada's Michael Smith Genome Sciences Centre. Perl/Linux. None
Alignment qpalma QPalma is an alignment tool targeted to align spliced reads produced by Next Generation sequencing platforms such as Illumina Solexa or 454. QPalma aligns short reads to the genomic sequences in an optimal way according to its underlying algorithm and trained parameters. It creates an alignment using dynamic programming (written in C++), and returns the alignment in a psl-like format. The algorithm computes optimal local alignments, so if no alignment has been found it is because no alignment got a sufficiently high alignment score. None
Alignment SOAP SOAP (Short Oligonucleotide Alignment Program) is a program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. Author is Ruiqiang Li at the Beijing Genomics Institute. C++ for Unix. None
Assembly (de novo) SOAPdenovo SOAPdenovo, a short read de novo assembly tool, is a package for assembling short oligonucleotide into contigs and scaffolds. winner of 2010 Assemblathon
SNP/Indel Discovery SOAPsnp SOAPsnp is an accurate consensus sequence builder based on soap1 and SOAPaligner/soap2's alignment output. It calculates a quality score for each consensus base, which can be used by any later process to call SNPs. None
Workflows Galaxy Galaxy allows you to do analyses you cannot do anywhere else without the need to install or download anything. You can analyze multiple alignments, compare genomic annotations, profile metagenomic samples and much much more... None
Communities SeqAnswers Next generation sequencing community. None
Integrated solutions CLCbio Genomics Workbench de novo and reference assembly of Sanger, 454, Solexa, Helicos, and SOLiD data. Commercial next-gen-seq software that extends the CLCbio Main Workbench software. Includes SNP detection, browser and other features. Runs on Windows, Mac OS X and Linux. None
Integrated solutions NextGENe de novo and reference assembly of Illumina and SOLiD data. Uses a novel Condensation Assembly Tool approach where reads are joined via "anchors" into mini-contigs before assembly. Requires Win or MacOS. None
Integrated solutions SeqMan Genome Analyser Software for Next Generation sequence assembly of Illumina, 454 Life Sciences and Sanger data integrating with Lasergene Sequence Analysis software for additional analysis and visualization capabilities. Can use a hybrid templated/de novo approach. Early release commercial software. Compatible with Windows® XP X64 and Mac OS X 10.4. None
Alignment ELAND Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome. Written by Illumina author Anthony J. Cox for the Solexa 1G machine. None
Assembly EULER Short read assembly. By Mark J. Chaisson and Pavel A. Pevzner from UCSD (published in Genome Research). None
Alignment Exonerate Various forms of alignment (including Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors are Guy St C Slater and Ewan Birney from EMBL. C for POSIX. None
Alignment & Mapping GMAP GMAP (Genomic Mapping and Alignment Program) for mRNA and EST Sequences. Developed by Thomas Wu and Colin Watanabe at Genentech. C/Perl for Unix. None
Alignment & Assembly MOSAIK Reference guided aligner/assembler. Written by Michael Strömberg at Boston College. None
Alignment & Mapping MAQ Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly designed for Illumina-Solexa 1G Genetic Analyzer, and has preliminary functions to handle ABI SOLiD data. Written by Heng Li from the Sanger Centre. None
Alignment MUMmer MUMmer is a modular system for the rapid whole genome alignment of finished or draft sequence. Released as a package providing an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools. Version 3.0 was developed by Stefan Kurtz, Adam Phillippy, Arthur L Delcher, Michael Smoot, Martin Shumway, Corina Antonescu and Steven L Salzberg - most of whom are at The Institute for Genomic Research in Maryland, USA. POSIX OS required. None
Alignment Novocraft Tools for reference alignment of paired-end and single-end Illumina reads. Uses a Needleman-Wunsch algorithm. Available free for evaluation, educational use and for use on open not-for-profit projects. Requires Linux or Mac OS X. None
Assembly RMAP Assembles 20 - 64 bp Solexa reads to a FASTA reference genome. By Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC Bioinformatics). POSIX OS required. None
Alignment SeqMap Works like ELand, can do 3 or more bp mismatches and also INDELs. Written by Hui Jiang from the Wong lab at Stanford. Builds available for most OS's. None
Assembly SHRiMP Assembles to a reference sequence. Developed with Applied Biosystem's colourspace genomic representation in mind. Authors are Michael Brudno and Stephen Rumble at the University of Toronto. Works with data in letterspace (Roche, Illumina), colourspace (AB) and Helicos space. probcalc gave a 'segmentation fault' on a x86_64-pc-linux-gnu with Roche data.
Alignment SSAHA SSAHA (Sequence Search and Alignment by Hashing Algorithm) is a tool for rapidly finding near exact matches in DNA or protein databases using a hash table. Developed at the Sanger Centre by Zemin Ning, Anthony Cox and James Mullikin. C++ for Linux/Alpha. None
Alignment SXOligoSearch SXOligoSearch is a commercial platform offered by the Malaysian-based Synamatix. Will align Illumina reads against a range of Refseq RNA or NCBI genome builds for a number of organisms. Web Portal. OS independent. None
Assembly (de novo) MIRA2 MIRA (Mimicking Intelligent Read Assembly) is able to perform true hybrid de-novo assemblies using reads gathered through 454 sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa and Sanger data. Linux OS required. None
Assembly (de novo) VCAKE De novo assembly of short reads with robust error correction. An improvement on early versions of SSAKE. None
SNP/Indel Discovery ssahaSNP ssahaSNP is a polymorphism detection tool. It detects homozygous SNPs and indels by aligning shotgun reads to the finished genome sequence. Highly repetitive elements are filtered out by ignoring those kmer words with high occurrence numbers. More tuned for ABI Sanger reads. Developers are Adam Spargo and Zemin Ning from the Sanger Centre. Compaq Alpha, Linux-64, Linux-32, Solaris and Mac None
SNP/Indel Discovery PolyBayesShort A re-incarnation of the PolyBayes SNP discovery tool developed by Gabor Marth at Washington University. This version is specifically optimized for the analysis of large numbers (millions) of high-throughput next-generation sequencer reads, aligned to whole chromosomes of model organism or mammalian genomes. Developers at Boston College. Linux-64 and Linux-32. None
SNP/Indel Discovery PyroBayes PyroBayes is a novel base caller for pyrosequences from the 454 Life Sciences sequencing machines. It was designed to assign more accurate base quality estimates to the 454 pyrosequences. Developers at Boston College. None
Integrated solutions STADEN Includes GAP4. GAP5 once completed will handle next-gen sequencing data. A partially implemented test version is available here None
Viewer XMatchView A visual tool for analyzing cross_match alignments. Developed by Rene Warren and Steven Jones at Canada's Michael Smith Genome Sciences Centre. Python/Win or Linux. None
Integrated solutions SAM Sequence Assembly Manager. Whole Genome Assembly (WGA) Management and Visualization Tool. It provides a generic platform for manipulating, analyzing and viewing WGA data, regardless of input type. Developers are Rene Warren, Yaron Butterfield, Asim Siddiqui and Steven Jones at Canada's Michael Smith Genome Sciences Centre. MySQL backend and Perl-CGI web-based frontend/Linux. None
Enrichment/peak calling CHiPSeq From Science Johnson, 2007 None
RNAseq ERANGE ERANGE is a Python package for doing RNA-seq and ChIP-seq (hence the "dual-use"), and is a descendant of the ChIPSeq mini peak finder (Johnson, 2007). In particular, the RNAseq analysis uses some of the very same code to access Cistematic. Version 2.0 is the first released in the wild and is "Bed"-centric. In particular, it is not optimized for speed! None
Methylation BS-Seq The source code and data for the "Shotgun Bisulphite Sequencing of the Arabidopsis Genome Reveals DNA Methylation Patterning" Nature paper by Cokus et al. (Steve Jacobsen's lab at UCLA). POSIX. None
Mapping gnumap The Genomic Next-generation Universal MAPper (gnumap) is a program designed to accurately map sequence data obtained from next-generation sequencing machines (specifically that of Solexa/Illumina) back to a genome of any size. Currently, gnumap is designed to be used with the _int.txt data received from the Solexa/Illumina machine. None
Mapping ZOOM ZOOM (Zillions Of Oligos Mapped) is designed to map millions of short reads, generated by next-generation sequencing technology, back to the reference genomes, and carry out post-analysis. ZOOM is developed to be highly accurate, flexible, and user-friendly with speed being a critical priority. None
Assembly & Chromosome walking Tracembler Tracembler streamlines the process of recursive database searches, sequence assembly, and gene identification in resulting contigs in attempts to identify homologous loci of genes of interest in species with emerging whole genome shotgun reads. A web server hosting Tracembler is provided at http://www.plantgdb.org/tool/tracembler/, and the software is also freely available from the authors for local installations. None
Enrichment/peak calling sissrs Produce a list of peakmaxima from aligned positions. None
Assembly SHRAP The source code will be made available individually upon request. "However, note that we do not have a tool that can be used on real 454 sequence data in a production setting." None
Alignment Phred Phrap Consed Cross_match The phred software reads DNA sequencing trace files, calls bases, and assigns a quality value to each called base. Phrap is a program for assembling shotgun DNA sequence data. Cross_match is a general purpose utility for comparing any two DNA sequence sets using a 'banded' version of swat. Consed/Autofinish is a tool for viewing, editing, and finishing sequence assemblies created with phrap. None
Chromatin Profiling ChromaSig An unsupervised learning method, which finds, in an unbiased fashion, commonly occurring chromatin signatures in both tiling microarray and sequencing data. None
Integrated solutions Shore Analysis suite for Illumina short read data. None
Mapping GenomeMapper Short read mapping tool. None
Base-calling & Analysis Rolexa Allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots. None
Integrated solutions SolexaTools SolexaTools is a project to create a tool set to work with a Solexa genome sequencer. It includes multiple components including a LIMS system, pipeline and other tools to support end-users and researchers setting up a Solexa environment. None
ChIPseq QuEST QuEST is a Kernel Density Estimator-based package for analysis of massively parallel sequencing data from chromatin immunoprecipitations (ChIP-Seq or ChIPseq). None
Mapping SOCS SOCS is a program designed for efficient mapping of ABI SOLiD sequence data (Short Oligonucleotides in Color Space) to a reference genome with concurrent sequence census and SNP discovery functions. None
Alignment BOWTIE Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of 25 million reads per hour on a typical workstation with 2 gigabytes of memory. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: 1.3 GB for the human genome. It supports alignment policies equivalent to Maq and SOAP but is much faster: about 35x faster than Maq and over 350x faster than SOAP when aligning to the human genome. None
Analysis CARPET CARPET (Collection of Automated Routine Programs for Easy Tiling) is a set of Perl, Python and R scripts, integrated on the Galaxy2 web-based platform, for the analysis of ChIP-chip and expression tiling data, both for standard and custom chip designs. None
Assembly CABOG Celera Assembler is scientific software for DNA research. CA is a 'whole genome shotgun sequence assembler' -- it reconstructs long sequences of genomic DNA given the fragmentary data produced by whole-genome shotgun sequencing. Celera Assembler was modified for combinations of ABI 3730 and 454 FLX reads. The revised pipeline called CABOG (Celera Assembler with the Best Overlap Graph) is robust to homopolymer run length uncertainty, high read coverage, and heterogeneous read lengths (pubmed). Fewer and larger contigs with Roche FLX reads compared to Newbler. It occasionally gives problems with Roche Titanium.
ChIPseq / ChIP-chip PASS "..Motivated by the Poisson clumping heuristic, we propose an accurate and efficient method for evaluating statistical significance in genome-wide ChIP-chip tiling arrays. The method works accurately for any large number of multiple comparisons, and the computational cost for evaluating p-values does not increase with the total number of tests..." pubmed None
ChIPseq / ChIP-chip CATCH CATCH is a tool for exploring patterns in ChIP profiling data. The CATCH algorithm performs a hierarchical clustering of the profile patterns with an exhaustive alignment at each step. The algorithm has a user-friendly graphical interface that makes it easy for you to browse your results. None
Misc DNAzip A series of techniques that in combination reduces a single genome to a size small enough to be sent as an email attachment. pubmed None
Integrated solutions CisGenome An integrated tool for tiling array, ChIP-seq, genome and cis-regulatory element analysis None
Tiling-array analysis TiMat2 TiMAT2 contains tools for low and high level genomic tiling microarray analysis using the Affymetrix, NimbleGen, and Agilent platforms. It is designed for processing single and multi chip data sets from ChIP-Chip, RNA difference, and aCGH experiments. None
microRNA CleaveLand A pipeline for using degradome data to find cleaved small RNA targets. None
Alignment PECAN "..method of probabilistic consistency alignment and make it practical for the alignment of large genomic sequences. In so doing we develop a set of new technical methods, combined in a framework we term 'sequence progressive alignment', because it allows us to iteratively compute an alignment by passing over the input sequences from left to right. The result is that we massively decrease the memory consumption of the program relative to a naive implementation. The general engineering of the challenges faced in scaling such a computationally intensive process offer valuable lessons for planning related large-scale sequence analysis algorithms. We also further show the strong performance of Pecan using an extended analysis of ancient repeat alignments. Pecan is now one of the default alignment programs that has and is being used by a number of whole genome comparative genomic projects." pubmed None
Assembly SHORTY "..Our assembler SHORTY is targetted for de novo assembly of microreads with mate pair information and sequencing errors. SHORTY has some novel approach and features in addressing the short read assembly problem.." None
Assembly ABySS "ABySS is a de novo sequence assembler that is designed for very short reads. The single-processor version is useful for assembling genomes up to 40-50 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes." None
Alignment PASS "PASS performs fast gapped and ungapped alignments of short DNA sequences onto a reference DNA, typically a genomic sequence. It is designed to handle a huge amount of reads such as those generated by Solexa, SOLiD or 454 technologies. The algorithm is based on a data structure that holds in RAM the index of the genomic positions of "seed" words (typically 11-12 bases) as well as an index of the precomputed scores of short words (typically 7-8 bases) aligned against each other." pubmed None
ChIPSeq NPS "..Our method provides an effective framework for studying nucleosome positioning and epigenetic marks in mammalian genomes..." pubmed None
Assembly Consensus SeqCons is an open source consensus computation program for Linux and Windows. The algorithm can be used for de novo and reference-guided sequence assembly. None
ChIPseq MACS Model-based Analysis of ChIP-Seq data (MACS) analyzes data generated by short read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of ChIP-Seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions. None
Assembly Scheibye-Alsing et al A comprehensive overview of the current publicly available sequence assembly programs. pubmed n.a.
Transcript seq FrameDP Sensitive peptide detection on noisy matured sequences. A self-training integrative pipeline for predicting CDS in transcripts which can adapt itself to different levels of sequence qualities. None
Enrichment/Peakcalling PeakSeq PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. A methodology for identifying punctate binding sites in ChIP-Seq experiments based on their characteristics. publication None
Mapping SOCS SOCS is a program designed for efficient mapping of ABI SOLiD sequence data (Short Oligonucleotides in Color Space) to a reference genome with concurrent sequence census and mismatch identification functions. None
Mapping CloudBurst CloudBurst is a new parallel read-mapping algorithm optimized for mapping next-generation sequence data to the human genome and other reference genomes, for use in a variety of biological analyses including SNP discovery, genotyping, and personal genomics. It is modeled after the short read mapping program RMAP, and reports either all alignments or the unambiguous best alignment for each read with any number of mismatches or differences. This level of sensitivity could be prohibitively time consuming, but CloudBurst uses the open-source Hadoop implementation of MapReduce to parallelize execution using multiple compute nodes. pubmed None
Alignment VAAL VAAL is a variant ascertainment algorithm that can be used to detect SNPs, indels, and more complex genetic variants. On bacterial data sets, it achieves very high sensitivity, and near perfect specificity. VAAL can be used to compare reads from one strain to a reference sequence from another strain. It can also be used to compare reads from two strains to each other, using a third strain to determine homology. For example, we have used VAAL to find a single mutation responsible for bacterial resistance: the output of the program was that single mutation and no others. VAAL uses an assisted assembly algorithm that borrows from ALLPATHS. pubmed None
Assembly AMOS The AMOS consortium is committed to the development of open-source whole genome assembly software. The project acronym (AMOS) represents our primary goal -- to produce A Modular, Open-Source whole genome assembler. None
ChIPseq ChIPmeta In this work, hierarchical hidden Markov model (HHMM) is proposed for combining data from ChIP-seq and ChIP-chip. In HHMM, inference results from individual HMMs in ChIP-seq and ChIP-chip experiments are summarized by a higher level HMM. Simulation studies show the advantage of HHMM when data from both technologies co-exist. Analysis of two well-studied transcription factors, NRSF and CTCF, also suggests that HHMM yields improved TFBS identification in comparison to analyses using individual data sources or a simple merger of the two. pubmed None
CNV CNVseq CNV-seq, a method to detect copy number variation using high-throughput sequencing. pubmed None
SNP/indel VarScan VarScan, an open source tool for variant detection that is compatible with several short read aligners. pubmed None
SNP/indel VARid A Hidden Markov Model for representing both color-space and letter-space reads together, and a framework for determining variation without direct translation of those reads (ISMB 2009). None
Indel Pindel A pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. None
RNAseq TopHat TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. None
Assembly LOCAS LOCAS low-coverage short-read assembler None
Misc SOLID software tools SOLID software tools hosted by Applied Biosystems None
Metagenome MEGAN Metagenome Analysis Software - MEGAN (“MEtaGenome ANalyzer”) is a new computer program that allows laptop analysis of large metagenomic datasets. In a preprocessing step, the set of DNA reads (or contigs) is compared against databases of known sequences using BLAST or another comparison tool. MEGAN can then be used to compute and interactively explore the taxonomical content of the dataset, employing the NCBI taxonomy to summarize and order the results. None
Viewer MapView Visualization of short reads alignment on desktop computer None
Viewer HawkEye An Interactive Visual Analytics Tool for Genome Assemblies. None
Viewer Haplowser Haplowser: comparative haplotype browser for personal genome and metagenome pubmed. None
Analysis Swift Primary Data Analysis for the Illumina Solexa Sequencing Platform. pubmed None
Browser SUDS genome browser Compressed suffix tree implementation to browse genome sequences None
Analysis SHREC A new algorithm for correcting errors in short-read data that uses a generalized suffix trie on the read data as the underlying data structure pubmed None
Analysis SICER A clustering approach for identification of enriched domains from histone modification ChIP-Seq data. pubmed None
Analysis HI Haplotype Improver: a program to improve haplotype reconstruction by incorporating information from paired-end reads, with its utility demonstrated on simulated data. pubmed None
Misc SAM tools The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128Mbp) produced by different sequencing platforms...SAMtools implements various utilities for postprocessing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. pubmed None
InDel MoDIL MoDIL: Detecting INDEL Variation with Clone-end Sequencing None
Assembly SR-ASM SR-ASM algorithm is designed for DNA assembly of the short sequences coming from 454 sequencers. None
Alignment/SNP Discovery Crossbow Crossbow is a cloud-computing software tool that combines the aligner Bowtie and the SNP caller SOAPsnp. Executing in parallel using Hadoop, Crossbow analyzes data comprising 38-fold coverage of the human genome in three hours using a 320-CPU cluster rented from a cloud computing service for about $85. An article about this software can be found here. None
Analysis BEDtools A software suite for the comparison, manipulation, and annotation of genomic features in BED and GFF format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. pubmed None
Analysis XplorSeq ToolKit XSTK is a suite of C and C++ libraries and command-line programs for DNA sequence analysis: barcrawl (design of barcoded oligonucleotides for multiplex sequencing), bartab (annotation, binning and dereplication of barcoded DNA sequences), biodiv (calculation of common ecological biodiversity indices), and sortx (OTU clustering of aligned sequences). pubmed None
Analysis RAMMCAP The metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. pubmed None
Pipeline SeqWare SeqWare currently provides four tools specifically designed to support massively parallel sequencing technologies (Illumina, ABI SOLiD, 454). The first is a LIMS web application (SeqWare LIMS) to manage samples, record computational events, and present computational results back to end users. The second component is a pipeline (SeqWare Pipeline) which consists of many different programs useful for processing and annotating sequence data. These can be combined with other tools (BFAST, BWA, SAMtools, etc) and strung together to form more complex workflows to support many experiment types. Third, a query tool (SeqWare Query Engine) is available to database and query variants and other events inferred from sequence data. Finally, SeqWare MetaDB provides a common database to store metadata used by all components. All four tools can be used together or separately. Documentation is incomplete and installation is difficult.
Alignment / SNP/Indel Discovery BOAT Basic Oligonucleotide Alignment Tool (BOAT) can accurately and efficiently map sequencing reads back to the reference genome. BOAT can handle several substitutions and indels simultaneously, a useful feature for identifying SNPs and other genomic structural variations in functional genomic studies. For better handling of low-quality reads, BOAT supports a "3'-end Trimming Mode" to build local optimized alignment for sequencing reads, further improving sensitivity. BOAT calculates an E-value for each hit as a quality assessment and provides customizable post-mapping filters for further mapping quality control. pubmed None
Alignment mrsFAST, mrFAST mrsFAST and mrFAST are designed to map short reads generated with the Illumina platform to reference genome assemblies in a fast and memory-efficient manner. mrsFAST and mrFAST are cache-oblivious short read mappers that optimize cache usage to get higher performance. pubmed None
Analysis/Annotation GAMES GAMES (Genomic Analysis of Mutations Extracted by Sequencing) is a tool for mining mutations and predicting their functional effect. It performs SNP/SNV/InDel calling (for different platforms and different sequencing experiments (single-read/PE, whole-exome/whole-genome/Sure Select)) and annotates each mismatch. It detects those mutations with clinical significance. It integrates a lot of databases, and can predict the disease-causing potential of a mutation through an interface to Mutation Taster. pubmed None
Database/Pipeline Management Molgenis The MOLGENIS platform allows you to automatically generate rich database software to your specifications, including web user interfaces to manage and query your data, various database back ends to store your data, and programmatic interfaces to the R language and web services. You tell MOLGENIS what to generate using a data model and user interface model described in XML; at the push of a button MOLGENIS translates this model into SQL, Java and R program files. Documentation is generated as well. While the standard generated MOLGENIS is sufficient for most data management needs, MOLGENIS also allows you to plug in handwritten software components that build on the auto-generated software platform. None
Analysis snpEff A variant effect predictor tool: it predicts the effect of genetic variations (SNPs, insertions, deletions and MNPs). SnpEff is really fast.
Alignment PASS http://www.ncbi.nlm.nih.gov/pubmed/19218350
Storage CRAM toolkit The CRAM toolkit is a Java program for writing and reading CRAM files. The CRAM format, based on the research paper Efficient storage of high throughput DNA sequencing data using reference-based compression, will be described in detail in a separate specification. Briefly, the CRAM format is based on efficient compression of DNA sequences by storing only differences between aligned and reference sequences with provision for reads that do not map to the reference. The CRAM format is expected to dramatically lower the storage cost for archiving re-sequencing reads.[1] None
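
The CRAM entry above describes storing only the differences between an aligned read and the reference. A minimal Python sketch of that idea follows. It is not the CRAM format itself and does not use the CRAM toolkit: the function names `encode_read` and `decode_read` and the tuple layout are hypothetical, only substitutions are handled, and indels, quality scores and unmapped reads are left out.

```python
def encode_read(reference, pos, read):
    """Encode a mapped read as (position, length, substitutions).

    Hypothetical record layout for illustration; substitutions only.
    """
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    # Keep only the offsets where the read disagrees with the reference.
    diffs = [(i, base)
             for i, (ref_base, base) in enumerate(zip(window, read))
             if ref_base != base]
    return pos, len(read), diffs


def decode_read(reference, record):
    """Rebuild the read from the reference plus the stored differences."""
    pos, length, diffs = record
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTTAGCCGATAGGCTA"
read = "CGTTAGACGA"  # aligns at position 5 with a single substitution
record = encode_read(reference, 5, read)
print(record)
print(decode_read(reference, record) == read)
```

A read that matches the reference exactly costs only a position and a length, which is why this approach is expected to pay off for re-sequencing data where most bases agree with the reference.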

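Several aligners in the table (SSAHA, for example) are described as finding near-exact matches with a hash table. The sketch below illustrates the general k-mer hashing idea in Python under simplifying assumptions: every overlapping k-mer of the reference is indexed, and a query is placed by letting its k-mers vote for a reference offset. The names `build_kmer_index` and `best_offset` are hypothetical, and this is not the actual SSAHA implementation.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every overlapping k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def best_offset(index, query, k):
    """Return (reference offset, supporting k-mer hits) or None if no hit."""
    votes = Counter()
    for j in range(len(query) - k + 1):
        for i in index.get(query[j:j + k], ()):
            # A k-mer at query position j and reference position i
            # supports placing the query start at reference offset i - j.
            votes[i - j] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]


reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_kmer_index(reference, 4)
query = "TAGCCTATAG"  # reference[8:18] with one substitution
print(best_offset(index, query, 4))
```

Because k-mers that overlap a mismatch simply fail to hit, the query can still be placed as long as enough error-free k-mers remain; a full aligner would then verify the candidate offset with a base-by-base comparison.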
See Also