NGS Visualization Expertise
This page describes the visualization needs of NGS and our task force's efforts to address them.
This page is currently under construction
NBIC activities on Genomic data visualization
Visualization needs for NGS
The amount of data generated in Next Generation Sequencing (NGS) experiments poses very specific problems for data visualization. In a typical NGS experiment, many millions of reads are generated. These reads are aligned to a reference sequence by alignment software. This yields a dataset with heterogeneous properties per read, such as the bases that make up the read, the quality per base, the reference identifier, and the position at which the read starts. Because this dataset contains many millions of reads, direct visualization of the data is impractical. To overcome this limitation, data transformations are necessary. For NGS data, several transformations are commonly used, such as summary statistics and coverage determination.
To visualize summary statistics for a typical NGS experiment, a researcher first needs to calculate the statistics themselves. Several programs can be used for this, such as Bamtools or Picard. Both tools require binary SAM (BAM) files as input, and both are run from the command line, so they are less user-friendly than tools with a graphical user interface. In addition, the calculated statistics are written to the command line, so a user needs to collect them and put them in a spreadsheet. Statistical tools can then be used to visualize the data. It is safe to say that a novice user cannot perform these steps. Implementing these tools in Galaxy may substantially simplify these procedures.
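The kind of summary statistics described above can be illustrated with a minimal Python sketch. It is not how Bamtools or Picard work internally. It assumes the alignments are available as text SAM lines (for example, converted from BAM beforehand), and the function name `summarize_sam` and the example records are hypothetical. It relies only on the standard SAM column layout, in which the second column is the bitwise flag (bit 0x4 marks an unmapped read) and the tenth column is the read sequence.

```python
def summarize_sam(lines):
    """Compute a few simple summary statistics from text SAM lines.

    Illustrative sketch only: header lines (starting with '@') are skipped,
    and a read counts as mapped when flag bit 0x4 is not set.
    """
    total = 0
    mapped = 0
    length_sum = 0
    reads_with_sequence = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: bitwise flag
        seq = fields[9]        # column 10: read sequence ('*' if absent)
        total += 1
        if not flag & 0x4:
            mapped += 1
        if seq != "*":
            length_sum += len(seq)
            reads_with_sequence += 1
    mean_length = length_sum / reads_with_sequence if reads_with_sequence else 0.0
    return {
        "total": total,
        "mapped": mapped,
        "unmapped": total - mapped,
        "mean_read_length": mean_length,
    }


# Hypothetical example records: one header line and three reads.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t3\t30\t6M\t*\t0\t0\tACGTAC\tIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tAC\tII",
]
stats = summarize_sam(example)
print(stats)
```

The resulting dictionary could be written to a spreadsheet or passed directly to a plotting tool, which removes the manual copy step mentioned above.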
Many genome browsers have been developed over the last few years, but not all of them are able to visualize next-generation sequencing data. Many of those that can require specific input formats that have to be generated from your data, and most require the coverage to be calculated beforehand. Coverage states the (relative) number of times each base in the reference sequence was sequenced. To obtain this value, the samtools software can be used (samtools pileup).
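To make the concept of coverage concrete, the following Python sketch computes per-base coverage for a small set of aligned reads. It is a simplification and not what samtools does internally. It assumes that every read aligns without gaps and is described only by a 0-based start position and a length, and the function name `coverage` and the example reads are hypothetical.

```python
def coverage(ref_length, reads):
    """Return per-base coverage for ungapped reads given as (start, length).

    Positions are 0-based. A difference array is used so that each read
    costs constant time regardless of its length.
    """
    diff = [0] * (ref_length + 1)
    for start, length in reads:
        begin = max(start, 0)
        end = min(start + length, ref_length)  # clip reads to the reference
        if begin >= end:
            continue  # read lies entirely outside the reference
        diff[begin] += 1
        diff[end] -= 1
    depth = []
    running = 0
    for delta in diff[:ref_length]:
        running += delta
        depth.append(running)
    return depth


# Hypothetical example: three reads on a reference of 10 bases.
reads = [(0, 4), (2, 4), (3, 5)]
print(coverage(10, reads))  # → [1, 1, 2, 3, 2, 2, 1, 1, 0, 0]
```

The output has one value per reference base, which is the form most genome browsers expect for a coverage track, although the exact file format differs per browser.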
UCSC and ENSEMBL genome browsers have worked
plot tracks on the genome
the concept of coverage
Standardized data formats
- Commercial software
The "pixel problem"
plot "reads" on a genomic backbone
- samtools view
- Commercial software
- R as a visualization platform