Assembly metrics explained

From BioAssist

Explanation of assembly metrics

  • Runtime (in minutes)
The time it takes for an assembly to run
  • Max memory
The maximum amount of RAM that the assembly needs to run
  • Sequence length
The total length in bases of the assembled data
  • Largest contig
The length in bases of the largest contig
  • Number of contigs larger than 100
The number of contigs longer than 100 bases
  • N50 length
Order all contigs from longest to shortest and sum their lengths in that order until the running total exceeds 50% of the total length of all sequences. The length of the last contig added is called the N50 length. In general, a larger N50 length indicates a more contiguous assembly, which is usually taken as a sign of a higher-quality assembled genome.
  • N50 index
Order all contigs from longest to shortest and sum their lengths in that order until the running total exceeds 50% of the total length of all sequences. The number of contigs added up to that point, including the last one, is the N50 index. In general, the smaller the N50 index, the higher the quality of the assembled genome.
  • Number of mismatches to reference
The number of GSNPs and GIndels reported by dnadiff (from MUMmer 3.22) at its default settings
  • Percentage of N's in consensus
The percentage of bases in the consensus that are unknown (N)
  • Percentage of unused reads in assembly
The percentage of input reads that are not used in the assembly
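
The contig-based metrics above (sequence length, largest contig, number of contigs larger than 100, N50 length and index, percentage of N's) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name basic_metrics and the dictionary keys are made up for this page, contigs are assumed to be given as plain sequence strings, and the N50 rule follows the wording above (the running total must exceed half of the total length). Some tools instead stop when the running total reaches at least half, which gives a different answer only when the sum lands exactly on 50%.

```python
def basic_metrics(contigs, min_len=100):
    """Compute simple assembly metrics from a list of contig sequences.

    Hypothetical helper for illustration; `contigs` is a list of strings.
    """
    lengths = sorted((len(seq) for seq in contigs), reverse=True)
    total = sum(lengths)

    # N50: add contigs from longest to shortest until the running total
    # exceeds half of the total length (integer arithmetic avoids rounding).
    n50_length, n50_index = 0, 0
    running = 0
    for index, length in enumerate(lengths, start=1):
        running += length
        if running * 2 > total:
            n50_length, n50_index = length, index
            break

    # Unknown bases are counted case-insensitively.
    n_count = sum(seq.upper().count("N") for seq in contigs)

    return {
        "sequence_length": total,
        "largest_contig": lengths[0] if lengths else 0,
        "contigs_over_min_len": sum(1 for length in lengths if length > min_len),
        "n50_length": n50_length,
        "n50_index": n50_index,
        "percent_n": 100.0 * n_count / total if total else 0.0,
    }


# Six toy contigs of 800, 700, 500, 400, 300 and 200 bases (2900 in total).
# 800 + 700 = 1500 exceeds 1450, so the N50 length is 700 and the N50 index is 2.
example = ["A" * 800, "C" * 700, "G" * 471 + "N" * 29,
           "T" * 400, "A" * 300, "C" * 200]
print(basic_metrics(example))
```

Runtime, max memory, mismatches to the reference and unused reads are not covered here, because they come from the assembler run, dnadiff and the read mapping rather than from the contig sequences themselves.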


To add

  • N90 size and index
  • Mean contig length
  • Standard deviation of contig length
  • Median contig length
  • Number of contigs >=1kb
  • Number of bases in contigs >=1kb
  • GC Content of contigs
  • Input genome size
  • All metrics, also for scaffolds
  • How many N's to split a scaffold into contigs (also generate histogram of N stretch length)
  • Percentage of assembly in scaffolded contigs
  • Number of contigs per scaffold (mean, median, stdev)
  • Graph of genome size growth, with markers for N50 & N90
Like this:

[Figure: Snake.png]
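
Several of the planned metrics are small variations on the ones already defined, so a rough Python sketch may help when they get implemented. Everything named here (nx, extra_metrics, growth_curve, the dictionary keys) is hypothetical. The sketch assumes contigs are plain sequence strings, uses the same "running total exceeds x%" rule as the N50 definition, reports the sample standard deviation, and computes GC content over A, C, G and T only (ignoring N's), which is one possible convention among several. The growth curve is taken to mean the cumulative size when contigs are added from longest to shortest; that is an interpretation of the graph, not something this page specifies.

```python
import statistics
from itertools import accumulate


def nx(lengths, percent):
    """Return (Nx length, Nx index), e.g. percent=50 for N50, 90 for N90.

    `percent` is an integer from 1 to 99; returns (0, 0) for empty input.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for index, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison: running / total > percent / 100
        if running * 100 > total * percent:
            return length, index
    return 0, 0


def growth_curve(lengths):
    """Cumulative assembly size as contigs are added longest first."""
    return list(accumulate(sorted(lengths, reverse=True)))


def extra_metrics(contigs, kb=1000):
    """Summary statistics for a non-empty list of contig sequences."""
    lengths = [len(seq) for seq in contigs]
    at_least_1kb = [length for length in lengths if length >= kb]
    upper = [seq.upper() for seq in contigs]
    gc = sum(seq.count("G") + seq.count("C") for seq in upper)
    acgt = gc + sum(seq.count("A") + seq.count("T") for seq in upper)
    return {
        "n90": nx(lengths, 90),
        "mean_length": statistics.mean(lengths),
        "median_length": statistics.median(lengths),
        # The sample standard deviation needs at least two contigs.
        "stdev_length": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
        "contigs_ge_1kb": len(at_least_1kb),
        "bases_ge_1kb": sum(at_least_1kb),
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
    }
```

For a plot like the one above, the values from growth_curve would go on one axis and the contig rank on the other, with the indices returned by nx(lengths, 50) and nx(lengths, 90) marking the N50 and N90 points.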

To Do

  • Create an R script that:
    • Reads in multiple assemblies
    • Splits scaffolds into contigs
    • Calculates metrics on scaffolds and contigs
    • Generates table of metrics of all assemblies
    • Generates graph
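
Until the R script exists, the first steps of this plan can be prototyped as below. This is a Python sketch rather than the planned R script, and every name in it (parse_fasta, split_scaffold, n_run_lengths, min_n_run) is hypothetical. It assumes assemblies are in FASTA format and that a scaffold is split wherever a run of at least min_n_run consecutive N's occurs. The right threshold is still an open question on this page, so it is left as a parameter.

```python
import re


def parse_fasta(lines):
    """Minimal FASTA parser: maps each record name to its full sequence.

    `lines` is any iterable of text lines, such as an open file handle.
    """
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record name.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def split_scaffold(sequence, min_n_run=1):
    """Split a scaffold into contigs at runs of >= min_n_run N's (min_n_run >= 1).

    Shorter runs of N's stay inside the resulting contigs.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_n_run)
    return [piece for piece in pattern.split(sequence) if piece]


def n_run_lengths(sequence):
    """Lengths of all N stretches, as input for a histogram."""
    return [len(match.group()) for match in re.finditer(r"[Nn]+", sequence)]


def contig_lengths(records, min_n_run=1):
    """Lengths of all contigs obtained by splitting every scaffold."""
    return [len(contig)
            for scaffold in records.values()
            for contig in split_scaffold(scaffold, min_n_run)]
```

A file would be read with something like parse_fasta(open(path)). Running the same metric calculations once on the scaffold lengths and once on the output of contig_lengths gives the two sets of numbers needed for the table.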

Resources

Assemblathon 2 basic assembly metrics (where the graph came from)

Examples in R to calculate metrics