# Assembly metrics explained

From BioAssist


## Explanation of assembly metrics

- Runtime (in minutes)

- The time it takes for an assembly to run

- Max memory

- The maximum amount of RAM that the assembly needs to run

- Sequence length

- The total length in bases of the assembled data

- Largest contig

- The length in bases of the largest contig

- Number of contigs larger than 100

- The number of contigs longer than 100 bases

- N50 length

- Order all contigs from longest to shortest and add up their lengths until the running sum reaches or exceeds 50% of the total length of all sequences. The length of the last contig added is called the N50 length. In general, the larger the N50 length, the more contiguous the assembly, which is usually read as a sign of higher quality.

- N50 index

- Order all contigs from longest to shortest and add up their lengths until the running sum reaches or exceeds 50% of the total length of all sequences. The index (count) of the last contig added is the N50 index, also known as L50. In general, the smaller the N50 index, the more contiguous the assembly, which is usually read as a sign of higher quality.

- Number of mismatches to reference

- The number of GSNPs and GIndels that dnadiff (from MUMmer 3.22) reports when run with its standard settings

- Percentage of N's in consensus

- The percentage of bases in the consensus that are unknown (N)

- Percentage of unused reads in assembly

- The percentage of input reads that are not used in the assembly
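
The sequence-based metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the script used to produce any results on this wiki. The function names `nx_length_and_index` and `basic_metrics` are hypothetical, contigs are assumed to be plain strings already loaded in memory, and the threshold test uses "reaches or exceeds", which is the common convention for N50.

```python
def nx_length_and_index(contig_lengths, fraction=0.5):
    """Return (Nx length, Nx index) for a list of contig lengths.

    Contigs are taken from longest to shortest until the running sum
    reaches or exceeds `fraction` of the total length. fraction=0.5
    gives the N50 length and N50 index.
    """
    ordered = sorted(contig_lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for index, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, index
    raise ValueError("no contigs given")


def basic_metrics(contigs, min_length=100):
    """Summarise a non-empty list of contig sequences (plain strings)."""
    lengths = [len(seq) for seq in contigs]
    total = sum(lengths)
    n50_length, n50_index = nx_length_and_index(lengths)
    # Unknown bases are counted case-insensitively as 'N'.
    unknown = sum(seq.upper().count("N") for seq in contigs)
    return {
        "sequence_length": total,
        "largest_contig": max(lengths),
        # Strictly longer than min_length, as in the definition above.
        "contigs_over_min_length": sum(1 for n in lengths if n > min_length),
        "n50_length": n50_length,
        "n50_index": n50_index,
        "percent_n": 100.0 * unknown / total,
    }
```

Calling `nx_length_and_index(lengths, fraction=0.9)` gives the corresponding N90 values with the same procedure. Runtime, memory, mismatches to the reference and unused reads depend on the assembler and on external tools, so they are not covered by this sketch.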

## To add

- N90 length and index
- Mean contig length
- Standard deviation of contig length
- Median contig length
- Number of contigs >=1kb
- Number of bases in contigs >=1kb
- GC Content of contigs
- Input genome size
- All metrics, also for scaffolds
- Number of consecutive N's at which a scaffold is split into contigs (also generate a histogram of N-stretch lengths)
- Percentage of assembly in scaffolded contigs
- Number of contigs per scaffold (mean, median, stdev)
- Graph of genome size growth, with markers for N50 & N90

- Like the example graph in the Assemblathon 2 metrics listed under Resources
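
Several of the proposed metrics are simple summaries of the contig lengths and base counts. The Python sketch below shows one possible way to compute them. The names `extended_metrics` and `cumulative_growth` are hypothetical. Two choices here are assumptions rather than anything decided on this page: the standard deviation is the sample standard deviation, and GC content is calculated over A, C, G and T only, so N's do not dilute it.

```python
import statistics
from itertools import accumulate


def extended_metrics(contigs, kb_cutoff=1000):
    """Length statistics and GC content for a list of contig strings."""
    lengths = [len(seq) for seq in contigs]
    long_lengths = [n for n in lengths if n >= kb_cutoff]
    upper = [seq.upper() for seq in contigs]
    gc = sum(seq.count("G") + seq.count("C") for seq in upper)
    # Only unambiguous bases count towards the GC denominator (assumption).
    acgt = sum(sum(seq.count(base) for base in "ACGT") for seq in upper)
    return {
        "mean_length": statistics.mean(lengths),
        "median_length": statistics.median(lengths),
        # Sample standard deviation; it needs at least two contigs.
        "stdev_length": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
        "contigs_ge_1kb": len(long_lengths),
        "bases_in_contigs_ge_1kb": sum(long_lengths),
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
    }


def cumulative_growth(lengths):
    """Running total of contig lengths, longest contig first.

    Plotting this list against the contig rank gives the genome size
    growth curve; N50 and N90 are the points where the curve crosses
    50% and 90% of its final value.
    """
    return list(accumulate(sorted(lengths, reverse=True)))
```

The growth curve is returned as plain numbers so that any plotting tool can draw it and add the N50 and N90 markers.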

## To Do

- Create an R script that:
  - Reads in multiple assemblies
  - Splits scaffolds into contigs
  - Calculates metrics on scaffolds and contigs
  - Generates a table of metrics for all assemblies
  - Generates the graph
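
Until the R script exists, the reading and splitting steps can be prototyped as below. The sketch is in Python rather than R, and all function names (`read_fasta`, `split_scaffold`, `n_run_histogram`) are hypothetical. It assumes assemblies are plain FASTA text and that a scaffold is split wherever a run of at least `min_n_run` N's occurs. The default of 1 is only a placeholder, since the cutoff is still an open question on this page.

```python
import re
from collections import Counter


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_scaffold(scaffold, min_n_run=1):
    """Split a scaffold into contigs at runs of at least `min_n_run` N's.

    Shorter N runs stay inside the contig they occur in.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_n_run)
    return [piece for piece in pattern.split(scaffold) if piece]


def n_run_histogram(scaffolds):
    """Count how often each N-stretch length occurs across scaffolds."""
    histogram = Counter()
    for scaffold in scaffolds:
        for run in re.finditer(r"[Nn]+", scaffold):
            histogram[len(run.group())] += 1
    return histogram
```

`read_fasta` accepts any iterable of lines, so an open file handle can be passed directly, for example `list(read_fasta(open(path)))` with a `path` of your own. The histogram from `n_run_histogram` can help choose a sensible value for `min_n_run` before the contig metrics are calculated.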

## Resources

Assemblathon 2 basic assembly metrics (where the graph came from)