Simulated Dataset Generation Script
- version 1: create a dataset in which the exact alignment position of every read is known. (Done)
- version 2: create a dataset with an error profile and artificial SNPs/indels. (Work in progress)
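As a rough illustration of what version 1 involves, the sketch below samples fixed-length reads from a reference string and records the true start position of each one. It is not the project's script: the function name `sample_reads`, the read naming scheme, and the choice of uniform sampling with a seeded generator are all assumptions made for the example, and it does not filter out duplicate reads or regions containing N.

```python
import random


def sample_reads(reference, read_length, num_reads, seed=0):
    """Sample reads from `reference` and keep each read's true start position.

    Hypothetical helper for illustration only; not the project's actual script.
    Returns a list of (read_name, start, sequence) tuples with 0-based starts.
    """
    last_start = len(reference) - read_length
    if last_start < 0:
        raise ValueError("reference is shorter than read_length")

    rng = random.Random(seed)  # seeded so the dataset is reproducible
    reads = []
    for i in range(num_reads):
        # randint is inclusive on both ends, so every valid start is equally likely
        start = rng.randint(0, last_start)
        reads.append((f"read_{i}", start, reference[start:start + read_length]))
    return reads


if __name__ == "__main__":
    for name, start, seq in sample_reads("ACGTTGCAAGCTTAGGCTAACG", 8, 3):
        print(name, start, seq)
```

Because the start position is stored alongside each read, an aligner's output can be checked directly against it.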
In the meantime, the following datasets are available for testing:
- A simulated Illumina dataset: 2 million reads of 50 bases each, all unique, sampled uniformly from the entire human genome (NCBI build 36).
- A SOLiD dataset with all SNP and indel (1-5 bases) information. 125K bases. Human, chromosomes 2 and 7. This can be used as the gold standard for evaluating SOLiD alignment tools. There is also a script to create a simulated dataset in color space.
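For readers unfamiliar with color space, here is a minimal sketch of the standard SOLiD two-base encoding, in which each pair of adjacent bases maps to one of four colors (0-3). The function name `to_color_space` is hypothetical and this is not the script mentioned above. The sketch keeps the first base of the sequence as the leading anchor base, whereas real SOLiD reads begin with a primer base, and it assumes the input contains only A, C, G, and T.

```python
# Two-bit base codes; the color of a dinucleotide is the XOR of its two codes.
# This reproduces the usual SOLiD table (e.g. AA -> 0, AC -> 1, AG -> 2, AT -> 3).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_color_space(seq):
    """Encode a base-space sequence as its first base followed by color calls.

    Hypothetical helper for illustration; raises KeyError on non-ACGT characters.
    """
    seq = seq.upper()
    if not seq:
        return ""
    colors = [str(BASE_CODE[a] ^ BASE_CODE[b]) for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(colors)


if __name__ == "__main__":
    print(to_color_space("ATGGC"))  # A3103
```

Since each color depends on two neighboring bases, a single SNP changes two adjacent colors, which is why color-space aligners are evaluated on datasets with known SNPs and indels.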