Simulated Dataset Generation Script

From BioAssist


  • Version 1: create a dataset with accurate alignment positions. (Done)
  • Version 2: create a dataset with an error profile and artificial SNPs/indels. (Work in progress)
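
As a rough illustration of what version 1 aims for, the sketch below draws error-free reads from a reference sequence and stores the true start position with each read, so an aligner's output can be checked against it. This is not the BioAssist script itself: the function name `simulate_reads`, the read naming scheme, and the 0-based coordinates are all assumptions made for the example.

```python
import random


def simulate_reads(reference, n_reads, read_len, seed=0):
    """Sample error-free reads at distinct start positions (hypothetical helper).

    Returns a list of (name, start, sequence) tuples, where start is the
    0-based position of the read in the reference.
    """
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    n_starts = len(reference) - read_len + 1
    if n_reads > n_starts:
        raise ValueError("not enough distinct start positions")
    # Sampling without replacement gives each start position equal probability
    # and guarantees that no position is used twice.
    starts = rng.sample(range(n_starts), n_reads)
    reads = []
    for i, start in enumerate(sorted(starts)):
        seq = reference[start:start + read_len]
        # Assumed naming scheme: the true position is embedded in the read name.
        reads.append((f"read_{i}_pos_{start}", start, seq))
    return reads
```

Distinct start positions do not by themselves guarantee distinct read sequences in a repetitive genome, so a real generator would likely need an extra uniqueness filter.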

Meanwhile, we have the following datasets for testing.

  • A simulated Illumina dataset: 2 million reads of 50 bases each, all unique, generated uniformly from the entire human genome (NCBI build 36).
  • A SOLiD dataset with full SNP and indel (1-5 bases) information: 125K bases, human, chromosomes 2 and 7. It can be used as the gold standard for evaluating SOLiD alignment tools. There is also a script to create a simulated dataset in color space.
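
For readers unfamiliar with color space, the sketch below shows the standard SOLiD two-base encoding: each color (0-3) describes the transition between two adjacent bases, and the read is written as a primer base followed by the colors. The function names `to_colorspace` and `from_colorspace` and the default primer base `T` are assumptions for this example; the script mentioned above may work differently.

```python
# 2-bit codes for the bases; the color of a dinucleotide is the XOR of its codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}


def to_colorspace(seq, primer="T"):
    """Encode a base-space sequence as primer base + colors (hypothetical helper)."""
    colors = []
    prev = primer
    for base in seq:
        colors.append(str(BASE_CODE[prev] ^ BASE_CODE[base]))
        prev = base
    return primer + "".join(colors)


def from_colorspace(cs_read):
    """Decode a color-space read (primer base + colors) back to base space."""
    prev = cs_read[0]  # the primer base is not part of the decoded sequence
    bases = []
    for color in cs_read[1:]:
        prev = CODE_BASE[BASE_CODE[prev] ^ int(color)]
        bases.append(prev)
    return "".join(bases)


print(to_colorspace("ACGT"))     # T3131
print(from_colorspace("T3131"))  # ACGT
```

Because every base is decoded relative to the previous one, a single wrong color changes all downstream bases after decoding, which is why SOLiD reads are usually aligned directly in color space.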