Next Generation Sequencing: TipsTricks

From BioAssist


454 package: sfffile versus runAssembly

sfffile and runAssembly sort reads by MID in different ways.

This causes few problems when the edit distance between MIDs is larger than 2 bases, but you will still see differences when you compare:

sfffile -mcf MIDConfig.parse -s your_sff_file.sff
runAssembly -o assembly_midX 454Reads.MIDX.sff

with

runAssembly -o assembly_midX -mcf MIDConfig.parse MIDX@your_sff_file.sff

sfffile checks which MID is the best match for each read, whereas runAssembly adds a read to a MID's list whenever the read has at most 2 mismatches with that MID.
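The difference between the two strategies can be illustrated with a small shell sketch. This is not the actual sfffile or runAssembly code: the function names (mismatches, best_mid, mids_within) and the 6-base MID sequences are made up for illustration, and the sketch assumes that MIDs sit at the very start of the read and that only substitutions are counted. Whether sfffile also applies its own mismatch limit is not covered here.

```shell
#!/bin/sh
# Hypothetical illustration only; not the real sfffile/runAssembly logic.

# Count mismatching positions between a MID and the start of a read.
mismatches() {
    awk -v mid="$1" -v seq="$2" 'BEGIN {
        n = 0
        for (i = 1; i <= length(mid); i++)
            if (substr(mid, i, 1) != substr(seq, i, 1)) n++
        print n
    }'
}

# "Best match" strategy: print the single MID with the fewest mismatches.
# Usage: best_mid READ MID1 MID2 ...
best_mid() {
    read_seq=$1; shift
    best=""; best_n=""
    for mid in "$@"; do
        n=$(mismatches "$mid" "$read_seq")
        if [ -z "$best_n" ] || [ "$n" -lt "$best_n" ]; then
            best=$mid; best_n=$n
        fi
    done
    printf '%s\n' "$best"
}

# "Threshold" strategy: print every MID with at most MAX_MM mismatches.
# Usage: mids_within READ MAX_MM MID1 MID2 ...
mids_within() {
    read_seq=$1; max_mm=$2; shift 2
    for mid in "$@"; do
        n=$(mismatches "$mid" "$read_seq")
        if [ "$n" -le "$max_mm" ]; then printf '%s\n' "$mid"; fi
    done
}
```

With two toy MIDs that differ in only 2 bases (ACGTAC and ACGTTG), a read starting with ACGTAC is assigned to ACGTAC alone by best_mid, while mids_within with a limit of 2 reports both MIDs. This is the kind of discrepancy to expect when MIDs are close together.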

Cabog and titanium reads

FLX/Titanium and shotgun/paired-end reads need different settings in Cabog (see the SOP on the wgs-assembler website).

Concatenate many gzip files into one big gzip file without internal headers

Running Blast or InterPro on the Dutch lifescience grid may result in many thousands of gzipped XML files. Normally you can concatenate gzip files with 'cat', which gives one big file that still contains the internal gzip headers. Gzip itself handles this perfectly, but some tools, such as the taxonomy mapper MEGAN, cannot read these concatenated gzip files. With the command below you can recompress the files into one big gzip file without internal gzip headers, and without needing a lot of storage, because the uncompressed data is only streamed through a pipe.

gzip -vcd <directory containing the gzip files>/*.gz | gzip > <your new gzip file>.gz

Note: do not place the new gzip file in the same directory as the input files. If you have to, give it a different file extension, otherwise the *.gz pattern may match the output file itself.
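With many thousands of input files, the *.gz pattern can expand to more arguments than the system allows ("Argument list too long"). A minimal sketch of a workaround is shown below. It assumes GNU find, sort and xargs (the -maxdepth, -print0, -z, -0 and -r options are extensions, not POSIX), and recompress_gz is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Recompress all *.gz files in a directory into one single-member gzip file.
# Hypothetical helper; assumes GNU find, sort and xargs are available.
# Usage: recompress_gz <directory with gzip files> <new gzip file>
recompress_gz() {
    src_dir=$1    # directory containing the input *.gz files
    out_file=$2   # output file; keep it outside src_dir
    # find passes the file names NUL-separated, so the shell never has to
    # expand one huge argument list; sort keeps the order reproducible.
    # xargs may call gzip several times, but all output goes to one pipe,
    # so the final gzip sees a single uncompressed stream.
    find "$src_dir" -maxdepth 1 -type f -name '*.gz' -print0 |
        sort -z |
        xargs -0 -r gzip -cd |
        gzip > "$out_file"
}
```

Afterwards, `gzip -t` on the new file is a cheap way to check that it is a valid gzip file before the original files are removed.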