Next Generation Sequencing: TipsTricks
454 package: sfffile versus runAssembly
sfffile and runAssembly sort reads by MID in different ways.
This rarely causes problems when the edit distance between MIDs is larger than 2 bases, but you will still see differences when you compare the following two approaches:
sfffile -mcf MIDConfig.parse -s your_sff_file.sff
runAssembly -o assembly_midX 454Reads.MIDX.sff
runAssembly -o assembly_midX -mcf MIDConfig.parse MIDX@your_sff_file.sff
sfffile assigns each read to the MID that matches it best, whereas runAssembly adds a read to the list whenever the read has at most 2 mismatches with the MID.
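To see which reads the two approaches treat differently, you can compare the read accessions from both runs. The helper below is only a minimal sketch: the function name compare_accessions and the file names in the usage comments are made up for illustration. It also assumes that sffinfo -a prints one accession per line and that the first column of 454ReadStatus.txt holds the read accession below a single header line, so check this against your own output first.

```shell
# Print accessions that occur in only one of two accession lists
# (one accession per line). Illustrative helper, not part of the 454 package.
# Usage: compare_accessions <list_a> <list_b>
compare_accessions() {
    sorted_a=$(mktemp) || return 1
    sorted_b=$(mktemp) || return 1
    # comm needs sorted input; sort -u also drops duplicate accessions.
    LC_ALL=C sort -u "$1" > "$sorted_a"
    LC_ALL=C sort -u "$2" > "$sorted_b"
    # -3 hides the shared lines: column 1 = only in list_a,
    # column 2 (tab-indented) = only in list_b.
    LC_ALL=C comm -3 "$sorted_a" "$sorted_b"
    rm -f "$sorted_a" "$sorted_b"
}

# Hypothetical usage (file and directory names are assumptions):
#   sffinfo -a 454Reads.MIDX.sff > sfffile_midX.txt
#   tail -n +2 assembly_midX/454ReadStatus.txt | cut -f1 > runAssembly_midX.txt
#   compare_accessions sfffile_midX.txt runAssembly_midX.txt
```

Reads listed in the second (tab-indented) column are the ones only the second list contains.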
CABOG and Titanium reads
FLX/Titanium and shotgun/paired-end reads need different settings in CABOG (see the SOP on the wgs-assembler website).
Concatenate many gzip files into one big gzip file without internal headers
Running BLAST or InterPro on the Dutch Life Science Grid may result in many thousands of gzipped XML files. Normally you can concatenate gzip files with 'cat', which produces one big file containing internal gzip headers. Gzip itself handles such a file perfectly, but some tools, like the taxonomy mapper MEGAN, cannot read these concatenated gzip files. With the command below you can recompress the files into one big gzip file without internal gzip headers, and without needing lots of extra storage, because the decompressed data is piped straight into the new gzip file.
gzip -vcd <directory containing the gzip files>/*.gz | gzip > <your new gzip file>.gz
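Note: do not place the new gzip file in the same directory as the input files, or else give it a different file extension. Otherwise the *.gz pattern also matches the new file and gzip will try to read its own output.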
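With many thousands of files, the *.gz pattern may expand to more arguments than the shell allows ("Argument list too long"). The variant below is a minimal sketch that passes the file names through find and xargs instead. The function name merge_gz is made up for illustration, and the sketch assumes a find, sort and xargs that support NUL-separated names (-print0, -z, -0), as the GNU versions do, and at least one .gz file in the input directory.

```shell
# Recompress all *.gz files in a directory into one gzip file
# without internal gzip headers. Illustrative helper.
# Usage: merge_gz <input_dir> <output_file>
merge_gz() {
    # find + xargs avoids the argument-length limit of a *.gz glob;
    # sort -z keeps the concatenation order reproducible.
    # The decompressed data is piped straight into the new gzip file,
    # so it is never written to disk uncompressed.
    find "$1" -maxdepth 1 -type f -name '*.gz' -print0 \
        | LC_ALL=C sort -z \
        | xargs -0 gzip -cd \
        | gzip > "$2"
}

# Hypothetical usage (paths are assumptions); keep the output file
# outside the input directory, as noted above:
#   merge_gz blast_results merged_results.gz
```

The files are concatenated in byte order of their names, which may differ from the order a shell glob would produce in your locale.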