Velvet-Sapelo2
Category
Bioinformatics
Program On
Sapelo2
Version
1.2.10
Author / Distributor
Velvet: algorithms for de novo short read assembly using de Bruijn graphs. D.R. Zerbino and E. Birney. Genome Research 18:821-829
Description
Sequence assembler for very short reads. More information: http://www.ebi.ac.uk/~zerbino/velvet/
velvetg - de Bruijn graph construction, error removal and repeat resolution
velveth - simple hashing program
Running Program
Also refer to Running Jobs on Sapelo2
Note: Velvet is compiled with multi-threading enabled ('BIGASSEMBLY=1' 'LONGSEQUENCES=1' 'MAXKMERLENGTH=100' 'CATEGORIES=62' 'OPENMP=1').
Some long reads cause segmentation faults when CATEGORIES is high (e.g. CATEGORIES=99). We suggest using the build whose CATEGORIES and k-mer length settings fit your data, which also reduces memory use.
- Version 1.2.10, installed in /usr/local/apps/eb/Velvet/1.2.10-foss-2016b-mt-kmer_100-Perl-5.24.1
To use this version of Velvet, please first load the module with
module load Velvet/1.2.10-foss-2016b-mt-kmer_100-Perl-5.24.1
Example of a shell script, velvet.sh, to run on the batch queue:
#PBS -S /bin/bash
#PBS -N j_velvet
#PBS -q batch
#PBS -l nodes=1:ppn=2:AMD
#PBS -l walltime=480:00:00
#PBS -l mem=100gb

cd $PBS_O_WORKDIR

module load Velvet/1.2.10-foss-2016b-mt-kmer_100-Perl-5.24.1

export OMP_THREAD_LIMIT=2
export OMP_NUM_THREADS=2

time velveth myDirectory 21 -shortPaired data/test_reads.fa
time velvetg myDirectory
In the above sample, the value 2 in OMP_THREAD_LIMIT and OMP_NUM_THREADS is the number of threads to use. The ppn value must match this number.
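You can avoid editing the thread count in three places by deriving it from the job's own allocation. This minimal sketch assumes the scheduler exports PBS_NUM_PPN, as Torque does; check your environment. If the variable is unset, it falls back to 2:

```bash
#!/bin/bash
# Keep OpenMP threads equal to the cores requested with ppn.
# Assumption: PBS_NUM_PPN is exported by the scheduler; default to 2 otherwise.
NTHREADS="${PBS_NUM_PPN:-2}"
export OMP_NUM_THREADS="$NTHREADS"
export OMP_THREAD_LIMIT="$NTHREADS"
echo "Velvet will use $OMP_NUM_THREADS OpenMP threads"
```

With these lines replacing the two export lines, only the ppn value in the "#PBS -l nodes=" line needs to change.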
Example of submission to the queue:
qsub velvet.sh
Velvet needs a large amount of memory to run, so request enough with the mem resource (the sample script above asks for 100gb).
For transcriptomic assembly, Velvet is extended by Oases.
Documentation
module load Velvet/1.2.10-foss-2016b-mt-kmer_100-Perl-5.24.1
velveth --help
velveth - simple hashing program
Version 1.2.10
Copyright 2007, 2008 Daniel Zerbino (zerbino@ebi.ac.uk)
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Compilation settings:
CATEGORIES = 62
MAXKMERLENGTH = 100
OPENMP
LONGSEQUENCES
BIGASSEMBLY
Usage:
./velveth directory hash_length {[-file_format][-read_type][-separate|-interleaved] filename1 [filename2 ...]} {...} [options]
directory : directory name for output files
hash_length : EITHER an odd integer (if even, it will be decremented) <= 100 (if above, will be reduced)
: OR: m,M,s where m and M are odd integers (if not, they will be decremented) with m < M <= 100 (if above, will be reduced)
and s is a step (even number). Velvet will then hash from k=m to k=M with a step of s
filename : path to sequence file or - for standard input
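The single-value hash_length rules can be sketched as a small bash function. It only mirrors the behaviour described in the help text above: values above 100 are reduced and even values are decremented. It assumes the cap is applied before the even check, so the largest usable k in this build is 99. It is not Velvet's own code:

```bash
#!/bin/bash
# Sketch of the hash_length rules stated in the velveth help text.
# MAXK mirrors this build's MAXKMERLENGTH = 100.
MAXK=100

# Values above MAXK are reduced to MAXK; even values are then decremented.
normalize_k() {
  local k=$1
  if (( k > MAXK )); then
    k=$MAXK
  fi
  if (( k % 2 == 0 )); then
    k=$(( k - 1 ))
  fi
  echo "$k"
}

for k in 21 22 31 120; do
  echo "requested k=$k -> used k=$(normalize_k "$k")"
done
```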
File format options:
-fasta -fastq -raw -fasta.gz -fastq.gz -raw.gz -sam -bam -fmtAuto
(Note: -fmtAuto will detect fasta or fastq, and will try the following programs for decompression : gunzip, pbunzip2, bunzip2
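When scripting velveth runs over many files, you can pick the format flag from the file name. The helper below is a hypothetical convenience, not part of Velvet. Its extension-to-flag mapping is an assumption, and any name it does not recognise falls back to -fmtAuto:

```bash
#!/bin/bash
# Hypothetical helper: map a file name to a velveth file format flag.
# printf is used instead of echo so flags starting with '-' print verbatim.
format_flag() {
  case "$1" in
    *.fastq.gz|*.fq.gz)          printf '%s\n' -fastq.gz ;;
    *.fasta.gz|*.fa.gz|*.fna.gz) printf '%s\n' -fasta.gz ;;
    *.fastq|*.fq)                printf '%s\n' -fastq ;;
    *.fasta|*.fa|*.fna)          printf '%s\n' -fasta ;;
    *.sam)                       printf '%s\n' -sam ;;
    *.bam)                       printf '%s\n' -bam ;;
    *)                           printf '%s\n' -fmtAuto ;;
  esac
}

# Example usage:
# velveth Assem 31 -short "$(format_flag reads.fq.gz)" reads.fq.gz
```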
File layout options for paired reads (only for fasta and fastq formats):
-interleaved : File contains paired reads interleaved in the one file (default)
-separate : Read 2 separate files for paired reads
Read type options:
-short -shortPaired
...
-short61 -shortPaired61
-short62 -shortPaired62
-long -longPaired
-reference
Options:
-strand_specific : for strand specific transcriptome sequencing data (default: off)
-reuse_Sequences : reuse Sequences file (or link) already in directory (no need to provide original filenames in this case (default: off)
-reuse_binary : reuse binary sequences file (or link) already in directory (no need to provide original filenames in this case (default: off)
-noHash : simply prepare Sequences file, do not hash reads or prepare Roadmaps file (default: off)
-create_binary : create binary CnyUnifiedSeq file (default: off)
Synopsis:
- Short single end reads:
velveth Assem 29 -short -fastq s_1_sequence.txt
- Paired-end short reads (remember to interleave paired reads):
velveth Assem 31 -shortPaired -fasta interleaved.fna
- Paired-end short reads (using separate files for the paired reads)
velveth Assem 31 -shortPaired -fasta -separate left.fa right.fa
- Two channels and some long reads:
velveth Assem 43 -short -fastq unmapped.fna -longPaired -fasta SangerReads.fasta
- Three channels:
velveth Assem 35 -shortPaired -fasta pe_lib1.fasta -shortPaired2 pe_lib2.fasta -short3 se_lib1.fa
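If you use the interleaved layout instead of -separate, merge the two mate files so that each read is followed by its mate. Below is a minimal bash sketch; interleave_fasta and the file names are hypothetical. It assumes:
- the FASTA is unwrapped (exactly one sequence line per record);
- mates appear in the same order in both files;
- headers contain no tab characters.

```bash
#!/bin/bash
# Interleave left/right mate FASTA files into one file for -shortPaired.
# paste - - joins each header with its sequence line (tab-separated);
# paste -d '\n' then alternates left and right records; tr restores line breaks.
interleave_fasta() {
  paste -d '\n' <(paste - - < "$1") <(paste - - < "$2") | tr '\t' '\n'
}

# Example usage (hypothetical file names):
# interleave_fasta left.fa right.fa > interleaved.fna
```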
Output:
directory/Roadmaps
directory/Sequences
[Both files are picked up by graph, so please leave them there]
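Because velvetg reads both files from the same directory, a job script can check that they are present before starting the often long velvetg step. This is a sketch: the helper name is hypothetical, and it assumes the default text Sequences file rather than a binary file produced with -create_binary:

```bash
#!/bin/bash
# Verify that velveth left non-empty Roadmaps and Sequences files in a directory.
check_velveth_output() {
  local dir=$1 f
  for f in Roadmaps Sequences; do
    if [[ ! -s "$dir/$f" ]]; then
      echo "velveth output missing or empty: $dir/$f" >&2
      return 1
    fi
  done
  return 0
}

# Example usage in a job script:
# check_velveth_output myDirectory && time velvetg myDirectory
```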
module load Velvet/1.2.10-foss-2016b-mt-kmer_100-Perl-5.24.1
velvetg --help
Usage:
./velvetg directory [options]
directory : working directory name
Standard options:
-cov_cutoff <floating-point|auto> : removal of low coverage nodes AFTER tour bus or allow the system to infer it
(default: no removal)
-ins_length <integer> : expected distance between two paired end reads (default: no read pairing)
-read_trkg <yes|no> : tracking of short read positions in assembly (default: no tracking)
-min_contig_lgth <integer> : minimum contig length exported to contigs.fa file (default: hash length * 2)
-amos_file <yes|no> : export assembly to AMOS file (default: no export)
-exp_cov <floating point|auto> : expected coverage of unique regions or allow the system to infer it
(default: no long or paired-end read resolution)
-long_cov_cutoff <floating-point>: removal of nodes with low long-read coverage AFTER tour bus
(default: no removal)
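A typical paired-end run combines several of the standard options. The sketch below builds the velvetg argument list in a bash array so that optional flags can be added only when needed. The function name and the DRY_RUN switch are hypothetical. Using "auto" for -cov_cutoff and -exp_cov is one reasonable starting point, not a tuned setting:

```bash
#!/bin/bash
# Sketch: assemble a velvetg command line from the standard options.
# Set DRY_RUN=1 to print the command instead of running it.
velvetg_cmd() {
  local dir=$1 ins_length=${2:-}
  local -a args=("$dir" -cov_cutoff auto -exp_cov auto)
  # Read pairing is only used when an insert length is supplied.
  if [[ -n "$ins_length" ]]; then
    args+=(-ins_length "$ins_length")
  fi
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo velvetg "${args[*]}"
  else
    velvetg "${args[@]}"
  fi
}

# Example usage:
# DRY_RUN=1 velvetg_cmd myDirectory 300
```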
Advanced options:
-ins_length* <integer> : expected distance between two paired-end reads in the respective short-read dataset (default: no read pairing)
-ins_length_long <integer> : expected distance between two long paired-end reads (default: no read pairing)
-ins_length*_sd <integer> : est. standard deviation of respective dataset (default: 10% of corresponding length)
[replace '*' by nothing, '2' or '_long' as necessary]
-scaffolding <yes|no> : scaffolding of contigs used paired end information (default: on)
-max_branch_length <integer> : maximum length in base pair of bubble (default: 100)
-max_divergence <floating-point>: maximum divergence rate between two branches in a bubble (default: 0.2)
-max_gap_count <integer> : maximum number of gaps allowed in the alignment of the two branches of a bubble (default: 3)
-min_pair_count <integer> : minimum number of paired end connections to justify the scaffolding of two long contigs (default: 5)
-max_coverage <floating point> : removal of high coverage nodes AFTER tour bus (default: no removal)
-coverage_mask <int> : minimum coverage required for confident regions of contigs (default: 1)
-long_mult_cutoff <int> : minimum number of long reads required to merge contigs (default: 2)
-unused_reads <yes|no> : export unused reads in UnusedReads.fa file (default: no)
-alignments <yes|no> : export a summary of contig alignment to the reference sequences (default: no)
-exportFiltered <yes|no> : export the long nodes which were eliminated by the coverage filters (default: no)
-clean <yes|no> : remove all the intermediary files which are useless for recalculation (default : no)
-very_clean <yes|no> : remove all the intermediary files (no recalculation possible) (default: no)
-paired_exp_fraction <double> : remove all the paired end connections which less than the specified fraction of the expected count (default: 0.1)
-shortMatePaired* <yes|no> : for mate-pair libraries, indicate that the library might be contaminated with paired-end reads (default no)
-conserveLong <yes|no> : preserve sequences with long reads in them (default no)
Output:
directory/contigs.fa : fasta file of contigs longer than twice hash length
directory/stats.txt : stats file (tab-spaced) useful for determining appropriate coverage cutoff
directory/LastGraph : special formatted file with all the information on the final graph
directory/velvet_asm.afg : (if requested) AMOS compatible assembly file
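For a quick summary of contigs.fa, the sketch below uses awk to count contigs and compute the total length and N50. contigs.fa only contains contigs that pass -min_contig_lgth. This is an independent check written for this page, not a Velvet tool, and it handles wrapped FASTA lines:

```bash
#!/bin/bash
# Summarize a FASTA file of contigs: number of contigs, total length and N50.
contig_stats() {
  # First awk: print one length per record, summing wrapped sequence lines.
  awk '/^>/ { if (len) print len; len = 0; next }
       { len += length($0) }
       END { if (len) print len }' "$1" \
    | sort -rn \
    | awk '{ l[NR] = $1; total += $1 }
           END {
             # N50: length at which the running sum first reaches half the total.
             half = total / 2; run = 0
             for (i = 1; i <= NR; i++) { run += l[i]; if (run >= half) { n50 = l[i]; break } }
             printf "contigs=%d total=%d N50=%d\n", NR, total, n50
           }'
}

# Example usage:
# contig_stats myDirectory/contigs.fa
```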
Installation
Source code downloaded from http://www.ebi.ac.uk/~zerbino/velvet/
Velvet is compiled with multi-threading enabled ('BIGASSEMBLY=1' 'LONGSEQUENCES=1' 'MAXKMERLENGTH=100' 'CATEGORIES=62' 'OPENMP=1').
System
64-bit Linux