BionanoSolve-Sapelo2
Category
Bioinformatics
Program On
Sapelo2
Version
3.6.1-11162020
Author / Distributor
Details are at Bionano Solve
Description
From Bionano Solve: "Bionano Solve™ is an analysis pipeline for Bionano data processing, optimized for Bionano Compute and IrysSolve Compute Servers. A de novo assembly of a human genome can be completed in about 28 hours. Bionano Tools contains various tools and scripts, including the Bionano Solve analysis pipeline. These tools together perform computation jobs on Saphyr and IrysSolve Compute Servers."
Running Program
Also refer to Running Jobs on Sapelo2
For more information on Environment Modules on Sapelo2 please see the Lmod page.
- Version 3.6.1-11162020, installed in /apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b
To use BionanoSolve v3.6.1-11162020 pipelines, please first load the module with
module load BionanoSolve/3.6.1-11162020-foss-2019b
Once you have loaded the module, an environment variable called EBROOTBIONANOSOLVE is exported. It stores the BionanoSolve installation path on the cluster, i.e., /apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b/ . Using EBROOTBIONANOSOLVE, the BionanoSolve components can be located easily, for example:
Pipeline is at ${EBROOTBIONANOSOLVE}/Pipeline/11162020
HybridScaffold is at ${EBROOTBIONANOSOLVE}/HybridScaffold/11162020
RefAligner is at ${EBROOTBIONANOSOLVE}/RefAligner/11442.11643rel
VariantAnnotation is at ${EBROOTBIONANOSOLVE}/VariantAnnotation/11162020
FSHD is at ${EBROOTBIONANOSOLVE}/FSHD/11162020
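The layout above can be checked from a shell once the module is loaded. The following is a minimal sketch, assuming the component subdirectories are exactly those listed above; the function name list_bionano_components is hypothetical and is not part of BionanoSolve.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with BionanoSolve): report whether each
# component directory listed above exists under a given installation root.
# The root defaults to $EBROOTBIONANOSOLVE, which the module exports.
list_bionano_components() {
    local root="${1:-${EBROOTBIONANOSOLVE:-}}"
    if [ -z "$root" ]; then
        echo "EBROOTBIONANOSOLVE is not set; load the module first" >&2
        return 1
    fi
    local sub
    for sub in Pipeline/11162020 HybridScaffold/11162020 \
               RefAligner/11442.11643rel VariantAnnotation/11162020 \
               FSHD/11162020; do
        if [ -d "$root/$sub" ]; then
            echo "found    $root/$sub"
        else
            echo "missing  $root/$sub"
        fi
    done
}

# Run the check only when the module has set the variable, so the file can
# also be sourced safely outside of a module environment.
if [ -n "${EBROOTBIONANOSOLVE:-}" ]; then
    list_bionano_components
fi
```

A "missing" line would suggest that the version-specific subdirectory name differs from the one shown on this page.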
Sample job submission script (sub.sh) to run hybridScaffold.pl from BionanoSolve v3.6.1-11162020 in a batch job:
#!/bin/bash
#SBATCH --job-name=job_hybridScaffold
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=10gb
#SBATCH --time=120:00:00
#SBATCH --output=log.%j.out
#SBATCH --error=log.%j.err
#SBATCH --mail-user=username@uga.edu
#SBATCH --mail-type=ALL

cd $SLURM_SUBMIT_DIR

module load BionanoSolve/3.6.1-11162020-foss-2019b

perl ${EBROOTBIONANOSOLVE}/HybridScaffold/11162020/hybridScaffold.pl [options]
Sample job submission script (sub.sh) to run pipelineCL.py from BionanoSolve v3.6.1-11162020 in a batch job:
#!/bin/bash
#SBATCH --job-name=job_hybridScaffold
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=40gb
#SBATCH --time=120:00:00
#SBATCH --output=log.%j.out
#SBATCH --error=log.%j.err
#SBATCH --mail-user=username@uga.edu
#SBATCH --mail-type=ALL

cd $SLURM_SUBMIT_DIR

module load BionanoSolve/3.6.1-11162020-foss-2019b

python ${EBROOTBIONANOSOLVE}/Pipeline/11162020/pipelineCL.py -T 32 -j 32 [options]
where EBROOTBIONANOSOLVE is the environment variable storing the BionanoSolve installation path on the cluster, and [options] must be replaced by the options (command and arguments) you want to use. Other job parameters, such as the maximum wall-clock time, the maximum memory, the number of cores per node, and the job name, should be adjusted as well.
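As an illustration of what [options] might look like for hybridScaffold.pl, the sketch below assembles a command line from the options that the -h output marks as required. It only prints the command, so it can be inspected before being pasted into sub.sh. The input file names, the conflict filter level of 2, the function name build_hybrid_cmd, and the location of the RefAligner executable inside its component directory are all assumptions made for the example.

```shell
#!/bin/bash
# Sketch: assemble a hybridScaffold.pl command line from the required options
# (-n, -b, -c, -r, -o, -B, -N). It echoes the command instead of running it.
build_hybrid_cmd() {
    local root="$1" ngs_fasta="$2" bng_cmap="$3" config_xml="$4" outdir="$5"
    # Assumption: the RefAligner executable sits directly in its component
    # directory; verify the path on the cluster before use.
    local refaligner="$root/RefAligner/11442.11643rel/RefAligner"
    # -B 2 -N 2 (cut contig at conflict) is an illustrative choice only.
    echo perl "$root/HybridScaffold/11162020/hybridScaffold.pl" \
        -n "$ngs_fasta" -b "$bng_cmap" -c "$config_xml" \
        -r "$refaligner" -o "$outdir" \
        -B 2 -N 2
}

# Placeholder inputs; replace them with your own files.
build_hybrid_cmd \
    "${EBROOTBIONANOSOLVE:-/apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b}" \
    ngs_assembly.fasta bionano_assembly.cmap hybridScaffold_config.xml hybrid_out
```

Printing the command first makes it easier to spot a wrong path or a missing required option before the job spends time in the queue.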
Please note: to run a distributed parallel job on the cluster, BionanoSolve needs the Distributed Resource Management Application API (DRMAA, http://www.drmaa.org/) and a properly configured "clusterArgument.xml" file. DRMAA is currently not available on the Sapelo2 cluster; therefore, please run BionanoSolve pipelineCL.py on a single node and do not use the "-C <cluster argument.xml file>" option. We are sorry for the inconvenience.
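Because the pipeline has to stay on one node, it can help to keep the -T and -j values in step with the cores requested from Slurm. The sketch below is one way to do that under stated assumptions: it reads SLURM_CPUS_PER_TASK (set by Slurm when --cpus-per-task is given), never adds -C, and prints the command rather than running it. The function name build_pipeline_cmd, the input file names, and the use of the RefAligner component directory as the -t tools location are assumptions.

```shell
#!/bin/bash
# Sketch: build a single-node pipelineCL.py command whose thread counts follow
# the Slurm allocation. The -C option is deliberately never added.
build_pipeline_cmd() {
    local root="$1" bnx="$2" opt_xml="$3" outdir="$4"
    # Fall back to 1 thread when run outside a Slurm job.
    local threads="${SLURM_CPUS_PER_TASK:-1}"
    # Assumption: -t points at the RefAligner component directory.
    echo python "$root/Pipeline/11162020/pipelineCL.py" \
        -T "$threads" -j "$threads" \
        -b "$bnx" -l "$outdir" -a "$opt_xml" \
        -t "$root/RefAligner/11442.11643rel"
}

# Placeholder inputs; replace them with your own files.
build_pipeline_cmd \
    "${EBROOTBIONANOSOLVE:-/apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b}" \
    molecules.bnx optArguments.xml assembly_out
```

With --cpus-per-task=32, as in the sample script above, this yields -T 32 -j 32, matching the hard-coded values there.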
Example of job submission
sbatch sub.sh
Documentation
Details are at BionanoSolve
A user manual for running the BionanoSolve pipeline on the command line can be found at BionanoSolve guide
ml BionanoSolve/3.6.1-11162020-foss-2019b
perl ${EBROOTBIONANOSOLVE}/HybridScaffold/11162020/hybridScaffold.pl -h

Usage: perl hybridScaffold.pl <-h> <-n ngs_file> <-b bng_cmap_file> <-c hybrid_config_xml> <-o output_folder> <-B conflict_filter_level> <-N conflict_filter_level> <-f> <-m molecules_bnx> <-p de_novo_pipeline> <-q de_novo_xml> <-v> <-x> <-y> <-e noise_param><-z tar_zip_file><-S>
    -h : This help message
    -n : Input NGS FASTA [required]
    -b : Input BioNano CMAP [required]
    -c : Merge configuration file [required]
    -o : Output folder [required]
    -r : RefAligner program [required]
    -B : conflict filter level: 1 no filter, 2 cut contig at conflict, 3 exclude conflicting contig [required if not using -M option]
    -N : conflict filter level: 1 no filter, 2 cut contig at conflict, 3 exclude conflicting contig [required if not using -M option]
    -f : Force output and overwrite any existing files
    -x : Flag to generate molecules to hybrid scaffold alignment and molecules to genome map alignment [optional]
    -y : Flag to generate chimeric quality score for the Input BioNano CMAP [optional]
    -m : Input BioNano molecules BNX [optional; only required for either the -x or -y option]
    -p : Input de novo assembly pipeline directory [optional; only required for -x option]
    -q : Input de novo assembly pipeline optArguments XML script [optional; only required for -x option]
    -e : Input de novo assembly noise parameter .errbin or .err file [optional; recommended for -y option but not required]
    -v : Print pipeline version information
    -M : Input a conflict resolution file indicating which NGS and BioNano conflicting contigs to be cut [optional]
    -z : Name of a zipped file to archive the essential output files [optional]
    -S : Only run hybridScaffold up to before Merge steps [optional]
    -w : Name of the status text file needed by IrysView [optional]
    -t : Perform pre-pairmerge sequence to pre-pairmerge genome map alignment [optional]
    -u : Sequence of enzyme recognition site (overrides what has been specified in config XML file, for IrysView only) [optional]

python ${EBROOTBIONANOSOLVE}/Pipeline/11162020/pipelineCL.py -h

usage: pipelineCL.py [-h] [-T T] [-j MAXTHREADS] [-je MAXTHREADSEXT]
                     [-jp MAXTHREADSPW] [-J J] [-TJ TJ] [-Tp TP] [-Te TE]
                     [-Tn TN] [-N N] [-G BED] [-i ITER] [-b BNX] [-l LOCAL]
                     [-t TOOLS] [-B BYPASS] [-e EXP] [-r REF] [-x]
                     [-c CLEANUP] [-C CXML] [-w] [-a XML] [-p PERF] [-d] [-u]
                     [-U [GROUPCONTIGS]] [-v [VERSION]] [-V RUNSV] [-A] [-y]
                     [-m] [-f [F]] [-z] [-E] [-W W] [-Gsiz GSIZ] [-F F]
                     [-R [R]] [-cd CONTROLDIR] [-pd PARAMDIR] [-op OUTLIERP]
                     [-cr CONTROL_BASELINE_FILE] [-cm CNV_MASK_FILE]
                     [-ce CHR_EXPECTED_CNS_FILE] [-json JSON] [-guided]
                     [-guidedB] [-seed SEED] [-finalmergeSV] [-cnvOnly]
                     [-NoCheckFiles] [-NoExtCharCheck] [--vapini VAP_INI]
                     [--cleanRestart] [--autoRestart]
                     [--dynamicExtension DYNAMICEXTENSION] [--docker]
                     [--experimental [EXPERIMENTAL [EXPERIMENTAL ...]]]
                     [--compute-confidence COMPUTE_CONFIDENCE]

Pipeline for de novo assembly - Bionano Genomics

optional arguments:
  -h, --help  show this help message and exit
  -T T  Total threads per Node, with overloading [default 1]
  -j MAXTHREADS  Max Threads per job [default -T arg]
  -je MAXTHREADSEXT  Max Threads per extension stage1 job (if less than -j value) [default 60]
  -jp MAXTHREADSPW  Max Threads per pairwise or stage0 job [default -T arg]
  -J J  Threads per large memory host jobs (mediumHostJob in clusterArguments.xml) for grouped jobs (has no effect without -f) [default 48]
  -TJ TJ  Total threads per Node, with overloading, for large memory hosts (see -J) [default 2x -J value]
  -Tp TP  Total threads per Node, with overloading, for pairwise jobs [default 2x -jp value]
  -Te TE  Total threads per Node, with overloading, for extension jobs [default -T value]
  -Tn TN  Nominal threads per Node, without overloading (non-zero value will override -T -Tp -Te -TJ) [default 0]
  -N N  Minimum number of split bnx files (actual number is multiple of this). Value of 6 required (reserved) for Xeon-Phi hardware (optional, default 2)
  -G BED  Bed file for gaps, used in structural variation (SV) detection to check for SV overlap with reference gaps
  -i ITER  Number of extension and merge iterations (default=1, must be in range [0,20], use 0 to skip)
  -b BNX  Input molecule (.bnx) file, required
  -l LOCAL  Location of output files root directory, required, will be created if does not exist; if does exist, will overwrite contents (may be error-prone)
  -t TOOLS  Location of executable files (RefAligner and Assembler, required)
  -B BYPASS  Skip steps, using previous result. <= 0:None, 1:ImgDetect, 2:NoiseChar/Subsample, 3:Pairwise, 4:Assembly, 5:RefineA, 6:RefineB, 7:merge0, 8+(i-1)*2:Ext(i), 9+(i-1)*2:Mrg(i), N+1:alignmol
  -e EXP  Output file prefix (optional, default = exp)
  -r REF  Reference file (must be .cmap), to compare resulting contigs (optional)
  -x  Exit after auto noise (noise characterization), do not preform de novo assembly
  -c CLEANUP  Remove contig results (0 - keep all (default), 1 - remove intermediate files, 2 - store in sqlite, 3 - store in sqlite and remove)
  -C CXML  Run on cluster, read XML file for submission arguments (optional--will not use cluster submission if absent)
  -w  Wipe clean previous contig results
  -a XML  Read XML file for parameters (required)
  -p PERF  Log performance in pipelineReport 0=None, 1=time, 2=perf, 3=time&perf (default=1)
  -d  Retired option (contig subdirectories), always enabled.
  -u  Do not perform final refinement (not recommended).
  -U [GROUPCONTIGS]  Group contigs in refinement and extension stages: always ON, this argument has no effect (retained for backward compatibility)
  -v [VERSION]  Print version; exit if argument > 1 supplied.
  -V RUNSV  Detect structural variations. Default: run after final stage (normally refineFinal); if argument 0, disable.
  -A  Align molcules to final contigs (ON by default, use this to turn off).
  -y  Automatically determine noise parameters (requires reference; optional, default off)
  -m  Disable molecule vs reference alignments (default on with reference)
  -f [F]  Run this fraction of grouped jobs on host (0.2 if no arg) [default 0]
  -z  Zip pipeline results (default ON, use this to turn off).
  -E  ReCheck stdout completeness for completed jobs (default ON, use this to turn off).
  -W W  Multiply group sizes and BNX split sizes by this factor to reduce number of jobs (for Genomes larger than Human). Use value under 1 to increase number of jobs
  -Gsiz GSIZ  Estimated Genome size in Gb
  -F F  Color channel: replace -usecolor X in optArgs with this, must be either 1 or 2 [default OFF]
  -R [R]  Rough assembly: denovo assembly used as autoNoise reference for re-assembly; sequence may be optionally specified as -r, if supplied, will be used for global scaling and SV calls, but not for autoNoise (optional, default off, optional argument 0.2-0.9 for fraction of rough assembly which must align to sequence for rescaling)
  -cd CONTROLDIR  Control data directory for copy number profiles [optional]
  -pd PARAMDIR  Parameters directory for copy number profiles [optional]
  -op OUTLIERP  Outlier Probability for copy number profile (optional)
  -cr CONTROL_BASELINE_FILE  Control CNV baseline reference file for copy number profiles [optional]
  -cm CNV_MASK_FILE  CNV mask file for copy number profiles [optional]
  -ce CHR_EXPECTED_CNS_FILE  Expected copy numbers file for copy number profiles [optional]
  -json JSON  json string that contains different, informative parameters
  -guided  Guided assembly: requires reference or -seed and skips pairwise,Assembly,refineA
  -guidedB  Guided assembly: requires reference or -seed and skips pairwise,Assembly,refineA,refineB,mrg0
  -seed SEED  Seed Genome for Guided assembly (must be .cmap)
  -finalmergeSV  Detect SVs after final merge stage
  -cnvOnly  Exit after auto noise, alignmolvref, and CNV analysis; do not preform de novo assembly
  -NoCheckFiles  Disable checking presence of all files mentioned in stdout files
  -NoExtCharCheck  Disable checking contig sizes after extension characterize stage
  --vapini VAP_INI  Variant annotation INI file
  --cleanRestart  Remove existing ouput from current stage before rerun
  --autoRestart  Retart pipeline from the stage where it left off
  --dynamicExtension DYNAMICEXTENSION  Automatically determine the optimal number of iterations of extension and merge. Value specifies max number of iterations [default 0 (disable)]
  --docker  Run jobs in docker container
  --experimental [EXPERIMENTAL [EXPERIMENTAL ...]]  Run experimental features. Multiple values (separated by spaces) are possible. No experimental features in v3.6
  --compute-confidence COMPUTE_CONFIDENCE  Compute new confidence scores (v3.6 and later). Possible values: human_hg38/human_hg19/non_human. Default is to keep old scores
Installation
Installed from the source code available at BionanoSolve download
System
64-bit Linux