BionanoSolve-Sapelo2: Difference between revisions
Revision as of 09:46, 14 April 2021
Category
Bioinformatics
Program On
Sapelo2
Version
3.6.1-11162020
Author / Distributor
Details are at Bionano Solve
Description
From Bionano Solve: "Bionano Solve™ is an analysis pipeline for Bionano data processing, optimized for Bionano Compute and IrysSolve Compute Servers. A de novo assembly of a human genome can be completed in about 28 hours. Bionano Tools contains various tools and scripts, including the Bionano Solve analysis pipeline. These tools together perform computation jobs on Saphyr and IrysSolve Compute Servers."
Running Program
Also refer to Running Jobs on Sapelo2
For more information on Environment Modules on Sapelo2 please see the Lmod page.
- Version 3.6.1-11162020, installed in /apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b
To use BionanoSolve v3.6.1-11162020 pipelines, please first load the module with
module load BionanoSolve/3.6.1-11162020-foss-2019b
Once the module is loaded, an environment variable called EBROOTBIONANOSOLVE is exported. It stores the BionanoSolve installation path on the cluster, i.e., /apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b/. Using EBROOTBIONANOSOLVE, the BionanoSolve components can be located easily, for example:
Pipeline is at ${EBROOTBIONANOSOLVE}/Pipeline/11162020
HybridScaffold is at ${EBROOTBIONANOSOLVE}/HybridScaffold/11162020
RefAligner is at ${EBROOTBIONANOSOLVE}/RefAligner/11442.11643rel
VariantAnnotation is at ${EBROOTBIONANOSOLVE}/VariantAnnotation/11162020
FSHD is at ${EBROOTBIONANOSOLVE}/FSHD/11162020
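As a quick sanity check after loading the module, the component locations listed above can be verified from the shell. The following is a minimal sketch, not part of BionanoSolve: the helper name list_bionano_components is hypothetical, and the subdirectory names are simply those listed above for v3.6.1-11162020.

```shell
#!/bin/bash
# Hypothetical helper (not part of BionanoSolve): report whether each component
# directory listed on this page exists under the given installation root.
list_bionano_components() {
    local root="${1%/}"   # drop a trailing slash, if any
    local sub
    if [ -z "$root" ]; then
        echo "EBROOTBIONANOSOLVE is not set; load the module first" >&2
        return 1
    fi
    for sub in Pipeline/11162020 HybridScaffold/11162020 \
               RefAligner/11442.11643rel VariantAnnotation/11162020 \
               FSHD/11162020; do
        if [ -d "$root/$sub" ]; then
            echo "found   $root/$sub"
        else
            echo "missing $root/$sub"
        fi
    done
}

# Usage after "module load BionanoSolve/3.6.1-11162020-foss-2019b":
#   list_bionano_components "$EBROOTBIONANOSOLVE"
```

Any line reported as missing suggests that the module was not loaded or that a different version is in use.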
Sample job submission script (sub.sh) to run hybridScaffold.pl from BionanoSolve v3.6.1-11162020 in a batch job:
#!/bin/bash
#SBATCH --job-name=job_hybridScaffold
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=20gb
#SBATCH --time=120:00:00
#SBATCH --output=log.%j.out
#SBATCH --error=log.%j.err
#SBATCH --mail-user=username@uga.edu
#SBATCH --mail-type=ALL

cd $SLURM_SUBMIT_DIR

module load BionanoSolve/3.6.1-11162020-foss-2019b

perl ${EBROOTBIONANOSOLVE}/HybridScaffold/11162020/hybridScaffold.pl [options]
Sample job submission script (sub.sh) to run pipelineCL.py from BionanoSolve v3.6.1-11162020 in a batch job:
#!/bin/bash
#SBATCH --job-name=job_hybridScaffold
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=40gb
#SBATCH --time=120:00:00
#SBATCH --output=log.%j.out
#SBATCH --error=log.%j.err
#SBATCH --mail-user=username@uga.edu
#SBATCH --mail-type=ALL

cd $SLURM_SUBMIT_DIR

module load BionanoSolve/3.6.1-11162020-foss-2019b

python ${EBROOTBIONANOSOLVE}/Pipeline/11162020/pipelineCL.py -T 32 -j 32 [options]
Here EBROOTBIONANOSOLVE is the environment variable that stores the BionanoSolve installation path on the cluster, and [options] must be replaced by the options (command and arguments) you want to use. Other job parameters, such as the maximum wall clock time, the maximum memory, the number of cores per node, and the job name, should be adjusted to suit your job as well.
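In the pipelineCL.py sample script, the -T and -j values (32) mirror the --cpus-per-task request. One way to avoid editing that number in several places is to derive it from SLURM_CPUS_PER_TASK, which Slurm sets inside a job when --cpus-per-task is given. The following is a minimal sketch: the function name pipeline_threads is hypothetical, and the fallback of 1 thread when the variable is unset is an assumption for runs outside a Slurm job.

```shell
#!/bin/bash
# Hypothetical helper: pick the thread count for -T/-j from the Slurm allocation.
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested;
# the fallback of 1 is an assumption for runs outside a Slurm job.
pipeline_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Inside sub.sh, after loading the module, this could replace the hard-coded 32:
#   threads=$(pipeline_threads)
#   python ${EBROOTBIONANOSOLVE}/Pipeline/11162020/pipelineCL.py -T "$threads" -j "$threads" [options]
```

With this approach, changing --cpus-per-task in the script header is enough to change the thread count passed to the pipeline.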
Please note: To run a distributed parallel job on a cluster, BionanoSolve needs the Distributed Resource Management Application API (DRMAA, http://www.drmaa.org/) and a properly configured "clusterArgument.xml" file. DRMAA is currently not available on the Sapelo2 cluster; therefore, please run BionanoSolve pipelineCL.py on a single node, and do not run the pipeline with the "-C <cluster argument.xml file>" option. We are sorry for the inconvenience.
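To guard against accidentally passing the cluster option, a job script could check its pipelineCL.py arguments before launching. The following is a minimal sketch and not part of BionanoSolve: the function name check_no_cluster_option is hypothetical, and the option values in the usage comment are placeholders.

```shell
#!/bin/bash
# Hypothetical guard (not part of BionanoSolve): refuse to continue if the
# cluster-submission option -C appears among the pipelineCL.py arguments.
# Both "-C file.xml" and the attached form "-Cfile.xml" are caught.
check_no_cluster_option() {
    local arg
    for arg in "$@"; do
        case "$arg" in
            -C*)
                echo "Error: -C requires DRMAA, which is not available on Sapelo2" >&2
                return 1
                ;;
        esac
    done
    return 0
}

# Usage sketch in sub.sh (the option values here are placeholders):
#   opts=(-T 32 -j 32 -b input.bnx -l output_dir)
#   check_no_cluster_option "${opts[@]}" || exit 1
```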
Example of job submission
sbatch sub.sh
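On success, sbatch prints a confirmation line of the form "Submitted batch job <id>". The ID also appears in the log file names, because the sample scripts use log.%j.out and log.%j.err. The following is a minimal sketch for capturing the ID and checking the job state; the helper name extract_job_id is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: pull the numeric job ID out of sbatch's standard
# confirmation line, which has the form "Submitted batch job <id>".
extract_job_id() {
    echo "$1" | awk '/^Submitted batch job/ {print $4}'
}

# Usage sketch on a login node:
#   msg=$(sbatch sub.sh)
#   jobid=$(extract_job_id "$msg")
#   squeue -j "$jobid"        # show the state of this job
```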
Documentation
Details are at BionanoSolve
A user manual for running the BionanoSolve pipeline on the command line can be found at BionanoSolve guide
module load BionanoSolve/3.2.2-08222018-foss-2016b

perl $EBROOTBIONANOSOLVE/HybridScaffold/08222018/hybridScaffold.pl -h

Usage: perl hybridScaffold.pl <-h> <-n ngs_file> <-b bng_cmap_file> <-c hybrid_config_xml> <-o output_folder> <-B conflict_filter_level> <-N conflict_filter_level> <-f> <-m molecules_bnx> <-p de_novo_pipeline> <-q de_novo_xml> <-v> <-x> <-y> <-e noise_param><-z tar_zip_file><-S>
  -h : This help message
  -n : Input NGS FASTA [required]
  -b : Input BioNano CMAP [required]
  -c : Merge configuration file [required]
  -o : Output folder [required]
  -r : RefAligner program [required]
  -B : conflict filter level: 1 no filter, 2 cut contig at conflict, 3 exclude conflicting contig [required if not using -M option]
  -N : conflict filter level: 1 no filter, 2 cut contig at conflict, 3 exclude conflicting contig [required if not using -M option]
  -f : Force output and overwrite any existing files
  -x : Flag to generate molecules to hybrid scaffold alignment and molecules to genome map alignment [optional]
  -y : Flag to generate chimeric quality score for the Input BioNano CMAP [optional]
  -m : Input BioNano molecules BNX [optional; only required for either the -x or -y option]
  -p : Input de novo assembly pipeline directory [optional; only required for -x option]
  -q : Input de novo assembly pipeline optArguments XML script [optional; only required for -x option]
  -e : Input de novo assembly noise parameter .errbin or .err file [optional; recommended for -y option but not required]
  -v : Print pipeline version information
  -M : Input a conflict resolution file indicating which NGS and BioNano conflicting contigs to be cut [optional]
  -z : Name of a zipped file to archive the essential output files [optional]
  -S : Only run hybridScaffold up to before Merge steps [optional]
  -w : Name of the status text file needed by IrysView [optional]
  -t : Perform pre-pairmerge sequence to pre-pairmerge genome map alignment [optional]
  -u : Sequence of enzyme recognition site (overrides what has been specified in config XML file, for IrysView only) [optional]

python $EBROOTBIONANOSOLVE/Pipeline/08222018/pipelineCL.py -h

usage: pipelineCL.py [-h] [-T T] [-j MAXTHREADS] [-jp MAXTHREADSPW] [-N N]
                     [-G BED] [-i ITER] [-b BNX] [-l LOCAL] [-t TOOLS]
                     [-B BYPASS] [-e EXP] [-r REF] [-x] [-c CLEANUP] [-C CXML]
                     [-w] [-a XML] [-p PERF] [-d] [-u] [-U [GROUPCONTIGS]]
                     [-v [VERSION]] [-V RUNSV] [-A] [-y] [-m] [-H [H]]
                     [-f [F]] [-J J] [-z] [-E] [-W W] [-F F] [-R [R]]
                     [-cd CONTROLDIR] [-pd PARAMDIR] [-op OUTLIERP]

Pipeline for de novo assembly - BioNano Genomics

optional arguments:
  -h, --help          show this help message and exit
  -T T                Available threads per Node [default 1]
  -j MAXTHREADS       Max Threads per job [default -T]
  -jp MAXTHREADSPW    Max Threads per pairwise or stage0 job [default -T arg]
  -N N                Minimum number of split bnx files (actual number is multiple of this) (optional, default 2)
  -G BED              Bed file for gaps, used in structural variation (SV) detection to check for SV overlap with reference gaps
  -i ITER             Number of extension and merge iterations (default=1, must be in range [0,20], use 0 to skip)
  -b BNX              Input molecule (.bnx) file, required
  -l LOCAL            Location of output files root directory, required, will be created if does not exist; if does exist, will overwrite contents (may be error-prone)
  -t TOOLS            Location of executable files (RefAligner and Assembler, required)
  -B BYPASS           Skip steps, using previous result. <= 0:None, 1:ImgDetect, 2:NoiseChar/Subsample, 3:Pairwise, 4:Assembly, 5:RefineA, 6:RefineB, 7:merge0, 8+(i-1)*2:Ext(i), 9+(i-1)*2:Mrg(i), N+1:alignmol
  -e EXP              Output file prefix (optional, default = exp)
  -r REF              Reference file (must be .cmap), to compare resulting contigs (optional)
  -x                  Exit after auto noise (noise characterization), do not preform de novo assembly
  -c CLEANUP          Remove contig results (0 - keep all (default), 1 - remove intermediate files, 2 - store in sqlite, 3 - store in sqlite and remove)
  -C CXML             Run on cluster, read XML file for submission arguments (optional--will not use cluster submission if absent)
  -w                  Wipe clean previous contig results
  -a XML              Read XML file for parameters (required)
  -p PERF             Log performance in pipelineReport 0=None, 1=time, 2=perf, 3=time&perf (default=1)
  -d                  Retired option (contig subdirectories), always enabled.
  -u                  Do not perform final refinement (not recommended).
  -U [GROUPCONTIGS]   Group contigs in refinement and extension stages: always ON, this argument has no effect (retained for backward compatibility)
  -v [VERSION]        Print version; exit if argument > 1 supplied.
  -V RUNSV            Detect structural variations. Default: run after final stage (normally refineFinal); if argument 0, disable.
  -A                  Align molcules to final contigs (ON by default, use this to turn off).
  -y                  Automatically determine noise parameters (requires reference; optional, default off)
  -m                  Disable molecule vs reference alignments (default on with reference)
  -H [H]              Use HG19 (human genome) as reference, loaded from Analysis/SV/CopyNumberProfiles/DATA. Overrides -r argument. Use HG38 if argument 2 is supplied. [Default OFF]
  -f [F]              Run this fraction of grouped jobs on host (0.2 if no arg) [default 0]
  -J J                Number of threads on host for grouped jobs (has no effect without -f) [default 48]
  -z                  Zip pipeline results (default ON, use this to turn off).
  -E                  ReCheck stdout completeness for completed jobs (default ON, use this to turn off).
  -W W                Multiply group sizes by this factor to reduce number of jobs (for Genomes larger than Human). Use value under 1 to increase number of jobs
  -F F                Color channel: replace -usecolor X in optArgs with this, must be either 1 or 2 [default OFF]
  -R [R]              Rough assembly: denovo assembly used as autoNoise reference for re-assembly; sequence may be optionally specified as -r, if supplied, will be used for global scaling and SV calls, but not for autoNoise (optional, default off, optional argument 0.2-0.9 for fraction of rough assembly which must align to sequence for rescaling)
  -cd CONTROLDIR      Control data directory for copy number profiles [optional]
  -pd PARAMDIR        Parameters directory for copy number profiles [optional]
  -op OUTLIERP        Outlier Probability for copy number profile (optional)
Installation
source code from BionanoSolve download
System
64-bit Linux