BionanoSolve-Sapelo2: Difference between revisions

From Research Computing Center Wiki
(Created page with "Category:Sapelo2oldCategory:SoftwareCategory:Bioinformatics === Category === Bioinformatics === Program On === Sapelo2 === Version === 3.2.1-04122018, 3.2.2-0...")
 
No edit summary
 
(4 intermediate revisions by one other user not shown)
Line 1: Line 1:
[[Category:Sapelo2old]][[Category:Software]][[Category:Bioinformatics]]   
[[Category:Sapelo2]][[Category:Software]][[Category:Bioinformatics]]   


=== Category ===
=== Category ===
Line 8: Line 8:


=== Version ===
=== Version ===
3.2.1-04122018, 3.2.2-08222018, 3.3-10252018, 3.4-06042019
3.6.1-11162020


=== Author / Distributor ===
=== Author / Distributor ===
Line 21: Line 21:
For more information on Environment Modules on Sapelo2 please see the [[Lmod]] page.
For more information on Environment Modules on Sapelo2 please see the [[Lmod]] page.


*Version 3.2.1-04122018, installed in /usr/local/apps/eb/BionanoSolve/3.2.1-04122018-foss-2016b
*Version 3.6.1-11162020, installed in /apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b
*Version 3.2.2-08222018, installed in /usr/local/apps/eb/BionanoSolve/3.2.2-08222018-foss-2016b
*Version 3.3-10252018, installed in /usr/local/apps/eb/BionanoSolve/3.3-10252018-foss-2016b
To use BionanoSolve v3.6.1-11162020 pipelines, please first load the module with
*Version 3.4-06042019, installed in /usr/local/apps/eb/BionanoSolve/3.4-06042019-foss-2016b
 
To use BionanoSolve/3.2.1-04122018 pipelines, please first load the module with
<pre class="gscript">
<pre class="gscript">
module load BionanoSolve/3.2.1-04122018-foss-2016b
module load BionanoSolve/3.6.1-11162020-foss-2019b
</pre>
</pre>


To use BionanoSolve/3.2.2-08222018 pipelines, please first load the module with
Once you have loaded the module, an environment variable called EBROOTBIONANOSOLVE is exported. It stores the BionanoSolve installation path on the cluster, i.e., /apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b/ . Using EBROOTBIONANOSOLVE, BionanoSolve components can be easily located, for example:
<pre class="gscript">
module load BionanoSolve/3.2.2-08222018-foss-2016b
</pre>


To use BionanoSolve/3.3-10252018 pipelines, please first load the module with
<pre class="gscript">
module load BionanoSolve/3.3-10252018-foss-2016b
</pre>


To use BionanoSolve/3.4-06042019 pipelines, please first load the module with
Pipeline is at ${EBROOTBIONANOSOLVE}/Pipeline/11162020
<pre class="gscript">
module load BionanoSolve/3.4-06042019-foss-2016b
</pre>


Once you have loaded the module, an environment variable called EBROOTBIONANOSOLVE is created to store the BionanoSolve installation path on the cluster (i.e., /usr/local/apps/eb/BionanoSolve/3.2.1-04122018-foss-2016b for version 3.2.1-04122018; /usr/local/apps/eb/BionanoSolve/3.2.2-08222018-foss-2016b for version 3.2.2-08222018; /usr/local/apps/eb/BionanoSolve/3.3-10252018-foss-2016b for version 3.3-10252018). Using EBROOTBIONANOSOLVE, BionanoSolve components can be easily located, for example:
HybridScaffold is at ${EBROOTBIONANOSOLVE}/HybridScaffold/11162020


*Version 3.2.1-04122018:
RefAligner is at ${EBROOTBIONANOSOLVE}/RefAligner/11442.11643rel


Pipeline is put in $EBROOTBIONANOSOLVE/Pipeline/04122018
VariantAnnotation is at ${EBROOTBIONANOSOLVE}/VariantAnnotation/11162020


HybridScaffold is put in $EBROOTBIONANOSOLVE/HybridScaffold/04122018
FSHD is at ${EBROOTBIONANOSOLVE}/FSHD/11162020


RefAligner is put in $EBROOTBIONANOSOLVE/RefAligner/7437.7523rel


VariantAnnotation is put in $EBROOTBIONANOSOLVE/VariantAnnotation/04122018
Sample job submission script (sub.sh) to run hybridScaffold.pl from BionanoSolve v3.6.1-11162020 in a batch job:


*Version 3.2.2-08222018:
<pre class="gscript">
#!/bin/bash
#SBATCH --job-name=job_hybridScaffold     
#SBATCH --partition=batch           
#SBATCH --ntasks=1                 
#SBATCH --cpus-per-task=2       
#SBATCH --mem=10gb                   
#SBATCH --time=120:00:00         
#SBATCH --output=log.%j.out   
#SBATCH --error=log.%j.err         
#SBATCH --mail-user=username@uga.edu 
#SBATCH --mail-type=ALL 


Pipeline is put in $EBROOTBIONANOSOLVE/Pipeline/08222018
cd $SLURM_SUBMIT_DIR


HybridScaffold is put in $EBROOTBIONANOSOLVE/HybridScaffold/08222018
module load BionanoSolve/3.6.1-11162020-foss-2019b


RefAligner is put in $EBROOTBIONANOSOLVE/RefAligner/7782.7865rel
perl ${EBROOTBIONANOSOLVE}/HybridScaffold/11162020/hybridScaffold.pl [options]
 
VariantAnnotation is put in $EBROOTBIONANOSOLVE/VariantAnnotation/08222018
 
*Version 3.3-10252018:
 
Pipeline is put in $EBROOTBIONANOSOLVE/Pipeline/10252018
 
HybridScaffold is put in $EBROOTBIONANOSOLVE/HybridScaffold/10252018
 
RefAligner is put in $EBROOTBIONANOSOLVE/RefAligner/7915.7989rel
 
VariantAnnotation is put in $EBROOTBIONANOSOLVE/VariantAnnotation/10252018
 
*Version 3.4-06042019:
 
Pipeline is put in $EBROOTBIONANOSOLVE/Pipeline/06042019
 
HybridScaffold is put in $EBROOTBIONANOSOLVE/HybridScaffold/06042019
 
RefAligner is put in $EBROOTBIONANOSOLVE/RefAligner/8949.9232rel
 
VariantAnnotation is put in $EBROOTBIONANOSOLVE/VariantAnnotation/06042019
 
 
Sample job submission script (sub.sh) to run hybridScaffold.pl from BionanoSolve/3.2.1-04122018 in a batch job:
 
<pre class="gscript">
#PBS -S /bin/bash
#PBS -q batch
#PBS -N job_hybridScaffold
#PBS -l nodes=1:ppn=2
#PBS -l walltime=12:00:00
#PBS -l mem=10g
#PBS -j oe
 
cd $PBS_O_WORKDIR
module load BionanoSolve/3.2.1-04122018-foss-2016b
perl $EBROOTBIONANOSOLVE/HybridScaffold/04122018/hybridScaffold.pl [options]
</pre>  
</pre>  


Sample job submission script (sub.sh) to run pipelineCL.py from BionanoSolve/3.2.1-04122018 in a batch job:
Sample job submission script (sub.sh) to run pipelineCL.py from BionanoSolve v3.6.1-11162020 in a batch job:
 
<pre class="gscript">
#PBS -S /bin/bash
#PBS -q batch
#PBS -N job_hybridScaffold
#PBS -l nodes=1:ppn=48
#PBS -l walltime=12:00:00
#PBS -l mem=10g
#PBS -j oe
 
cd $PBS_O_WORKDIR
ml BionanoSolve/3.2.1-04122018-foss-2016b
python $EBROOTBIONANOSOLVE/Pipeline/04122018/pipelineCL.py -T 48 -j 48 [options]
</pre>
 
Sample job submission script (sub.sh) to run hybridScaffold.pl from BionanoSolve/3.2.2-08222018 in a batch job:


<pre class="gscript">
<pre class="gscript">
#PBS -S /bin/bash
#!/bin/bash
#PBS -q batch
#SBATCH --job-name=job_pipelineCL
#PBS -N job_hybridScaffold
#SBATCH --partition=batch
#PBS -l nodes=1:ppn=2
#SBATCH --nodes=1           
#PBS -l walltime=12:00:00
#SBATCH --ntasks=1                
#PBS -l mem=10g
#SBATCH --cpus-per-task=32       
#PBS -j oe
#SBATCH --mem=40gb                   
 
#SBATCH --time=120:00:00          
cd $PBS_O_WORKDIR
#SBATCH --output=log.%j.out   
module load BionanoSolve/3.2.2-08222018-foss-2016b
#SBATCH --error=log.%j.err         
perl $EBROOTBIONANOSOLVE/HybridScaffold/08222018/hybridScaffold.pl [options]
#SBATCH --mail-user=username@uga.edu 
</pre>
#SBATCH --mail-type=ALL 


Sample job submission script (sub.sh) to run pipelineCL.py from BionanoSolve/3.2.2-08222018 in a batch job:
cd $SLURM_SUBMIT_DIR


<pre class="gscript">
module load BionanoSolve/3.6.1-11162020-foss-2019b
#PBS -S /bin/bash
#PBS -q batch
#PBS -N job_hybridScaffold
#PBS -l nodes=1:ppn=48
#PBS -l walltime=12:00:00
#PBS -l mem=10g
#PBS -j oe


cd $PBS_O_WORKDIR
python ${EBROOTBIONANOSOLVE}/Pipeline/11162020/pipelineCL.py -T 32 -j 32 [options]
module load BionanoSolve/3.2.2-08222018-foss-2016b
python $EBROOTBIONANOSOLVE/Pipeline/08222018/pipelineCL.py -T 48 -j 48 [options]
</pre>
</pre>


where EBROOTBIONANOSOLVE is the environmental variable storing BionanoSolve installation path on cluster; [options] need to be replaced by the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name need to be modified appropriately as well.
where EBROOTBIONANOSOLVE is the environment variable storing the BionanoSolve installation path on the cluster, and [options] needs to be replaced by the options (command and arguments) you want to use. Other job parameters, such as the maximum wall clock time, maximum memory, number of cores per node, and job name, need to be modified appropriately as well.


'''Please note:''' BionanoSolve needs to use Distributed Resource Management Application API (DRMAA http://www.drmaa.org/) and a properly configured "clusterArgument.xml" file to run a distributed parallel job on cluster. Currently DRMAA is not available on Sapelo2 cluster; therefore, please run BionanoSolve pipelineCL.py on a single node. Please do not run the pipeline using "-C <cluster argument.xml file>" option. We are sorry for the inconvenience.
'''Please note:''' To run a distributed parallel job on the cluster, BionanoSolve requires the Distributed Resource Management Application API (DRMAA, http://www.drmaa.org/) and a properly configured "clusterArgument.xml" file. DRMAA is currently not available on the Sapelo2 cluster; therefore, please run BionanoSolve pipelineCL.py on a single node and do not use the "-C <cluster argument.xml file>" option. We are sorry for the inconvenience.
   
   


Example of job submission
Example of job submission
<pre  class="gcommand">
<pre  class="gcommand">
qsub sub.sh  
sbatch sub.sh  
</pre>
</pre>


Line 170: Line 104:


<pre  class="gcommand">
<pre  class="gcommand">
module load BionanoSolve/3.2.2-08222018-foss-2016b
ml BionanoSolve/3.6.1-11162020-foss-2019b
perl $EBROOTBIONANOSOLVE/HybridScaffold/08222018/hybridScaffold.pl -h
perl ${EBROOTBIONANOSOLVE}/HybridScaffold/11162020/hybridScaffold.pl -h
Usage: perl hybridScaffold.pl <-h> <-n ngs_file> <-b bng_cmap_file> <-c hybrid_config_xml> <-o output_folder> <-B conflict_filter_level> <-N conflict_filter_level> <-f>  
Usage: perl hybridScaffold.pl <-h> <-n ngs_file> <-b bng_cmap_file> <-c hybrid_config_xml> <-o output_folder> <-B conflict_filter_level> <-N conflict_filter_level> <-f>  
Line 199: Line 133:




python $EBROOTBIONANOSOLVE/Pipeline/08222018/pipelineCL.py -h
python ${EBROOTBIONANOSOLVE}/Pipeline/11162020/pipelineCL.py -h
 
usage: pipelineCL.py [-h] [-T T] [-j MAXTHREADS] [-je MAXTHREADSEXT]
usage: pipelineCL.py [-h] [-T T] [-j MAXTHREADS] [-jp MAXTHREADSPW] [-N N]
                    [-jp MAXTHREADSPW] [-J J] [-TJ TJ] [-Tp TP] [-Te TE]
                    [-G BED] [-i ITER] [-b BNX] [-l LOCAL] [-t TOOLS]
                    [-Tn TN] [-N N] [-G BED] [-i ITER] [-b BNX] [-l LOCAL]
                    [-B BYPASS] [-e EXP] [-r REF] [-x] [-c CLEANUP] [-C CXML]
                    [-t TOOLS] [-B BYPASS] [-e EXP] [-r REF] [-x]
                    [-w] [-a XML] [-p PERF] [-d] [-u] [-U [GROUPCONTIGS]]
                    [-c CLEANUP] [-C CXML] [-w] [-a XML] [-p PERF] [-d] [-u]
                    [-v [VERSION]] [-V RUNSV] [-A] [-y] [-m] [-H [H]]
                    [-U [GROUPCONTIGS]] [-v [VERSION]] [-V RUNSV] [-A] [-y]
                    [-f [F]] [-J J] [-z] [-E] [-W W] [-F F] [-R [R]]
                    [-m] [-f [F]] [-z] [-E] [-W W] [-Gsiz GSIZ] [-F F]
                    [-cd CONTROLDIR] [-pd PARAMDIR] [-op OUTLIERP]
                    [-R [R]] [-cd CONTROLDIR] [-pd PARAMDIR] [-op OUTLIERP]
                    [-cr CONTROL_BASELINE_FILE] [-cm CNV_MASK_FILE]
                    [-ce CHR_EXPECTED_CNS_FILE] [-json JSON] [-guided]
                    [-guidedB] [-seed SEED] [-finalmergeSV] [-cnvOnly]
                    [-NoCheckFiles] [-NoExtCharCheck] [--vapini VAP_INI]
                    [--cleanRestart] [--autoRestart]
                    [--dynamicExtension DYNAMICEXTENSION] [--docker]
                    [--experimental [EXPERIMENTAL [EXPERIMENTAL ...]]]
                    [--compute-confidence COMPUTE_CONFIDENCE]


Pipeline for de novo assembly - BioNano Genomics
Pipeline for de novo assembly - Bionano Genomics


optional arguments:
optional arguments:
   -h, --help         show this help message and exit
   -h, --help           show this help message and exit
   -T T               Available threads per Node [default 1]
   -T T                 Total threads per Node, with overloading [default 1]
   -j MAXTHREADS     Max Threads per job [default -T]
   -j MAXTHREADS         Max Threads per job [default -T arg]
   -jp MAXTHREADSPW   Max Threads per pairwise or stage0 job [default -T arg]
  -je MAXTHREADSEXT    Max Threads per extension stage1 job (if less than -j
   -N N               Minimum number of split bnx files (actual number is
                        value) [default 60]
                    multiple of this) (optional, default 2)
   -jp MAXTHREADSPW     Max Threads per pairwise or stage0 job [default -T
   -G BED             Bed file for gaps, used in structural variation (SV)
                        arg]
                    detection to check for SV overlap with reference gaps
  -J J                  Threads per large memory host jobs (mediumHostJob in
   -i ITER           Number of extension and merge iterations (default=1, must
                        clusterArguments.xml) for grouped jobs (has no effect
                    be in range [0,20], use 0 to skip)
                        without -f) [default 48]
   -b BNX             Input molecule (.bnx) file, required
  -TJ TJ                Total threads per Node, with overloading, for large
   -l LOCAL           Location of output files root directory, required, will
                        memory hosts (see -J) [default 2x -J value]
                    be created if does not exist; if does exist, will
  -Tp TP                Total threads per Node, with overloading, for pairwise
                    overwrite contents (may be error-prone)
                        jobs [default 2x -jp value]
   -t TOOLS           Location of executable files (RefAligner and Assembler,
  -Te TE                Total threads per Node, with overloading, for
                    required)
                        extension jobs [default -T value]
   -B BYPASS         Skip steps, using previous result. <= 0:None,
  -Tn TN                Nominal threads per Node, without overloading (non-
                    1:ImgDetect, 2:NoiseChar/Subsample, 3:Pairwise,
                        zero value will override -T -Tp -Te -TJ) [default 0]
                    4:Assembly, 5:RefineA, 6:RefineB, 7:merge0,
   -N N                 Minimum number of split bnx files (actual number is
                    8+(i-1)*2:Ext(i), 9+(i-1)*2:Mrg(i), N+1:alignmol
                        multiple of this). Value of 6 required (reserved) for
   -e EXP             Output file prefix (optional, default = exp)
                        Xeon-Phi hardware (optional, default 2)
   -r REF             Reference file (must be .cmap), to compare resulting
   -G BED               Bed file for gaps, used in structural variation (SV)
                    contigs (optional)
                        detection to check for SV overlap with reference gaps
   -x                 Exit after auto noise (noise characterization), do not
   -i ITER               Number of extension and merge iterations (default=1,
                    preform de novo assembly
                        must be in range [0,20], use 0 to skip)
   -c CLEANUP         Remove contig results (0 - keep all (default), 1 - remove
   -b BNX               Input molecule (.bnx) file, required
                    intermediate files, 2 - store in sqlite, 3 - store in
   -l LOCAL             Location of output files root directory, required,
                    sqlite and remove)
                        will be created if does not exist; if does exist, will
   -C CXML           Run on cluster, read XML file for submission arguments
                        overwrite contents (may be error-prone)
                    (optional--will not use cluster submission if absent)
   -t TOOLS             Location of executable files (RefAligner and
   -w                 Wipe clean previous contig results
                        Assembler, required)
   -a XML             Read XML file for parameters (required)
   -B BYPASS             Skip steps, using previous result. <= 0:None,
   -p PERF           Log performance in pipelineReport 0=None, 1=time, 2=perf,
                        1:ImgDetect, 2:NoiseChar/Subsample, 3:Pairwise,
                    3=time&perf (default=1)
                        4:Assembly, 5:RefineA, 6:RefineB, 7:merge0,
   -d                 Retired option (contig subdirectories), always enabled.
                        8+(i-1)*2:Ext(i), 9+(i-1)*2:Mrg(i), N+1:alignmol
   -u                 Do not perform final refinement (not recommended).
   -e EXP               Output file prefix (optional, default = exp)
   -U [GROUPCONTIGS] Group contigs in refinement and extension stages: always
   -r REF               Reference file (must be .cmap), to compare resulting
                    ON, this argument has no effect (retained for backward
                        contigs (optional)
                    compatibility)
   -x                   Exit after auto noise (noise characterization), do not
   -v [VERSION]       Print version; exit if argument > 1 supplied.
                        preform de novo assembly
   -V RUNSV           Detect structural variations. Default: run after final
   -c CLEANUP           Remove contig results (0 - keep all (default), 1 -
                    stage (normally refineFinal); if argument 0, disable.
                        remove intermediate files, 2 - store in sqlite, 3 -
   -A                 Align molcules to final contigs (ON by default, use this
                        store in sqlite and remove)
                    to turn off).
   -C CXML               Run on cluster, read XML file for submission arguments
   -y                 Automatically determine noise parameters (requires
                        (optional--will not use cluster submission if absent)
                    reference; optional, default off)
   -w                   Wipe clean previous contig results
   -m                 Disable molecule vs reference alignments (default on with
   -a XML               Read XML file for parameters (required)
                    reference)
   -p PERF               Log performance in pipelineReport 0=None, 1=time,
  -H [H]            Use HG19 (human genome) as reference, loaded from
                        2=perf, 3=time&perf (default=1)
                    Analysis/SV/CopyNumberProfiles/DATA. Overrides -r
   -d                   Retired option (contig subdirectories), always
                    argument. Use HG38 if argument 2 is supplied. [Default
                        enabled.
                    OFF]
   -u                   Do not perform final refinement (not recommended).
   -f [F]             Run this fraction of grouped jobs on host (0.2 if no arg)
   -U [GROUPCONTIGS]     Group contigs in refinement and extension stages:
                    [default 0]
                        always ON, this argument has no effect (retained for
  -J J              Number of threads on host for grouped jobs (has no effect
                        backward compatibility)
                    without -f) [default 48]
   -v [VERSION]         Print version; exit if argument > 1 supplied.
   -z                 Zip pipeline results (default ON, use this to turn off).
   -V RUNSV             Detect structural variations. Default: run after final
   -E                 ReCheck stdout completeness for completed jobs (default
                        stage (normally refineFinal); if argument 0, disable.
                    ON, use this to turn off).
   -A                   Align molcules to final contigs (ON by default, use
   -W W               Multiply group sizes by this factor to reduce number of
                        this to turn off).
                    jobs (for Genomes larger than Human). Use value under 1
   -y                   Automatically determine noise parameters (requires
                    to increase number of jobs
                        reference; optional, default off)
   -F F               Color channel: replace -usecolor X in optArgs with this,
   -m                   Disable molecule vs reference alignments (default on
                    must be either 1 or 2 [default OFF]
                        with reference)
   -R [R]             Rough assembly: denovo assembly used as autoNoise
   -f [F]               Run this fraction of grouped jobs on host (0.2 if no
                    reference for re-assembly; sequence may be optionally
                        arg) [default 0]
                    specified as -r, if supplied, will be used for global
   -z                   Zip pipeline results (default ON, use this to turn
                    scaling and SV calls, but not for autoNoise (optional,
                        off).
                    default off, optional argument 0.2-0.9 for fraction of
   -E                   ReCheck stdout completeness for completed jobs
                    rough assembly which must align to sequence for
                        (default ON, use this to turn off).
                    rescaling)
   -W W                 Multiply group sizes and BNX split sizes by this
   -cd CONTROLDIR     Control data directory for copy number profiles
                        factor to reduce number of jobs (for Genomes larger
                    [optional]
                        than Human). Use value under 1 to increase number of
   -pd PARAMDIR       Parameters directory for copy number profiles [optional]
                        jobs
   -op OUTLIERP       Outlier Probability for copy number profile (optional)
  -Gsiz GSIZ            Estimated Genome size in Gb
   -F F                 Color channel: replace -usecolor X in optArgs with
                        this, must be either 1 or 2 [default OFF]
   -R [R]               Rough assembly: denovo assembly used as autoNoise
                        reference for re-assembly; sequence may be optionally
                        specified as -r, if supplied, will be used for global
                        scaling and SV calls, but not for autoNoise (optional,
                        default off, optional argument 0.2-0.9 for fraction of
                        rough assembly which must align to sequence for
                        rescaling)
   -cd CONTROLDIR       Control data directory for copy number profiles
                        [optional]
   -pd PARAMDIR         Parameters directory for copy number profiles
                        [optional]
   -op OUTLIERP         Outlier Probability for copy number profile (optional)
  -cr CONTROL_BASELINE_FILE
                        Control CNV baseline reference file for copy number
                        profiles [optional]
  -cm CNV_MASK_FILE    CNV mask file for copy number profiles [optional]
  -ce CHR_EXPECTED_CNS_FILE
                        Expected copy numbers file for copy number profiles
                        [optional]
  -json JSON            json string that contains different, informative
                        parameters
  -guided              Guided assembly: requires reference or -seed and skips
                        pairwise,Assembly,refineA
  -guidedB              Guided assembly: requires reference or -seed and skips
                        pairwise,Assembly,refineA,refineB,mrg0
  -seed SEED            Seed Genome for Guided assembly (must be .cmap)
  -finalmergeSV        Detect SVs after final merge stage
  -cnvOnly              Exit after auto noise, alignmolvref, and CNV analysis;
                        do not preform de novo assembly
  -NoCheckFiles        Disable checking presence of all files mentioned in
                        stdout files
  -NoExtCharCheck      Disable checking contig sizes after extension
                        characterize stage
  --vapini VAP_INI      Variant annotation INI file
  --cleanRestart        Remove existing ouput from current stage before rerun
  --autoRestart        Retart pipeline from the stage where it left off
  --dynamicExtension DYNAMICEXTENSION
                        Automatically determine the optimal number of
                        iterations of extension and merge. Value specifies max
                        number of iterations [default 0 (disable)]
  --docker              Run jobs in docker container
  --experimental [EXPERIMENTAL [EXPERIMENTAL ...]]
                        Run experimental features. Multiple values (separated
                        by spaces) are possible. No experimental features in
                        v3.6
  --compute-confidence COMPUTE_CONFIDENCE
                        Compute new confidence scores (v3.6 and later).
                        Possible values: human_hg38/human_hg19/non_human.
                        Default is to keep old scores
</pre>
</pre>



Latest revision as of 21:40, 15 September 2021


Category

Bioinformatics

Program On

Sapelo2

Version

3.6.1-11162020

Author / Distributor

Details are at Bionano Solve

Description

From Bionano Solve: "Bionano Solve™ is an analysis pipeline for Bionano data processing, optimized for Bionano Compute and IrysSolve Compute Servers. A de novo assembly of a human genome can be completed in about 28 hours. Bionano Tools contains various tools and scripts, including the Bionano Solve analysis pipeline. These tools together perform computation jobs on Saphyr and IrysSolve Compute Servers."

Running Program

Also refer to Running Jobs on Sapelo2

For more information on Environment Modules on Sapelo2 please see the Lmod page.

  • Version 3.6.1-11162020, installed in /apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b

To use BionanoSolve v3.6.1-11162020 pipelines, please first load the module with

module load BionanoSolve/3.6.1-11162020-foss-2019b

Once you have loaded the module, an environment variable called EBROOTBIONANOSOLVE is exported. It stores the BionanoSolve installation path on the cluster, i.e., /apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b/ . Using EBROOTBIONANOSOLVE, BionanoSolve components can be easily located, for example:


Pipeline is at ${EBROOTBIONANOSOLVE}/Pipeline/11162020

HybridScaffold is at ${EBROOTBIONANOSOLVE}/HybridScaffold/11162020

RefAligner is at ${EBROOTBIONANOSOLVE}/RefAligner/11442.11643rel

VariantAnnotation is at ${EBROOTBIONANOSOLVE}/VariantAnnotation/11162020

FSHD is at ${EBROOTBIONANOSOLVE}/FSHD/11162020
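
As a quick sanity check after loading the module, the component directories listed above can be verified from the shell. The following is a minimal sketch, not part of BionanoSolve: the helper name check_bionano_components is hypothetical, and the subdirectory names are simply the ones listed above for v3.6.1-11162020.

```shell
# Hypothetical helper (not shipped with BionanoSolve): report whether each
# component directory exists under the installation root.
# The root defaults to $EBROOTBIONANOSOLVE, which the module exports.
check_bionano_components() {
    local root
    local sub
    local missing
    root="${1:-${EBROOTBIONANOSOLVE:-}}"
    missing=0
    if [ -z "$root" ]; then
        echo "EBROOTBIONANOSOLVE is not set; load the module first" >&2
        return 2
    fi
    for sub in Pipeline/11162020 HybridScaffold/11162020 \
               RefAligner/11442.11643rel VariantAnnotation/11162020 \
               FSHD/11162020; do
        if [ -d "$root/$sub" ]; then
            echo "found   $root/$sub"
        else
            echo "missing $root/$sub"
            missing=1
        fi
    done
    # Exit status 0 if every directory was found, 1 otherwise.
    return "$missing"
}
```

After module load BionanoSolve/3.6.1-11162020-foss-2019b, calling check_bionano_components with no argument checks the loaded installation.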


Sample job submission script (sub.sh) to run hybridScaffold.pl from BionanoSolve v3.6.1-11162020 in a batch job:

#!/bin/bash
#SBATCH --job-name=job_hybridScaffold       
#SBATCH --partition=batch            
#SBATCH --ntasks=1                  	
#SBATCH --cpus-per-task=2        
#SBATCH --mem=10gb                    
#SBATCH --time=120:00:00           
#SBATCH --output=log.%j.out     
#SBATCH --error=log.%j.err          
#SBATCH --mail-user=username@uga.edu  
#SBATCH --mail-type=ALL   

cd $SLURM_SUBMIT_DIR

module load BionanoSolve/3.6.1-11162020-foss-2019b

perl ${EBROOTBIONANOSOLVE}/HybridScaffold/11162020/hybridScaffold.pl [options]
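
To illustrate how the [options] placeholder might be filled in, the sketch below prints a hybridScaffold.pl command line using the arguments that the help text reproduced later on this page marks as required (-n, -b, -c, -o, -r, -B, -N). Several things here are assumptions: the helper name build_hybridscaffold_cmd is hypothetical, all input and output paths are placeholders, the RefAligner executable is assumed to be named RefAligner inside the RefAligner directory, and conflict filter level 2 is chosen only for illustration.

```shell
# Hypothetical helper: print a hybridScaffold.pl command line for review.
# Arguments: installation root, NGS FASTA, Bionano CMAP, merge config XML,
# output folder. All of these are placeholders supplied by the caller.
build_hybridscaffold_cmd() {
    local root
    local refaligner
    root="$1"
    # Assumption: the executable sits directly inside the RefAligner directory.
    refaligner="$root/RefAligner/11442.11643rel/RefAligner"
    # -B 2 -N 2 (cut contig at conflict) are illustrative values only.
    printf 'perl %s -n %s -b %s -c %s -r %s -o %s -B 2 -N 2\n' \
        "$root/HybridScaffold/11162020/hybridScaffold.pl" \
        "$2" "$3" "$4" "$refaligner" "$5"
}
```

For example, build_hybridscaffold_cmd "$EBROOTBIONANOSOLVE" ngs.fasta bng.cmap hybrid_config.xml output prints a line that can be reviewed and then placed in the job script in place of the perl line with [options].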

Sample job submission script (sub.sh) to run pipelineCL.py from BionanoSolve v3.6.1-11162020 in a batch job:

#!/bin/bash
#SBATCH --job-name=job_pipelineCL
#SBATCH --partition=batch
#SBATCH --nodes=1            
#SBATCH --ntasks=1                  	
#SBATCH --cpus-per-task=32        
#SBATCH --mem=40gb                    
#SBATCH --time=120:00:00           
#SBATCH --output=log.%j.out     
#SBATCH --error=log.%j.err          
#SBATCH --mail-user=username@uga.edu  
#SBATCH --mail-type=ALL   

cd $SLURM_SUBMIT_DIR

module load BionanoSolve/3.6.1-11162020-foss-2019b

python ${EBROOTBIONANOSOLVE}/Pipeline/11162020/pipelineCL.py -T 32 -j 32 [options]

where EBROOTBIONANOSOLVE is the environment variable storing the BionanoSolve installation path on the cluster, and [options] needs to be replaced by the options (command and arguments) you want to use. Other job parameters, such as the maximum wall clock time, maximum memory, number of cores per node, and job name, need to be modified appropriately as well.

Please note: To run a distributed parallel job on the cluster, BionanoSolve requires the Distributed Resource Management Application API (DRMAA, http://www.drmaa.org/) and a properly configured "clusterArgument.xml" file. DRMAA is currently not available on the Sapelo2 cluster; therefore, please run BionanoSolve pipelineCL.py on a single node and do not use the "-C <cluster argument.xml file>" option. We are sorry for the inconvenience.
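
Since the pipeline runs on a single node, the sample script above requests 32 cores with --cpus-per-task and passes the same number to -T and -j. One way to keep the two in sync is to read SLURM_CPUS_PER_TASK, which Slurm sets inside a job when --cpus-per-task is given. This is a minimal sketch; the helper name pipeline_thread_args is hypothetical.

```shell
# Hypothetical helper: derive the pipelineCL.py thread options from the
# cores Slurm allocated to the task.
pipeline_thread_args() {
    local threads
    # Fall back to 1 thread when the variable is unset (e.g. outside a job).
    threads="${SLURM_CPUS_PER_TASK:-1}"
    echo "-T $threads -j $threads"
}
```

In the job script, the command would then read python ${EBROOTBIONANOSOLVE}/Pipeline/11162020/pipelineCL.py $(pipeline_thread_args) [options], so changing --cpus-per-task is enough to change the thread counts.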


Example of job submission

sbatch sub.sh 

Documentation

Details are at BionanoSolve

A user manual for running the BionanoSolve pipeline on the command line can be found at BionanoSolve guide

ml BionanoSolve/3.6.1-11162020-foss-2019b 
perl ${EBROOTBIONANOSOLVE}/HybridScaffold/11162020/hybridScaffold.pl -h
	
Usage: perl hybridScaffold.pl <-h> <-n ngs_file> <-b bng_cmap_file> <-c hybrid_config_xml> <-o output_folder> <-B conflict_filter_level> <-N conflict_filter_level> <-f> 
      <-m molecules_bnx> <-p de_novo_pipeline> <-q de_novo_xml> <-v> <-x> <-y> <-e noise_param><-z tar_zip_file><-S>
      -h    : This help message         
      -n    : Input NGS FASTA [required]
      -b    : Input BioNano CMAP  [required]
      -c    : Merge configuration file [required]
      -o    : Output folder [required]
      -r    : RefAligner program [required]
      -B    : conflict filter level: 1 no filter, 2 cut contig at conflict, 3 exclude conflicting contig [required if not using -M option]
      -N    : conflict filter level: 1 no filter, 2 cut contig at conflict, 3 exclude conflicting contig [required if not using -M option]
      -f    : Force output and overwrite any existing files
      -x    : Flag to generate molecules to hybrid scaffold alignment and molecules to genome map alignment [optional]
      -y    : Flag to generate chimeric quality score for the Input BioNano CMAP [optional]
      -m    : Input BioNano molecules BNX [optional; only required for either the -x or -y option]
      -p    : Input de novo assembly pipeline directory [optional; only required for -x option]
      -q    : Input de novo assembly pipeline optArguments XML script [optional; only required for -x option]
      -e    : Input de novo assembly noise parameter .errbin or .err file [optional; recommended for -y option but not required]
      -v    : Print pipeline version information
      -M    : Input a conflict resolution file indicating which NGS and BioNano conflicting contigs to be cut [optional] 
      -z    : Name of a zipped file to archive the essential output files [optional]
      -S    : Only run hybridScaffold up to before Merge steps [optional]
      -w    : Name of the status text file needed by IrysView [optional]
      -t    : Perform pre-pairmerge sequence to pre-pairmerge genome map alignment [optional]
      -u    : Sequence of enzyme recognition site (overrides what has been specified in config XML file, for IrysView only) [optional]


python ${EBROOTBIONANOSOLVE}/Pipeline/11162020/pipelineCL.py -h
usage: pipelineCL.py [-h] [-T T] [-j MAXTHREADS] [-je MAXTHREADSEXT]
                     [-jp MAXTHREADSPW] [-J J] [-TJ TJ] [-Tp TP] [-Te TE]
                     [-Tn TN] [-N N] [-G BED] [-i ITER] [-b BNX] [-l LOCAL]
                     [-t TOOLS] [-B BYPASS] [-e EXP] [-r REF] [-x]
                     [-c CLEANUP] [-C CXML] [-w] [-a XML] [-p PERF] [-d] [-u]
                     [-U [GROUPCONTIGS]] [-v [VERSION]] [-V RUNSV] [-A] [-y]
                     [-m] [-f [F]] [-z] [-E] [-W W] [-Gsiz GSIZ] [-F F]
                     [-R [R]] [-cd CONTROLDIR] [-pd PARAMDIR] [-op OUTLIERP]
                     [-cr CONTROL_BASELINE_FILE] [-cm CNV_MASK_FILE]
                     [-ce CHR_EXPECTED_CNS_FILE] [-json JSON] [-guided]
                     [-guidedB] [-seed SEED] [-finalmergeSV] [-cnvOnly]
                     [-NoCheckFiles] [-NoExtCharCheck] [--vapini VAP_INI]
                     [--cleanRestart] [--autoRestart]
                     [--dynamicExtension DYNAMICEXTENSION] [--docker]
                     [--experimental [EXPERIMENTAL [EXPERIMENTAL ...]]]
                     [--compute-confidence COMPUTE_CONFIDENCE]

Pipeline for de novo assembly - Bionano Genomics

optional arguments:
  -h, --help            show this help message and exit
  -T T                  Total threads per Node, with overloading [default 1]
  -j MAXTHREADS         Max Threads per job [default -T arg]
  -je MAXTHREADSEXT     Max Threads per extension stage1 job (if less than -j
                        value) [default 60]
  -jp MAXTHREADSPW      Max Threads per pairwise or stage0 job [default -T
                        arg]
  -J J                  Threads per large memory host jobs (mediumHostJob in
                        clusterArguments.xml) for grouped jobs (has no effect
                        without -f) [default 48]
  -TJ TJ                Total threads per Node, with overloading, for large
                        memory hosts (see -J) [default 2x -J value]
  -Tp TP                Total threads per Node, with overloading, for pairwise
                        jobs [default 2x -jp value]
  -Te TE                Total threads per Node, with overloading, for
                        extension jobs [default -T value]
  -Tn TN                Nominal threads per Node, without overloading (non-
                        zero value will override -T -Tp -Te -TJ) [default 0]
  -N N                  Minimum number of split bnx files (actual number is
                        multiple of this). Value of 6 required (reserved) for
                        Xeon-Phi hardware (optional, default 2)
  -G BED                Bed file for gaps, used in structural variation (SV)
                        detection to check for SV overlap with reference gaps
  -i ITER               Number of extension and merge iterations (default=1,
                        must be in range [0,20], use 0 to skip)
  -b BNX                Input molecule (.bnx) file, required
  -l LOCAL              Location of output files root directory, required,
                        will be created if does not exist; if does exist, will
                        overwrite contents (may be error-prone)
  -t TOOLS              Location of executable files (RefAligner and
                        Assembler, required)
  -B BYPASS             Skip steps, using previous result. <= 0:None,
                        1:ImgDetect, 2:NoiseChar/Subsample, 3:Pairwise,
                        4:Assembly, 5:RefineA, 6:RefineB, 7:merge0,
                        8+(i-1)*2:Ext(i), 9+(i-1)*2:Mrg(i), N+1:alignmol
  -e EXP                Output file prefix (optional, default = exp)
  -r REF                Reference file (must be .cmap), to compare resulting
                        contigs (optional)
  -x                    Exit after auto noise (noise characterization), do not
                        perform de novo assembly
  -c CLEANUP            Remove contig results (0 - keep all (default), 1 -
                        remove intermediate files, 2 - store in sqlite, 3 -
                        store in sqlite and remove)
  -C CXML               Run on cluster, read XML file for submission arguments
                        (optional--will not use cluster submission if absent)
  -w                    Wipe clean previous contig results
  -a XML                Read XML file for parameters (required)
  -p PERF               Log performance in pipelineReport 0=None, 1=time,
                        2=perf, 3=time&perf (default=1)
  -d                    Retired option (contig subdirectories), always
                        enabled.
  -u                    Do not perform final refinement (not recommended).
  -U [GROUPCONTIGS]     Group contigs in refinement and extension stages:
                        always ON, this argument has no effect (retained for
                        backward compatibility)
  -v [VERSION]          Print version; exit if argument > 1 supplied.
  -V RUNSV              Detect structural variations. Default: run after final
                        stage (normally refineFinal); if argument 0, disable.
  -A                    Align molecules to final contigs (ON by default, use
                        this to turn off).
  -y                    Automatically determine noise parameters (requires
                        reference; optional, default off)
  -m                    Disable molecule vs reference alignments (default on
                        with reference)
  -f [F]                Run this fraction of grouped jobs on host (0.2 if no
                        arg) [default 0]
  -z                    Zip pipeline results (default ON, use this to turn
                        off).
  -E                    ReCheck stdout completeness for completed jobs
                        (default ON, use this to turn off).
  -W W                  Multiply group sizes and BNX split sizes by this
                        factor to reduce number of jobs (for Genomes larger
                        than Human). Use value under 1 to increase number of
                        jobs
  -Gsiz GSIZ            Estimated Genome size in Gb
  -F F                  Color channel: replace -usecolor X in optArgs with
                        this, must be either 1 or 2 [default OFF]
  -R [R]                Rough assembly: denovo assembly used as autoNoise
                        reference for re-assembly; sequence may be optionally
                        specified as -r, if supplied, will be used for global
                        scaling and SV calls, but not for autoNoise (optional,
                        default off, optional argument 0.2-0.9 for fraction of
                        rough assembly which must align to sequence for
                        rescaling)
  -cd CONTROLDIR        Control data directory for copy number profiles
                        [optional]
  -pd PARAMDIR          Parameters directory for copy number profiles
                        [optional]
  -op OUTLIERP          Outlier Probability for copy number profile (optional)
  -cr CONTROL_BASELINE_FILE
                        Control CNV baseline reference file for copy number
                        profiles [optional]
  -cm CNV_MASK_FILE     CNV mask file for copy number profiles [optional]
  -ce CHR_EXPECTED_CNS_FILE
                        Expected copy numbers file for copy number profiles
                        [optional]
  -json JSON            json string that contains different, informative
                        parameters
  -guided               Guided assembly: requires reference or -seed and skips
                        pairwise,Assembly,refineA
  -guidedB              Guided assembly: requires reference or -seed and skips
                        pairwise,Assembly,refineA,refineB,mrg0
  -seed SEED            Seed Genome for Guided assembly (must be .cmap)
  -finalmergeSV         Detect SVs after final merge stage
  -cnvOnly              Exit after auto noise, alignmolvref, and CNV analysis;
                        do not perform de novo assembly
  -NoCheckFiles         Disable checking presence of all files mentioned in
                        stdout files
  -NoExtCharCheck       Disable checking contig sizes after extension
                        characterize stage
  --vapini VAP_INI      Variant annotation INI file
  --cleanRestart        Remove existing output from current stage before rerun
  --autoRestart         Restart pipeline from the stage where it left off
  --dynamicExtension DYNAMICEXTENSION
                        Automatically determine the optimal number of
                        iterations of extension and merge. Value specifies max
                        number of iterations [default 0 (disable)]
  --docker              Run jobs in docker container
  --experimental [EXPERIMENTAL [EXPERIMENTAL ...]]
                        Run experimental features. Multiple values (separated
                        by spaces) are possible. No experimental features in
                        v3.6
  --compute-confidence COMPUTE_CONFIDENCE
                        Compute new confidence scores (v3.6 and later).
                        Possible values: human_hg38/human_hg19/non_human.
                        Default is to keep old scores
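
The options marked as required in the help text above (-b, -l, -t, -a) can be combined into a command line as in the following dry-run sketch, which only prints the command instead of running it. The input file names, output directory, RefAligner location and thread count are placeholders that must be replaced with real values; build_pipeline_cmd, ext_step and mrg_step are hypothetical helper names. The two step helpers simply evaluate the formulas listed under -B.

```shell
#!/bin/sh
# Dry-run sketch: assemble a pipelineCL.py command line and print it.
# EBROOTBIONANOSOLVE is normally set by "module load"; the fallback below
# is the installation path given on this page.
EBROOTBIONANOSOLVE="${EBROOTBIONANOSOLVE:-/apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b}"
PIPELINE="${EBROOTBIONANOSOLVE}/Pipeline/11162020/pipelineCL.py"

# Placeholders (assumptions, not real paths on the cluster):
TOOLS_DIR="/path/to/RefAligner_dir"   # -t: directory with RefAligner and Assembler
BNX="molecules.bnx"                   # -b: input molecule file
OPTARGS="optArguments.xml"            # -a: parameter XML file
OUTDIR="assembly_out"                 # -l: output root (existing contents are overwritten)
THREADS=8                             # -T and -j: should match the cores requested
ITER=1                                # -i: extension and merge iterations (range 0 to 20)

build_pipeline_cmd() {
    echo "python ${PIPELINE} -T ${THREADS} -j ${THREADS} -i ${ITER} -b ${BNX} -l ${OUTDIR} -t ${TOOLS_DIR} -a ${OPTARGS}"
}

# Step numbers used by -B, from the help text:
#   Ext(i) = 8 + (i-1)*2,  Mrg(i) = 9 + (i-1)*2
ext_step() { echo $(( 8 + ($1 - 1) * 2 )); }
mrg_step() { echo $(( 9 + ($1 - 1) * 2 )); }

build_pipeline_cmd
echo "Ext(2) is step $(ext_step 2), Mrg(2) is step $(mrg_step 2)"
```

Because -l overwrites an existing output directory, it is safer to point it at a new directory for each run. Without -T the pipeline defaults to 1 thread per node, so the thread options are worth setting explicitly.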


Installation

Installed from source code obtained from the BionanoSolve download.

System

64-bit Linux