BionanoSolve-Sapelo2

From Research Computing Center Wiki


Category

Bioinformatics

Program On

Sapelo2

Version

3.6.1-11162020

Author / Distributor

Details are at Bionano Solve

Description

From Bionano Solve: "Bionano Solve™ is an analysis pipeline for Bionano data processing, optimized for Bionano Compute and IrysSolve Compute Servers. A de novo assembly of a human genome can be completed in about 28 hours. Bionano Tools contains various tools and scripts, including the Bionano Solve analysis pipeline. These tools together perform computation jobs on Saphyr and IrysSolve Compute Servers."

Running Program

Also refer to Running Jobs on Sapelo2

For more information on Environment Modules on Sapelo2 please see the Lmod page.

  • Version 3.6.1-11162020, installed in /apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b

To use BionanoSolve v3.6.1-11162020 pipelines, first load the module with

module load BionanoSolve/3.6.1-11162020-foss-2019b

Once you have loaded the module, an environment variable called EBROOTBIONANOSOLVE is exported. It stores the BionanoSolve installation path on the cluster, i.e., /apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b/ . Using EBROOTBIONANOSOLVE, the BionanoSolve components can be located easily, for example:


Pipeline is at ${EBROOTBIONANOSOLVE}/Pipeline/11162020

HybridScaffold is at ${EBROOTBIONANOSOLVE}/HybridScaffold/11162020

RefAligner is at ${EBROOTBIONANOSOLVE}/RefAligner/11442.11643rel

VariantAnnotation is at ${EBROOTBIONANOSOLVE}/VariantAnnotation/11162020

FSHD is at ${EBROOTBIONANOSOLVE}/FSHD/11162020
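
As a quick sanity check after loading the module, the component directories listed above can be tested for existence. This is a minimal sketch, not an official tool: the function name check_components is hypothetical, and the fallback path is simply the installation path quoted on this page, used only when EBROOTBIONANOSOLVE is not set.

```shell
#!/bin/bash
# Sketch: report which BionanoSolve component directories are visible.
# Assumption: the module is loaded; otherwise fall back to the path quoted on this page.
root="${EBROOTBIONANOSOLVE:-/apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b}"

check_components() {
    # $1 = installation root; the subdirectories are the ones listed on this page
    for sub in Pipeline/11162020 HybridScaffold/11162020 \
               RefAligner/11442.11643rel VariantAnnotation/11162020 FSHD/11162020; do
        if [ -d "$1/$sub" ]; then
            echo "found   $1/$sub"
        else
            echo "missing $1/$sub"
        fi
    done
}

check_components "$root"
```

On a machine where the module is not installed, every line is reported as missing, which is expected.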


Sample job submission script (sub.sh) to run hybridScaffold.pl from BionanoSolve v3.6.1-11162020 in a batch job:

#!/bin/bash
#SBATCH --job-name=job_hybridScaffold       
#SBATCH --partition=batch            
#SBATCH --ntasks=1                  	
#SBATCH --cpus-per-task=2        
#SBATCH --mem=10gb                    
#SBATCH --time=120:00:00           
#SBATCH --output=log.%j.out     
#SBATCH --error=log.%j.err          
#SBATCH --mail-user=username@uga.edu  
#SBATCH --mail-type=ALL   

cd $SLURM_SUBMIT_DIR

module load BionanoSolve/3.6.1-11162020-foss-2019b

perl ${EBROOTBIONANOSOLVE}/HybridScaffold/11162020/hybridScaffold.pl [options]

Sample job submission script (sub.sh) to run pipelineCL.py from BionanoSolve v3.6.1-11162020 in a batch job:

#!/bin/bash
#SBATCH --job-name=job_pipelineCL
#SBATCH --partition=batch
#SBATCH --nodes=1            
#SBATCH --ntasks=1                  	
#SBATCH --cpus-per-task=32        
#SBATCH --mem=40gb                    
#SBATCH --time=120:00:00           
#SBATCH --output=log.%j.out     
#SBATCH --error=log.%j.err          
#SBATCH --mail-user=username@uga.edu  
#SBATCH --mail-type=ALL   

cd $SLURM_SUBMIT_DIR

module load BionanoSolve/3.6.1-11162020-foss-2019b

python ${EBROOTBIONANOSOLVE}/Pipeline/11162020/pipelineCL.py -T 32 -j 32 [options]

where EBROOTBIONANOSOLVE is the environment variable storing the BionanoSolve installation path on the cluster, and [options] needs to be replaced by the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name, need to be modified appropriately as well. In the pipelineCL.py example, the -T and -j thread counts match the --cpus-per-task value (32).
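
One way to keep the thread options in step with the Slurm request is to read SLURM_CPUS_PER_TASK, which Slurm sets inside the job when --cpus-per-task is given. The following is a dry-run sketch that only prints the command. The function name build_pipeline_cmd and the input names (molecules.bnx, optArguments.xml, assembly_out) are hypothetical placeholders; the flags -b, -a, -l and -t are the ones marked as required in the pipelineCL.py help text shown under Documentation.

```shell
#!/bin/bash
# Dry-run sketch: build a pipelineCL.py command whose thread counts follow the Slurm request.
# Assumption: file and directory names below are placeholders, not files shipped with BionanoSolve.
root="${EBROOTBIONANOSOLVE:-/apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b}"
threads="${SLURM_CPUS_PER_TASK:-1}"   # falls back to 1 outside a Slurm job

build_pipeline_cmd() {
    # $1 = thread count, $2 = BNX file, $3 = parameters XML, $4 = output directory
    printf 'python %s/Pipeline/11162020/pipelineCL.py -T %s -j %s -b %s -a %s -l %s -t %s/RefAligner/11442.11643rel\n' \
        "$root" "$1" "$1" "$2" "$3" "$4" "$root"
}

# Print the command instead of running it, so it can be reviewed first
build_pipeline_cmd "$threads" molecules.bnx optArguments.xml assembly_out
```

Once the printed command looks right, it can be pasted into sub.sh in place of the line ending in [options].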

Please note: BionanoSolve requires the Distributed Resource Management Application API (DRMAA, http://www.drmaa.org/) and a properly configured "clusterArgument.xml" file to run a distributed parallel job on the cluster. DRMAA is currently not available on the Sapelo2 cluster; therefore, please run BionanoSolve pipelineCL.py on a single node and do not use the "-C <cluster argument.xml file>" option. We are sorry for the inconvenience.
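
To avoid passing the cluster option by accident, a job script could check its option list before starting the pipeline. This is only an illustrative sketch: the function name uses_cluster_option is hypothetical, and it matches only a standalone -C argument, not other spellings the option parser might accept.

```shell
#!/bin/bash
# Sketch: detect a standalone -C among the options intended for pipelineCL.py.
uses_cluster_option() {
    # Returns 0 (true) if any argument is exactly -C, 1 otherwise
    for arg in "$@"; do
        if [ "$arg" = "-C" ]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical option list for illustration
if uses_cluster_option -T 32 -j 32 -b molecules.bnx; then
    echo "refusing to run: -C needs DRMAA, which is not available on Sapelo2" >&2
else
    echo "ok: single-node run"
fi
```

The check is case-sensitive, so the lowercase -c (cleanup) option is not affected.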


Example of job submission

sbatch sub.sh 
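
sbatch normally replies with a line of the form "Submitted batch job <ID>", and the sample scripts above write their logs to log.<ID>.out and log.<ID>.err. A small sketch for deriving the log file names from that reply follows; the function names are hypothetical and the job ID in the example is made up.

```shell
#!/bin/bash
# Sketch: derive the log file names from the line printed by sbatch.
job_id_from_sbatch() {
    # $1 = line printed by sbatch, e.g. "Submitted batch job 12345"
    echo "${1##* }"    # keep only the text after the last space
}

log_names_for() {
    # Log names follow --output=log.%j.out and --error=log.%j.err in the sample scripts
    id=$(job_id_from_sbatch "$1")
    echo "log.$id.out log.$id.err"
}

log_names_for "Submitted batch job 12345"   # made-up job ID for illustration
```

In practice the argument would be the captured output of the submission, for example log_names_for "$(sbatch sub.sh)".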

Documentation

Details are at BionanoSolve

A user manual for running the BionanoSolve pipeline on the command line can be found at BionanoSolve guide

ml BionanoSolve/3.6.1-11162020-foss-2019b 
perl ${EBROOTBIONANOSOLVE}/HybridScaffold/11162020/hybridScaffold.pl -h
	
Usage: perl hybridScaffold.pl <-h> <-n ngs_file> <-b bng_cmap_file> <-c hybrid_config_xml> <-o output_folder> <-B conflict_filter_level> <-N conflict_filter_level> <-f> 
      <-m molecules_bnx> <-p de_novo_pipeline> <-q de_novo_xml> <-v> <-x> <-y> <-e noise_param><-z tar_zip_file><-S>
      -h    : This help message         
      -n    : Input NGS FASTA [required]
      -b    : Input BioNano CMAP  [required]
      -c    : Merge configuration file [required]
      -o    : Output folder [required]
      -r    : RefAligner program [required]
      -B    : conflict filter level: 1 no filter, 2 cut contig at conflict, 3 exclude conflicting contig [required if not using -M option]
      -N    : conflict filter level: 1 no filter, 2 cut contig at conflict, 3 exclude conflicting contig [required if not using -M option]
      -f    : Force output and overwrite any existing files
      -x    : Flag to generate molecules to hybrid scaffold alignment and molecules to genome map alignment [optional]
      -y    : Flag to generate chimeric quality score for the Input BioNano CMAP [optional]
      -m    : Input BioNano molecules BNX [optional; only required for either the -x or -y option]
      -p    : Input de novo assembly pipeline directory [optional; only required for -x option]
      -q    : Input de novo assembly pipeline optArguments XML script [optional; only required for -x option]
      -e    : Input de novo assembly noise parameter .errbin or .err file [optional; recommended for -y option but not required]
      -v    : Print pipeline version information
      -M    : Input a conflict resolution file indicating which NGS and BioNano conflicting contigs to be cut [optional] 
      -z    : Name of a zipped file to archive the essential output files [optional]
      -S    : Only run hybridScaffold up to before Merge steps [optional]
      -w    : Name of the status text file needed by IrysView [optional]
      -t    : Perform pre-pairmerge sequence to pre-pairmerge genome map alignment [optional]
      -u    : Sequence of enzyme recognition site (overrides what has been specified in config XML file, for IrysView only) [optional]
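
Based on the options marked [required] in the help text above, a hybridScaffold.pl command line might be assembled as in the following dry-run sketch. Several things here are assumptions: the function name build_hybrid_cmd and the input file names are hypothetical, the conflict filter level 2 for -B and -N is just one of the three documented choices, and the RefAligner executable is assumed to sit directly inside the RefAligner directory listed earlier on this page. Please verify that path on the cluster before use.

```shell
#!/bin/bash
# Dry-run sketch: print a hybridScaffold.pl command using the required options.
# Assumption: the RefAligner binary location below has not been confirmed by this page.
root="${EBROOTBIONANOSOLVE:-/apps/eb/BionanoSolve/3.6.1-11162020-foss-2019b}"

build_hybrid_cmd() {
    # $1 = NGS FASTA, $2 = Bionano CMAP, $3 = hybrid config XML, $4 = output folder
    printf 'perl %s/HybridScaffold/11162020/hybridScaffold.pl -n %s -b %s -c %s -o %s -r %s/RefAligner/11442.11643rel/RefAligner -B 2 -N 2 -f\n' \
        "$root" "$1" "$2" "$3" "$4" "$root"
}

# Placeholder input names; replace with real files before running the printed command
build_hybrid_cmd ngs_assembly.fasta bionano_maps.cmap hybridScaffold_config.xml hybrid_out
```

Note that -f overwrites existing files in the output folder, so it can be dropped if that is not wanted.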


python ${EBROOTBIONANOSOLVE}/Pipeline/11162020/pipelineCL.py -h
usage: pipelineCL.py [-h] [-T T] [-j MAXTHREADS] [-je MAXTHREADSEXT]
                     [-jp MAXTHREADSPW] [-J J] [-TJ TJ] [-Tp TP] [-Te TE]
                     [-Tn TN] [-N N] [-G BED] [-i ITER] [-b BNX] [-l LOCAL]
                     [-t TOOLS] [-B BYPASS] [-e EXP] [-r REF] [-x]
                     [-c CLEANUP] [-C CXML] [-w] [-a XML] [-p PERF] [-d] [-u]
                     [-U [GROUPCONTIGS]] [-v [VERSION]] [-V RUNSV] [-A] [-y]
                     [-m] [-f [F]] [-z] [-E] [-W W] [-Gsiz GSIZ] [-F F]
                     [-R [R]] [-cd CONTROLDIR] [-pd PARAMDIR] [-op OUTLIERP]
                     [-cr CONTROL_BASELINE_FILE] [-cm CNV_MASK_FILE]
                     [-ce CHR_EXPECTED_CNS_FILE] [-json JSON] [-guided]
                     [-guidedB] [-seed SEED] [-finalmergeSV] [-cnvOnly]
                     [-NoCheckFiles] [-NoExtCharCheck] [--vapini VAP_INI]
                     [--cleanRestart] [--autoRestart]
                     [--dynamicExtension DYNAMICEXTENSION] [--docker]
                     [--experimental [EXPERIMENTAL [EXPERIMENTAL ...]]]
                     [--compute-confidence COMPUTE_CONFIDENCE]

Pipeline for de novo assembly - Bionano Genomics

optional arguments:
  -h, --help            show this help message and exit
  -T T                  Total threads per Node, with overloading [default 1]
  -j MAXTHREADS         Max Threads per job [default -T arg]
  -je MAXTHREADSEXT     Max Threads per extension stage1 job (if less than -j
                        value) [default 60]
  -jp MAXTHREADSPW      Max Threads per pairwise or stage0 job [default -T
                        arg]
  -J J                  Threads per large memory host jobs (mediumHostJob in
                        clusterArguments.xml) for grouped jobs (has no effect
                        without -f) [default 48]
  -TJ TJ                Total threads per Node, with overloading, for large
                        memory hosts (see -J) [default 2x -J value]
  -Tp TP                Total threads per Node, with overloading, for pairwise
                        jobs [default 2x -jp value]
  -Te TE                Total threads per Node, with overloading, for
                        extension jobs [default -T value]
  -Tn TN                Nominal threads per Node, without overloading (non-
                        zero value will override -T -Tp -Te -TJ) [default 0]
  -N N                  Minimum number of split bnx files (actual number is
                        multiple of this). Value of 6 required (reserved) for
                        Xeon-Phi hardware (optional, default 2)
  -G BED                Bed file for gaps, used in structural variation (SV)
                        detection to check for SV overlap with reference gaps
  -i ITER               Number of extension and merge iterations (default=1,
                        must be in range [0,20], use 0 to skip)
  -b BNX                Input molecule (.bnx) file, required
  -l LOCAL              Location of output files root directory, required,
                        will be created if does not exist; if does exist, will
                        overwrite contents (may be error-prone)
  -t TOOLS              Location of executable files (RefAligner and
                        Assembler, required)
  -B BYPASS             Skip steps, using previous result. <= 0:None,
                        1:ImgDetect, 2:NoiseChar/Subsample, 3:Pairwise,
                        4:Assembly, 5:RefineA, 6:RefineB, 7:merge0,
                        8+(i-1)*2:Ext(i), 9+(i-1)*2:Mrg(i), N+1:alignmol
  -e EXP                Output file prefix (optional, default = exp)
  -r REF                Reference file (must be .cmap), to compare resulting
                        contigs (optional)
  -x                    Exit after auto noise (noise characterization), do not
                        preform de novo assembly
  -c CLEANUP            Remove contig results (0 - keep all (default), 1 -
                        remove intermediate files, 2 - store in sqlite, 3 -
                        store in sqlite and remove)
  -C CXML               Run on cluster, read XML file for submission arguments
                        (optional--will not use cluster submission if absent)
  -w                    Wipe clean previous contig results
  -a XML                Read XML file for parameters (required)
  -p PERF               Log performance in pipelineReport 0=None, 1=time,
                        2=perf, 3=time&perf (default=1)
  -d                    Retired option (contig subdirectories), always
                        enabled.
  -u                    Do not perform final refinement (not recommended).
  -U [GROUPCONTIGS]     Group contigs in refinement and extension stages:
                        always ON, this argument has no effect (retained for
                        backward compatibility)
  -v [VERSION]          Print version; exit if argument > 1 supplied.
  -V RUNSV              Detect structural variations. Default: run after final
                        stage (normally refineFinal); if argument 0, disable.
  -A                    Align molcules to final contigs (ON by default, use
                        this to turn off).
  -y                    Automatically determine noise parameters (requires
                        reference; optional, default off)
  -m                    Disable molecule vs reference alignments (default on
                        with reference)
  -f [F]                Run this fraction of grouped jobs on host (0.2 if no
                        arg) [default 0]
  -z                    Zip pipeline results (default ON, use this to turn
                        off).
  -E                    ReCheck stdout completeness for completed jobs
                        (default ON, use this to turn off).
  -W W                  Multiply group sizes and BNX split sizes by this
                        factor to reduce number of jobs (for Genomes larger
                        than Human). Use value under 1 to increase number of
                        jobs
  -Gsiz GSIZ            Estimated Genome size in Gb
  -F F                  Color channel: replace -usecolor X in optArgs with
                        this, must be either 1 or 2 [default OFF]
  -R [R]                Rough assembly: denovo assembly used as autoNoise
                        reference for re-assembly; sequence may be optionally
                        specified as -r, if supplied, will be used for global
                        scaling and SV calls, but not for autoNoise (optional,
                        default off, optional argument 0.2-0.9 for fraction of
                        rough assembly which must align to sequence for
                        rescaling)
  -cd CONTROLDIR        Control data directory for copy number profiles
                        [optional]
  -pd PARAMDIR          Parameters directory for copy number profiles
                        [optional]
  -op OUTLIERP          Outlier Probability for copy number profile (optional)
  -cr CONTROL_BASELINE_FILE
                        Control CNV baseline reference file for copy number
                        profiles [optional]
  -cm CNV_MASK_FILE     CNV mask file for copy number profiles [optional]
  -ce CHR_EXPECTED_CNS_FILE
                        Expected copy numbers file for copy number profiles
                        [optional]
  -json JSON            json string that contains different, informative
                        parameters
  -guided               Guided assembly: requires reference or -seed and skips
                        pairwise,Assembly,refineA
  -guidedB              Guided assembly: requires reference or -seed and skips
                        pairwise,Assembly,refineA,refineB,mrg0
  -seed SEED            Seed Genome for Guided assembly (must be .cmap)
  -finalmergeSV         Detect SVs after final merge stage
  -cnvOnly              Exit after auto noise, alignmolvref, and CNV analysis;
                        do not preform de novo assembly
  -NoCheckFiles         Disable checking presence of all files mentioned in
                        stdout files
  -NoExtCharCheck       Disable checking contig sizes after extension
                        characterize stage
  --vapini VAP_INI      Variant annotation INI file
  --cleanRestart        Remove existing ouput from current stage before rerun
  --autoRestart         Retart pipeline from the stage where it left off
  --dynamicExtension DYNAMICEXTENSION
                        Automatically determine the optimal number of
                        iterations of extension and merge. Value specifies max
                        number of iterations [default 0 (disable)]
  --docker              Run jobs in docker container
  --experimental [EXPERIMENTAL [EXPERIMENTAL ...]]
                        Run experimental features. Multiple values (separated
                        by spaces) are possible. No experimental features in
                        v3.6
  --compute-confidence COMPUTE_CONFIDENCE
                        Compute new confidence scores (v3.6 and later).
                        Possible values: human_hg38/human_hg19/non_human.
                        Default is to keep old scores
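
The -B entry in the help text above gives the stage numbers for the extension and merge iterations as formulas: 8+(i-1)*2 for Ext(i) and 9+(i-1)*2 for Mrg(i). A short worked example of that arithmetic is sketched below; the function names are hypothetical and the script only evaluates the formulas, it does not call the pipeline.

```shell
#!/bin/bash
# Sketch: evaluate the -B stage numbers for extension/merge iteration i,
# using the formulas quoted in the pipelineCL.py help text.
ext_bypass_code() { echo $(( 8 + ($1 - 1) * 2 )); }   # Ext(i)
mrg_bypass_code() { echo $(( 9 + ($1 - 1) * 2 )); }   # Mrg(i)

for i in 1 2 3; do
    echo "iteration $i: Ext=$(ext_bypass_code "$i") Mrg=$(mrg_bypass_code "$i")"
done
```

For iterations 1 to 3 this gives Ext/Mrg values of 8/9, 10/11 and 12/13.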


Installation

Source code from BionanoSolve download

System

64-bit Linux