BionanoSolve-Sapelo2

From Research Computing Center Wiki
Revision as of 10:14, 14 April 2021 by Moses


Category

Bioinformatics

Program On

Sapelo2

Version

3.2.1-04122018, 3.2.2-08222018, 3.3-10252018, 3.4-06042019

Author / Distributor

Details are at Bionano Solve

Description

From Bionano Solve: "Bionano Solve™ is an analysis pipeline for Bionano data processing, optimized for Bionano Compute and IrysSolve Compute Servers. A de novo assembly of a human genome can be completed in about 28 hours. Bionano Tools contains various tools and scripts, including the Bionano Solve analysis pipeline. These tools together perform computation jobs on Saphyr and IrysSolve Compute Servers."

Running Program

Also refer to Running Jobs on Sapelo2

For more information on Environment Modules on Sapelo2 please see the Lmod page.

  • Version 3.2.1-04122018, installed in /usr/local/apps/eb/BionanoSolve/3.2.1-04122018-foss-2016b
  • Version 3.2.2-08222018, installed in /usr/local/apps/eb/BionanoSolve/3.2.2-08222018-foss-2016b
  • Version 3.3-10252018, installed in /usr/local/apps/eb/BionanoSolve/3.3-10252018-foss-2016b
  • Version 3.4-06042019, installed in /usr/local/apps/eb/BionanoSolve/3.4-06042019-foss-2016b

To use BionanoSolve/3.2.1-04122018 pipelines, please first load the module with

module load BionanoSolve/3.2.1-04122018-foss-2016b

To use BionanoSolve/3.2.2-08222018 pipelines, please first load the module with

module load BionanoSolve/3.2.2-08222018-foss-2016b

To use BionanoSolve/3.3-10252018 pipelines, please first load the module with

module load BionanoSolve/3.3-10252018-foss-2016b

To use BionanoSolve/3.4-06042019 pipelines, please first load the module with

module load BionanoSolve/3.4-06042019-foss-2016b

Once you have loaded the module, an environment variable called EBROOTBIONANOSOLVE is set to the BionanoSolve installation path on the cluster (i.e. /usr/local/apps/eb/BionanoSolve/3.2.1-04122018-foss-2016b for version 3.2.1-04122018; /usr/local/apps/eb/BionanoSolve/3.2.2-08222018-foss-2016b for version 3.2.2-08222018; /usr/local/apps/eb/BionanoSolve/3.3-10252018-foss-2016b for version 3.3-10252018; /usr/local/apps/eb/BionanoSolve/3.4-06042019-foss-2016b for version 3.4-06042019). Using EBROOTBIONANOSOLVE, the BionanoSolve components can be found easily, for example:

  • Version 3.2.1-04122018:

Pipeline is located in $EBROOTBIONANOSOLVE/Pipeline/04122018

HybridScaffold is located in $EBROOTBIONANOSOLVE/HybridScaffold/04122018

RefAligner is located in $EBROOTBIONANOSOLVE/RefAligner/7437.7523rel

VariantAnnotation is located in $EBROOTBIONANOSOLVE/VariantAnnotation/04122018

  • Version 3.2.2-08222018:

Pipeline is located in $EBROOTBIONANOSOLVE/Pipeline/08222018

HybridScaffold is located in $EBROOTBIONANOSOLVE/HybridScaffold/08222018

RefAligner is located in $EBROOTBIONANOSOLVE/RefAligner/7782.7865rel

VariantAnnotation is located in $EBROOTBIONANOSOLVE/VariantAnnotation/08222018

  • Version 3.3-10252018:

Pipeline is located in $EBROOTBIONANOSOLVE/Pipeline/10252018

HybridScaffold is located in $EBROOTBIONANOSOLVE/HybridScaffold/10252018

RefAligner is located in $EBROOTBIONANOSOLVE/RefAligner/7915.7989rel

VariantAnnotation is located in $EBROOTBIONANOSOLVE/VariantAnnotation/10252018

  • Version 3.4-06042019:

Pipeline is located in $EBROOTBIONANOSOLVE/Pipeline/06042019

HybridScaffold is located in $EBROOTBIONANOSOLVE/HybridScaffold/06042019

RefAligner is located in $EBROOTBIONANOSOLVE/RefAligner/8949.9232rel

VariantAnnotation is located in $EBROOTBIONANOSOLVE/VariantAnnotation/06042019
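
The date-stamped directory names differ between versions, so a script that hard-codes them has to be edited whenever a different module is loaded. The following is a minimal sketch of resolving the directory at run time instead. It assumes that each component directory holds exactly one versioned subdirectory, as the listings for these four versions suggest; component_dir is a hypothetical helper name and is not part of BionanoSolve.

```shell
#!/bin/bash
# Sketch: resolve the versioned directory of a BionanoSolve component.
# Assumption: ROOT/COMPONENT contains exactly one versioned subdirectory.

# component_dir ROOT COMPONENT
# Prints the single subdirectory of ROOT/COMPONENT, or fails with a message.
component_dir() {
    local root=$1 component=$2
    local matches=()
    local d
    for d in "$root/$component"/*/; do
        # An unmatched glob stays literal and is skipped by the -d test.
        [ -d "$d" ] && matches+=("${d%/}")
    done
    if [ "${#matches[@]}" -ne 1 ]; then
        echo "expected one version directory in $root/$component, found ${#matches[@]}" >&2
        return 1
    fi
    printf '%s\n' "${matches[0]}"
}

# Usage after 'module load BionanoSolve/...':
#   hs_dir=$(component_dir "$EBROOTBIONANOSOLVE" HybridScaffold) || exit 1
#   perl "$hs_dir/hybridScaffold.pl" -h
```

The function fails rather than guessing when it finds zero or several subdirectories, so a change in the installation layout is reported instead of silently picking the wrong version.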


Sample job submission script (sub.sh) to run hybridScaffold.pl from BionanoSolve/3.2.1-04122018 in a batch job:

#PBS -S /bin/bash
#PBS -q batch
#PBS -N job_hybridScaffold
#PBS -l nodes=1:ppn=2
#PBS -l walltime=12:00:00
#PBS -l mem=10g
#PBS -j oe

cd $PBS_O_WORKDIR
module load BionanoSolve/3.2.1-04122018-foss-2016b
perl $EBROOTBIONANOSOLVE/HybridScaffold/04122018/hybridScaffold.pl [options]

Sample job submission script (sub.sh) to run pipelineCL.py from BionanoSolve/3.2.1-04122018 in a batch job:

#PBS -S /bin/bash
#PBS -q batch
#PBS -N job_pipelineCL
#PBS -l nodes=1:ppn=48
#PBS -l walltime=12:00:00
#PBS -l mem=10g
#PBS -j oe

cd $PBS_O_WORKDIR
ml BionanoSolve/3.2.1-04122018-foss-2016b
python $EBROOTBIONANOSOLVE/Pipeline/04122018/pipelineCL.py -T 48 -j 48 [options]

Sample job submission script (sub.sh) to run hybridScaffold.pl from BionanoSolve/3.2.2-08222018 in a batch job:

#PBS -S /bin/bash
#PBS -q batch
#PBS -N job_hybridScaffold
#PBS -l nodes=1:ppn=2
#PBS -l walltime=12:00:00
#PBS -l mem=10g
#PBS -j oe

cd $PBS_O_WORKDIR
module load BionanoSolve/3.2.2-08222018-foss-2016b
perl $EBROOTBIONANOSOLVE/HybridScaffold/08222018/hybridScaffold.pl [options]

Sample job submission script (sub.sh) to run pipelineCL.py from BionanoSolve/3.2.2-08222018 in a batch job:

#PBS -S /bin/bash
#PBS -q batch
#PBS -N job_pipelineCL
#PBS -l nodes=1:ppn=48
#PBS -l walltime=12:00:00
#PBS -l mem=10g
#PBS -j oe

cd $PBS_O_WORKDIR
module load BionanoSolve/3.2.2-08222018-foss-2016b
python $EBROOTBIONANOSOLVE/Pipeline/08222018/pipelineCL.py -T 48 -j 48 [options]

where EBROOTBIONANOSOLVE is the environment variable storing the BionanoSolve installation path on the cluster, and [options] needs to be replaced by the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name, need to be modified appropriately as well. In the pipelineCL.py samples, the -T and -j values match the requested ppn=48; keep them consistent if you change the core count.

Please note: BionanoSolve needs the Distributed Resource Management Application API (DRMAA, http://www.drmaa.org/) and a properly configured "clusterArgument.xml" file to run a distributed parallel job on a cluster. DRMAA is currently not available on the Sapelo2 cluster; therefore, please run BionanoSolve pipelineCL.py on a single node, and do not run the pipeline with the "-C <cluster argument.xml file>" option. We are sorry for the inconvenience.
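
The single-node restriction can be enforced inside the job script itself. The following is a rough sketch, not an official recipe. It assumes the scheduler exports PBS_NUM_PPN (Torque sets this to the ppn value of the request); job_threads and reject_cluster_option are hypothetical helper names, and the fallback of 1 thread is a choice made for this example.

```shell
#!/bin/bash
# Sketch: keep pipelineCL.py on one node and match its threads to the job.
# Assumption: PBS_NUM_PPN is exported by the scheduler inside the job.

# job_threads: print the cores granted per node, defaulting to 1.
job_threads() {
    printf '%s\n' "${PBS_NUM_PPN:-1}"
}

# reject_cluster_option ARGS...: fail if the -C cluster option is present,
# either as a separate argument (-C file.xml) or glued (-Cfile.xml).
reject_cluster_option() {
    local arg
    for arg in "$@"; do
        case "$arg" in
            -C*)
                echo "-C is not usable on Sapelo2 (no DRMAA); run on a single node" >&2
                return 1
                ;;
        esac
    done
    return 0
}

# Usage inside sub.sh, with your own options in the array 'opts':
#   reject_cluster_option "${opts[@]}" || exit 1
#   t=$(job_threads)
#   python $EBROOTBIONANOSOLVE/Pipeline/04122018/pipelineCL.py -T "$t" -j "$t" "${opts[@]}"
```

Deriving the thread count from the job request avoids the case where the ppn value is edited but -T and -j are left at their old values.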


Example of job submission

qsub sub.sh 

Documentation

Details are at BionanoSolve

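A user manual for running the BionanoSolve pipeline on the command line can be found at BionanoSolve guide. The built-in help of each script can also be printed with the -h option, for example with version 3.2.2-08222018: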

module load BionanoSolve/3.2.2-08222018-foss-2016b
perl $EBROOTBIONANOSOLVE/HybridScaffold/08222018/hybridScaffold.pl -h
	
Usage: perl hybridScaffold.pl <-h> <-n ngs_file> <-b bng_cmap_file> <-c hybrid_config_xml> <-o output_folder> <-B conflict_filter_level> <-N conflict_filter_level> <-f> 
      <-m molecules_bnx> <-p de_novo_pipeline> <-q de_novo_xml> <-v> <-x> <-y> <-e noise_param><-z tar_zip_file><-S>
      -h    : This help message         
      -n    : Input NGS FASTA [required]
      -b    : Input BioNano CMAP  [required]
      -c    : Merge configuration file [required]
      -o    : Output folder [required]
      -r    : RefAligner program [required]
      -B    : conflict filter level: 1 no filter, 2 cut contig at conflict, 3 exclude conflicting contig [required if not using -M option]
      -N    : conflict filter level: 1 no filter, 2 cut contig at conflict, 3 exclude conflicting contig [required if not using -M option]
      -f    : Force output and overwrite any existing files
      -x    : Flag to generate molecules to hybrid scaffold alignment and molecules to genome map alignment [optional]
      -y    : Flag to generate chimeric quality score for the Input BioNano CMAP [optional]
      -m    : Input BioNano molecules BNX [optional; only required for either the -x or -y option]
      -p    : Input de novo assembly pipeline directory [optional; only required for -x option]
      -q    : Input de novo assembly pipeline optArguments XML script [optional; only required for -x option]
      -e    : Input de novo assembly noise parameter .errbin or .err file [optional; recommended for -y option but not required]
      -v    : Print pipeline version information
      -M    : Input a conflict resolution file indicating which NGS and BioNano conflicting contigs to be cut [optional] 
      -z    : Name of a zipped file to archive the essential output files [optional]
      -S    : Only run hybridScaffold up to before Merge steps [optional]
      -w    : Name of the status text file needed by IrysView [optional]
      -t    : Perform pre-pairmerge sequence to pre-pairmerge genome map alignment [optional]
      -u    : Sequence of enzyme recognition site (overrides what has been specified in config XML file, for IrysView only) [optional]
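
Based on the options marked [required] in the help text above, a hybridScaffold.pl call could be assembled as in the following sketch. All input file names are hypothetical placeholders, the location of the RefAligner executable inside the RefAligner/7782.7865rel directory is an assumption, and conflict filter level 2 is chosen only for illustration. build_hybrid_scaffold_cmd is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: assemble a hybridScaffold.pl command from the required options.
# Input file names are placeholders; override them via the variables shown.

# build_hybrid_scaffold_cmd ROOT: fill the global array HS_CMD.
build_hybrid_scaffold_cmd() {
    local root=$1
    HS_CMD=(
        perl "$root/HybridScaffold/08222018/hybridScaffold.pl"
        -n "${NGS_FASTA:-ngs_assembly.fasta}"          # input NGS FASTA
        -b "${BNG_CMAP:-bionano_assembly.cmap}"        # input BioNano CMAP
        -c "${HS_CONFIG:-hybridScaffold_config.xml}"   # merge configuration file
        -r "$root/RefAligner/7782.7865rel/RefAligner"  # assumed executable path
        -o "${HS_OUT:-hybrid_out}"                     # output folder
        -B 2 -N 2                                      # cut contig at conflict
    )
}

# Usage after 'module load BionanoSolve/3.2.2-08222018-foss-2016b':
#   build_hybrid_scaffold_cmd "$EBROOTBIONANOSOLVE"
#   printf '%q ' "${HS_CMD[@]}"; echo    # review the command first
#   "${HS_CMD[@]}"                       # then run it
```

Keeping the command in an array preserves paths that contain spaces and makes it easy to print the exact call into the job log before running it.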


python $EBROOTBIONANOSOLVE/Pipeline/08222018/pipelineCL.py -h

usage: pipelineCL.py [-h] [-T T] [-j MAXTHREADS] [-jp MAXTHREADSPW] [-N N]
                     [-G BED] [-i ITER] [-b BNX] [-l LOCAL] [-t TOOLS]
                     [-B BYPASS] [-e EXP] [-r REF] [-x] [-c CLEANUP] [-C CXML]
                     [-w] [-a XML] [-p PERF] [-d] [-u] [-U [GROUPCONTIGS]]
                     [-v [VERSION]] [-V RUNSV] [-A] [-y] [-m] [-H [H]]
                     [-f [F]] [-J J] [-z] [-E] [-W W] [-F F] [-R [R]]
                     [-cd CONTROLDIR] [-pd PARAMDIR] [-op OUTLIERP]

Pipeline for de novo assembly - BioNano Genomics

optional arguments:
  -h, --help         show this help message and exit
  -T T               Available threads per Node [default 1]
  -j MAXTHREADS      Max Threads per job [default -T]
  -jp MAXTHREADSPW   Max Threads per pairwise or stage0 job [default -T arg]
  -N N               Minimum number of split bnx files (actual number is
                     multiple of this) (optional, default 2)
  -G BED             Bed file for gaps, used in structural variation (SV)
                     detection to check for SV overlap with reference gaps
  -i ITER            Number of extension and merge iterations (default=1, must
                     be in range [0,20], use 0 to skip)
  -b BNX             Input molecule (.bnx) file, required
  -l LOCAL           Location of output files root directory, required, will
                     be created if does not exist; if does exist, will
                     overwrite contents (may be error-prone)
  -t TOOLS           Location of executable files (RefAligner and Assembler,
                     required)
  -B BYPASS          Skip steps, using previous result. <= 0:None,
                     1:ImgDetect, 2:NoiseChar/Subsample, 3:Pairwise,
                     4:Assembly, 5:RefineA, 6:RefineB, 7:merge0,
                     8+(i-1)*2:Ext(i), 9+(i-1)*2:Mrg(i), N+1:alignmol
  -e EXP             Output file prefix (optional, default = exp)
  -r REF             Reference file (must be .cmap), to compare resulting
                     contigs (optional)
  -x                 Exit after auto noise (noise characterization), do not
                     preform de novo assembly
  -c CLEANUP         Remove contig results (0 - keep all (default), 1 - remove
                     intermediate files, 2 - store in sqlite, 3 - store in
                     sqlite and remove)
  -C CXML            Run on cluster, read XML file for submission arguments
                     (optional--will not use cluster submission if absent)
  -w                 Wipe clean previous contig results
  -a XML             Read XML file for parameters (required)
  -p PERF            Log performance in pipelineReport 0=None, 1=time, 2=perf,
                     3=time&perf (default=1)
  -d                 Retired option (contig subdirectories), always enabled.
  -u                 Do not perform final refinement (not recommended).
  -U [GROUPCONTIGS]  Group contigs in refinement and extension stages: always
                     ON, this argument has no effect (retained for backward
                     compatibility)
  -v [VERSION]       Print version; exit if argument > 1 supplied.
  -V RUNSV           Detect structural variations. Default: run after final
                     stage (normally refineFinal); if argument 0, disable.
  -A                 Align molcules to final contigs (ON by default, use this
                     to turn off).
  -y                 Automatically determine noise parameters (requires
                     reference; optional, default off)
  -m                 Disable molecule vs reference alignments (default on with
                     reference)
  -H [H]             Use HG19 (human genome) as reference, loaded from
                     Analysis/SV/CopyNumberProfiles/DATA. Overrides -r
                     argument. Use HG38 if argument 2 is supplied. [Default
                     OFF]
  -f [F]             Run this fraction of grouped jobs on host (0.2 if no arg)
                     [default 0]
  -J J               Number of threads on host for grouped jobs (has no effect
                     without -f) [default 48]
  -z                 Zip pipeline results (default ON, use this to turn off).
  -E                 ReCheck stdout completeness for completed jobs (default
                     ON, use this to turn off).
  -W W               Multiply group sizes by this factor to reduce number of
                     jobs (for Genomes larger than Human). Use value under 1
                     to increase number of jobs
  -F F               Color channel: replace -usecolor X in optArgs with this,
                     must be either 1 or 2 [default OFF]
  -R [R]             Rough assembly: denovo assembly used as autoNoise
                     reference for re-assembly; sequence may be optionally
                     specified as -r, if supplied, will be used for global
                     scaling and SV calls, but not for autoNoise (optional,
                     default off, optional argument 0.2-0.9 for fraction of
                     rough assembly which must align to sequence for
                     rescaling)
  -cd CONTROLDIR     Control data directory for copy number profiles
                     [optional]
  -pd PARAMDIR       Parameters directory for copy number profiles [optional]
  -op OUTLIERP       Outlier Probability for copy number profile (optional)
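
Going by the help text above, -b, -l, -t and -a are the required arguments of pipelineCL.py. The following sketch puts them together for a single-node run. The input file names and the output directory are hypothetical placeholders, and build_pipeline_cmd is a hypothetical helper name. Note that -l overwrites the contents of an existing directory, so a fresh output location is the safer choice.

```shell
#!/bin/bash
# Sketch: assemble a single-node pipelineCL.py command (no -C option).
# Input file names are placeholders; override them via the variables shown.

# build_pipeline_cmd ROOT [THREADS]: fill the global array PIPELINE_CMD.
build_pipeline_cmd() {
    local root=$1 threads=${2:-1}
    PIPELINE_CMD=(
        python "$root/Pipeline/08222018/pipelineCL.py"
        -T "$threads" -j "$threads"              # threads per node and per job
        -b "${BNX_FILE:-molecules.bnx}"          # input molecule file
        -l "${DENOVO_OUT:-denovo_out}"           # output root directory
        -t "$root/RefAligner/7782.7865rel"       # RefAligner and Assembler location
        -a "${OPT_ARGS_XML:-optArguments.xml}"   # parameter XML file
    )
}

# Usage after 'module load BionanoSolve/3.2.2-08222018-foss-2016b':
#   build_pipeline_cmd "$EBROOTBIONANOSOLVE" 48
#   printf '%q ' "${PIPELINE_CMD[@]}"; echo    # review the command first
#   "${PIPELINE_CMD[@]}"                       # then run it
```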


Installation

Source code from BionanoSolve download

System

64-bit Linux