Canu-Sapelo2: Difference between revisions

From Research Computing Center Wiki
 
(5 intermediate revisions by 3 users not shown)
Line 9: Line 9:


=== Version ===
=== Version ===
2.1.1
2.2


=== Author / Distributor ===
=== Author / Distributor ===
Line 17: Line 17:
=== Description ===
=== Description ===
"Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION). "
"Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION). "
More details are at [http://www.repeatmasker.org Canu]
More details are at Canu's [https://canu.readthedocs.io/en/latest/index.html documentation].


=== Running Program ===
=== Running Program ===


'''Version 2.1.1'''
'''Version 2.2'''


To use this version, please load the module with
To use this version, please load the module with
<pre class="gscript">
<pre class="gscript">
ml canu/2.1.1-GCCcore-8.3.0-Java-11
ml canu/2.2-GCCcore-11.2.0
</pre>
</pre>
When you invoke canu, please use the gridOptions to pass queueing system options for the jobs the canu pipeline submits. At a minimum, please specify a partition, the number of tasks and the walltime. For example, use '''gridOptions =  --partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00 '''. The --mem option will be added automatically by the pipeline scripts.
or with
<pre class="gscript">
ml canu/2.2-GCCcore-11.3.0
</pre>
When you invoke canu, please use the gridOptions parameter to pass queueing system options to the jobs that the Canu pipeline submits. At a minimum, please specify a partition, the number of tasks, and the walltime. For example, use '''gridOptions="--partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00"'''. The pipeline scripts add the --mem-per-cpu option automatically, but you can also set it yourself if the pipeline does not estimate the required memory correctly.
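
As a minimal sketch of how the gridOptions string fits into a full command line, the snippet below assembles a canu invocation and prints it without running it. The assembly prefix (asm), output directory (asm_out), genome size (5m), read type, and read file name (reads.fastq.gz) are placeholders chosen for illustration, and the commented --mem-per-cpu value is likewise an assumption.

```shell
#!/bin/bash
# Sketch only: builds and prints a canu command line; it does not run canu.
# Prefix, output directory, genome size, and read file are placeholders.

partition="batch"
cpus_per_task=4
walltime="168:00:00"

# Slurm options that Canu passes along when it submits its own jobs.
grid_opts="--partition=${partition} --ntasks=1 --cpus-per-task=${cpus_per_task} --time=${walltime}"

# Uncomment to set memory explicitly if Canu's own estimate proves wrong
# (the value below is a placeholder):
# grid_opts="${grid_opts} --mem-per-cpu=8G"

# Keep each argument as one array element so the spaces inside
# gridOptions survive quoting.
canu_cmd=(
    canu
    -p asm
    -d asm_out
    genomeSize=5m
    gridOptions="${grid_opts}"
    -nanopore reads.fastq.gz
)

# Print the command in a shell-quoted form for inspection.
printf '%q ' "${canu_cmd[@]}"
printf '\n'
```

To run the command for real, replace the final two printf lines with "${canu_cmd[@]}" (including the double quotes) after loading the canu module.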




Here is an example of a shell script, sub.sh, to run on the batch queue:  
 
Here is an example of a shell script, sub.sh, to run Canu on the batch queue:  
<pre class="gscript">
<pre class="gscript">
#!/bin/bash
#!/bin/bash
Line 36: Line 41:
#SBATCH --job-name=canujobname
#SBATCH --job-name=canujobname
#SBATCH --ntasks=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=1:00:00
#SBATCH --time=120:00:00
#SBATCH --mem=10G
#SBATCH --mem=40G


cd $SLURM_SUBMIT_DIR
cd $SLURM_SUBMIT_DIR


ml canu/2.1.1-GCCcore-8.3.0-Java-11
ml canu/2.2-GCCcore-11.2.0


canu gridOptions=" --partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00 " [options]
canu gridOptions=" --partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00 " [options]


</pre>
</pre>
where [options] need to be replaced by the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name need to be modified appropriately as well.  
where [options] needs to be replaced with the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name, need to be modified appropriately as well. Please note that the Slurm headers (#SBATCH lines) apply only to Canu's initial job. The resource limits of all the jobs that Canu spawns are determined by what is defined in the gridOptions.
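
One way to keep the two sets of resource requests apart is to put the gridOptions in a Canu specification file and leave the #SBATCH headers in sub.sh for the initial job only. The sketch below writes such a file. The file name canu_grid.spec is hypothetical, and the layout (one option=value per line, with # comments) is an assumption that should be checked against the Canu documentation for the installed version.

```shell
#!/bin/bash
# Sketch only: writes a small Canu specification file.
# The file name is a placeholder; the option=value layout is assumed.

spec_file="canu_grid.spec"

cat > "${spec_file}" <<'EOF'
# Resources for the jobs that Canu submits itself.
# The #SBATCH headers in sub.sh do not apply to these jobs.
gridOptions=--partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00
EOF

# Show the result so it can be reviewed before use.
cat "${spec_file}"
```

The file would then be passed to canu with the -s option shown in the usage message, for example canu -s canu_grid.spec [options].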




Line 57: Line 61:


=== Documentation ===
=== Documentation ===
<pre class="gcommand">
<pre class="gcommand">
[shtsai@b1-24 ~]$ ml canu/2.1.1-GCCcore-8.3.0-Java-11
[cft07037@d2-13 canu]$ ml canu/2.2-GCCcore-11.2.0  
[shtsai@b1-24 ~]$ canu --help
[cft07037@d2-13 canu]$ canu --help


usage:  canu [-version] [-citation] \
usage:  canu [-version] [-citation] \
Line 136: Line 140:


Complete documentation at http://canu.readthedocs.org/en/latest/
Complete documentation at http://canu.readthedocs.org/en/latest/
</pre>
</pre>
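
The option listing printed by canu -options is long, so it can help to filter it by name prefix. The helper below is a small sketch: list_opts is a hypothetical function name, and the sample lines are copied from the canu -options listing on this page so that the snippet runs without Canu being loaded.

```shell
#!/bin/bash
# list_opts PREFIX
# Reads an option listing on stdin and prints the names of the options
# whose name starts with PREFIX (the name is the first column).
list_opts() {
    awk -v p="$1" 'index($1, p) == 1 { print $1 }'
}

# Three sample lines taken from the canu -options listing.
sample='gridOptions                            Grid engine options applied to all jobs
gridOptionscns                          Grid engine options applied to unitig consensus jobs
corMemory                              Amount of memory, in gigabytes, to use for read correction jobs'

matches="$(printf '%s\n' "${sample}" | list_opts gridOptions)"
printf '%s\n' "${matches}"
```

With the canu module loaded, the same function can read the live listing: canu -options | list_opts gridOptions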
   
   
<pre  class="gcommand">
[shtsai@b1-24 ~]$ ml canu/2.1.1-GCCcore-8.3.0-Java-11
[shtsai@b1-24 ~]$ canu -options
MMapBlockSize                          Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
MMapMerSize                            K-mer size for seeds in minmap
MhapBlockSize                          Number of reads per GB of memory allowed (mhapMemory)
MhapFilterThreshold                    Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
MhapFilterUnique                        Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
MhapMerSize                            K-mer size for seeds in mhap
MhapNoTf                                Expert option: True or false, do not use tf weighting, only idf of tf-idf.
MhapOptions                            Expert option: free-form parameters to pass to MHAP.
MhapOrderedMerSize                      K-mer size for second-stage filter in mhap
MhapSensitivity                        Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
MhapVersion                            Version of the MHAP jar file to use
Overlapper                              Which overlap algorithm to use for unitig construction
OvlFilter                              Filter overlaps based on expected kmers vs observed kmers
OvlFrequentMers                        Do not seed overlaps with these kmers
OvlHashBits                            Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per utgOvlHashBlockLength
OvlHashBlockLength                      Amount of sequence (bp) to load into the overlap hash table
OvlHashLoad                            Maximum hash table load.  If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
OvlMerDistinct                          K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
OvlMerSize                              K-mer size for seeds in overlaps
OvlMerThreshold                        K-mer frequency threshold; mers more frequent than this count are ignored
OvlRefBlockLength                      Amount of sequence (bp) to search against the hash table per batch
ReAlign                                Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses utgOvlErrorRate
batConcurrency                          Unused, only one process supported
batMemory                              Approximate maximum memory usage, in gigabytes, default is the maxMemory limit
batOptions                              Advanced options to bogart
batStageSpace                          Amount of local disk space needed to stage data for unitig construction jobs
batThreads                              Number of threads to use; default is the maxThreads limit
cnsConcurrency                          If grid not enabled, number of unitig consensus jobs to run at the same time; default is n_proc / n_threads
cnsConsensus                            Which consensus algorithm to use; 'pbdagcon' (fast, reliable); 'utgcns' (multialignment output); 'quick' (single read mosaic); default 'pbdagcon'
cnsErrorRate                            Consensus expects alignments at about this error rate
cnsMaxCoverage                          Limit unitig consensus to at most this coverage; default '40' = unlimited
cnsMemory                              Amount of memory, in gigabytes, to use for unitig consensus jobs
cnsPartitions                          Attempt to create this many consensus jobs; default '0' = based on the largest tig
cnsStageSpace                          Amount of local disk space needed to stage data for unitig consensus jobs
cnsThreads                              Number of threads to use for unitig consensus jobs
contigFilter                            Parameters to filter out 'unassembled' unitigs.  Five values: minReads minLength singleReadSpan lowCovFraction lowCovDepth
corConcurrency                          If grid not enabled, number of read correction jobs to run at the same time; default is n_proc / n_threads
corConsensus                            Which consensus algorithm to use; only 'falcon' is supported; default 'falcon'
corErrorRate                            Only use raw alignments below this error rate to construct corrected reads
corFilter                              Method to filter short reads from correction; 'quick' or 'expensive'; default 'expensive'
corMMapBlockSize                        Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
corMMapMerSize                          K-mer size for seeds in minmap
corMaxEvidenceCoverageGlobal            Limit reads used for correction to supporting at most this coverage; default: '1.0x' = 1.0 * estimated coverage
corMaxEvidenceCoverageLocal            Limit reads being corrected to at most this much evidence coverage; default: '2.0x' = 2.0 * estimated coverage
corMaxEvidenceErate                    Limit read correction to only overlaps at or below this fraction error; default: unlimited
corMemory                              Amount of memory, in gigabytes, to use for read correction jobs
corMhapBlockSize                        Number of reads per GB of memory allowed (mhapMemory)
corMhapFilterThreshold                  Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
corMhapFilterUnique                    Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
corMhapMerSize                          K-mer size for seeds in mhap
corMhapNoTf                            Expert option: True or false, do not use tf weighting, only idf of tf-idf.
corMhapOptions                          Expert option: free-form parameters to pass to MHAP.
corMhapOrderedMerSize                  K-mer size for second-stage filter in mhap
corMhapSensitivity                      Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
corMhapVersion                          Version of the MHAP jar file to use
corMinCoverage                          Minimum number of bases supporting each corrected base, if less than this sequences are split; default based on input read coverage: 0 <= 30x < 4 < 60x <= 4
corMinEvidenceLength                    Limit read correction to only overlaps longer than this; default: unlimited
corOutCoverage                          Only correct the longest reads up to this coverage; default 40
corOverlapper                          Which overlap algorithm to use for correction
corOvlErrorRate                        Overlaps above this error rate are not computed
corOvlFilter                            Filter overlaps based on expected kmers vs observed kmers
corOvlFrequentMers                      Do not seed overlaps with these kmers
corOvlHashBits                          Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per corOvlHashBlockLength
corOvlHashBlockLength                  Amount of sequence (bp) to load into the overlap hash table
corOvlHashLoad                          Maximum hash table load.  If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
corOvlMerDistinct                      K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
corOvlMerSize                          K-mer size for seeds in overlaps
corOvlMerThreshold                      K-mer frequency threshold; mers more frequent than this count are ignored
corOvlRefBlockLength                    Amount of sequence (bp) to search against the hash table per batch
corPartitionMin                        Don't make a read correction partition with fewer than N reads
corPartitions                          Partition read correction into N jobs
corReAlign                              Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses corOvlErrorRate
corStageSpace                          Amount of local disk space needed to stage data for read correction jobs
corThreads                              Number of threads to use for read correction jobs
cormhapConcurrency                      If grid not enabled, number of mhap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
cormhapMemory                          Amount of memory, in gigabytes, to use for mhap overlaps for correction jobs
cormhapStageSpace                      Amount of local disk space needed to stage data for mhap overlaps for correction jobs
cormhapThreads                          Number of threads to use for mhap overlaps for correction jobs
cormmapConcurrency                      If grid not enabled, number of mmap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
cormmapMemory                          Amount of memory, in gigabytes, to use for mmap overlaps for correction jobs
cormmapStageSpace                      Amount of local disk space needed to stage data for mmap overlaps for correction jobs
cormmapThreads                          Number of threads to use for mmap overlaps for correction jobs
corovlConcurrency                      If grid not enabled, number of overlaps for correction jobs to run at the same time; default is n_proc / n_threads
corovlMemory                            Amount of memory, in gigabytes, to use for overlaps for correction jobs
corovlStageSpace                        Amount of local disk space needed to stage data for overlaps for correction jobs
corovlThreads                          Number of threads to use for overlaps for correction jobs
correctedErrorRate                      Expected fraction error in an alignment of two corrected reads
enableOEA                              Do overlap error adjustment - comprises two steps: read error detection (RED) and overlap error adjustment (OEA); default 'true'
executiveMemory                        Amount of memory, in GB, to reserve for the Canu exective process
executiveThreads                        Number of threads to reserve for the Canu exective process
genomeSize                              An estimate of the size of the genome
gnuplot                                Path to the gnuplot executable
gnuplotImageFormat                      Image format that gnuplot will generate.  Default: based on gnuplot, 'png', 'svg' or 'gif'
gridEngine                              Grid engine configuration, not documented
gridEngineArrayMaxJobs                  Grid engine configuration, not documented
gridEngineArrayName                    Grid engine configuration, not documented
gridEngineArrayOption                  Grid engine configuration, not documented
gridEngineArraySubmitID                Grid engine configuration, not documented
gridEngineJobID                        Grid engine configuration, not documented
gridEngineMemoryOption                  Grid engine configuration, not documented
gridEngineMemoryPerJob                  Grid engine configuration, not documented
gridEngineMemoryUnits                  Grid engine configuration, not documented
gridEngineNameOption                    Grid engine configuration, not documented
gridEngineNameToJobIDCommand            Grid engine configuration, not documented
gridEngineNameToJobIDCommandNoArray    Grid engine configuration, not documented
gridEngineOutputOption                  Grid engine configuration, not documented
gridEngineResourceOption                Grid engine configuration, not documented
gridEngineStageOption                  Grid engine configuration, not documented
gridEngineSubmitCommand                Grid engine configuration, not documented
gridEngineTaskID                        Grid engine configuration, not documented
gridEngineThreadsOption                Grid engine configuration, not documented
gridOptions                            Grid engine options applied to all jobs
gridOptionsExecutive                    Grid engine options applied to the canu executive script
gridOptionsJobName                      Grid jobs job-name suffix
gridOptionsbat                          Grid engine options applied to unitig construction jobs
gridOptionscns                          Grid engine options applied to unitig consensus jobs
gridOptionscor                          Grid engine options applied to read correction jobs
gridOptionscormhap                      Grid engine options applied to mhap overlaps for correction jobs
gridOptionscormmap                      Grid engine options applied to mmap overlaps for correction jobs
gridOptionscorovl                      Grid engine options applied to overlaps for correction jobs
gridOptionshap                          Grid engine options applied to haplotype assignment jobs
gridOptionsmeryl                        Grid engine options applied to mer counting jobs
gridOptionsobtmhap                      Grid engine options applied to mhap overlaps for trimming jobs
gridOptionsobtmmap                      Grid engine options applied to mmap overlaps for trimming jobs
gridOptionsobtovl                      Grid engine options applied to overlaps for trimming jobs
gridOptionsoea                          Grid engine options applied to overlap error adjustment jobs
gridOptionsovb                          Grid engine options applied to overlap store bucketizing jobs
gridOptionsovs                          Grid engine options applied to overlap store sorting jobs
gridOptionsred                          Grid engine options applied to read error detection jobs
gridOptionsutgmhap                      Grid engine options applied to mhap overlaps for unitig construction jobs
gridOptionsutgmmap                      Grid engine options applied to mmap overlaps for unitig construction jobs
gridOptionsutgovl                      Grid engine options applied to overlaps for unitig construction jobs
hapConcurrency                          Unused, there is only one process
hapMemory                              Amount of memory, in gigabytes, to use for haplotype assignment
hapStageSpace                          Amount of local disk space needed to stage data for haplotype assignment jobs
hapThreads                              Number of threads to use for haplotype assignment
hapUnknownFraction                      Fraction of allowed unknown bases before they are included in the assembly, between 0-1; default 0.05
homoPolyCompress                        Compute everything but consensus sequences using homopolymer compressed reads
java                                    Java interpreter to use; at least version 1.8; default 'java'
javaUse64Bit                            Java interpreter supports the -d64 or -d32 flags; default auto
maxInputCoverage                        If input coverage is high, downsample to something reasonable; default 200
maxMemory                              Maximum memory to use by any component of the assembler
maxThreads                              Maximum number of compute threads to use by any component of the assembler
merylConcurrency                        Unused, there is only one process
merylMemory                            Amount of memory, in gigabytes, to use for mer counting
merylStageSpace                        Amount of local disk space needed to stage data for mer counting jobs
merylThreads                            Number of threads to use for mer counting
minInputCoverage                        Stop if input coverage is too low; default 10
minMemory                              Minimum amount of memory needed to compute the assembly (do not set unless prompted!)
minOverlapLength                        Overlaps shorter than this length are not computed; default 500
minReadLength                          Reads shorter than this length are not loaded into the assembler; default 1000
minThreads                              Minimum number of compute threads suggested to compute the assembly
minimap                                Path to minimap2; default 'minimap2'
objectStore                            Type of object storage used; not ready for production yet
objectStoreClient                      Path to the command line client used to access the object storage
objectStoreClientDA                    Path to the command line client used to download files from object storage
objectStoreClientUA                    Path to the command line client used to upload files to object storage
objectStoreNameSpace                    Object store parameters; specific to the type of objectStore used
objectStoreProject                      Object store project; specific to the type of objectStore used
obtErrorRate                            Stringency of overlaps to use for trimming
obtMMapBlockSize                        Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
obtMMapMerSize                          K-mer size for seeds in minmap
obtMhapBlockSize                        Number of reads per GB of memory allowed (mhapMemory)
obtMhapFilterThreshold                  Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
obtMhapFilterUnique                    Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
obtMhapMerSize                          K-mer size for seeds in mhap
obtMhapNoTf                            Expert option: True or false, do not use tf weighting, only idf of tf-idf.
obtMhapOptions                          Expert option: free-form parameters to pass to MHAP.
obtMhapOrderedMerSize                  K-mer size for second-stage filter in mhap
obtMhapSensitivity                      Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
obtMhapVersion                          Version of the MHAP jar file to use
obtOverlapper                          Which overlap algorithm to use for overlap based trimming
obtOvlErrorRate                        Overlaps at or below this error rate are used to trim reads
obtOvlFilter                            Filter overlaps based on expected kmers vs observed kmers
obtOvlFrequentMers                      Do not seed overlaps with these kmers
obtOvlHashBits                          Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per obtOvlHashBlockLength
obtOvlHashBlockLength                  Amount of sequence (bp) to load into the overlap hash table
obtOvlHashLoad                          Maximum hash table load.  If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
obtOvlMerDistinct                      K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
obtOvlMerSize                          K-mer size for seeds in overlaps
obtOvlMerThreshold                      K-mer frequency threshold; mers more frequent than this count are ignored
obtOvlRefBlockLength                    Amount of sequence (bp) to search against the hash table per batch
obtReAlign                              Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses obtOvlErrorRate
obtmhapConcurrency                      If grid not enabled, number of mhap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtmhapMemory                          Amount of memory, in gigabytes, to use for mhap overlaps for trimming jobs
obtmhapStageSpace                      Amount of local disk space needed to stage data for mhap overlaps for trimming jobs
obtmhapThreads                          Number of threads to use for mhap overlaps for trimming jobs
obtmmapConcurrency                      If grid not enabled, number of mmap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtmmapMemory                          Amount of memory, in gigabytes, to use for mmap overlaps for trimming jobs
obtmmapStageSpace                      Amount of local disk space needed to stage data for mmap overlaps for trimming jobs
obtmmapThreads                          Number of threads to use for mmap overlaps for trimming jobs
obtovlConcurrency                      If grid not enabled, number of overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtovlMemory                            Amount of memory, in gigabytes, to use for overlaps for trimming jobs
obtovlStageSpace                        Amount of local disk space needed to stage data for overlaps for trimming jobs
obtovlThreads                          Number of threads to use for overlaps for trimming jobs
oeaBatchLength                          Number of bases per overlap error correction batch
oeaBatchSize                            Number of reads per overlap error correction batch
oeaConcurrency                          If grid not enabled, number of overlap error adjustment jobs to run at the same time; default is n_proc / n_threads
oeaMemory                              Amount of memory, in gigabytes, to use for overlap error adjustment jobs
oeaStageSpace                          Amount of local disk space needed to stage data for overlap error adjustment jobs
oeaThreads                              Number of threads to use for overlap error adjustment jobs
onFailure                              Full path to command to run on failure
onSuccess                              Full path to command to run on successful completion
ovbConcurrency                          If grid not enabled, number of overlap store bucketizing jobs to run at the same time; default is n_proc / n_threads
ovbMemory                              Amount of memory, in gigabytes, to use for overlap store bucketizing jobs
ovbStageSpace                          Amount of local disk space needed to stage data for overlap store bucketizing jobs
ovbThreads                              Number of threads to use for overlap store bucketizing jobs
ovsConcurrency                          If grid not enabled, number of overlap store sorting jobs to run at the same time; default is n_proc / n_threads
ovsMemory                              Amount of memory, in gigabytes, to use for overlap store sorting jobs
ovsStageSpace                          Amount of local disk space needed to stage data for overlap store sorting jobs
ovsThreads                              Number of threads to use for overlap store sorting jobs
preExec                                A command line to run at the start of Canu execution scripts
purgeOverlaps                          When to delete intermediate overlap files: never, normal (default), aggressive, dangerous
rawErrorRate                            Expected fraction error in an alignment of two uncorrected reads
readSamplingBias                        Score reads as 'random * length^bias', keep the highest scoring reads
readSamplingCoverage                    DEPRECATED; use maxInputCoverage.  Discard reads to make the input be of this size
redBatchLength                          Number of bases per fragment error detection batch
redBatchSize                            Number of reads per fragment error detection batch
redConcurrency                          If grid not enabled, number of read error detection jobs to run at the same time; default is n_proc / n_threads
redMemory                              Amount of memory, in gigabytes, to use for read error detection jobs
redStageSpace                          Amount of local disk space needed to stage data for read error detection jobs
redThreads                              Number of threads to use for read error detection jobs
saveMerCounts                          Save full mer counting results, sometimes useful
saveOverlaps                            Do not remove the overlap stores.  Default: false = remove overlap stores when they're no longer needed
saveReadCorrections                    Save intermediate read correction files, almost never a good idea
saveReadHaplotypes                      Save intermediate read haplotype files, almost never a good idea
saveReads                              Save intermediate corrected and trimmed reads to asm.correctedReads.fasta.gz and asm.trimmedReads.fasta.gz
shell                                  Command interpreter to use; sh-compatible (e.g., bash), NOT C-shell (csh or tcsh); default '/bin/sh'
showNext                                Don't run any commands, just report what would run
stageDirectory                          If set, copy heavily used data to this node-local location
stopAfter                              Stop after a specific algorithm step is completed
stopOnLowCoverage                      Stop if raw, corrected or trimmed read coverage is low
trimReadsCoverage                      Minimum depth of evidence to retain bases; default '2
trimReadsOverlap                        Minimum overlap between evidence to make contiguous trim; default '500'
unitigger                              Which unitig algorithm to use; only 'bogart' supported; default 'bogart'
useGrid                                If 'true', enable grid-based execution; if 'false', run all jobs on the local machine; if 'remote', create jobs for grid execution but do not submit; default 'true'
useGridBAT                              If 'true', run module BAT under grid control; if 'false' run locally.
useGridCNS                              If 'true', run module CNS under grid control; if 'false' run locally.
useGridCOR                              If 'true', run module COR under grid control; if 'false' run locally.
useGridCORMHAP                          If 'true', run module CORMHAP under grid control; if 'false' run locally.
useGridCORMMAP                          If 'true', run module CORMMAP under grid control; if 'false' run locally.
useGridCOROVL                          If 'true', run module COROVL under grid control; if 'false' run locally.
useGridHAP                              If 'true', run module HAP under grid control; if 'false' run locally.
useGridMERYL                            If 'true', run module MERYL under grid control; if 'false' run locally.
useGridOBTMHAP                          If 'true', run module OBTMHAP under grid control; if 'false' run locally.
useGridOBTMMAP                          If 'true', run module OBTMMAP under grid control; if 'false' run locally.
useGridOBTOVL                          If 'true', run module OBTOVL under grid control; if 'false' run locally.
useGridOEA                              If 'true', run module OEA under grid control; if 'false' run locally.
useGridOVB                              If 'true', run module OVB under grid control; if 'false' run locally.
useGridOVS                              If 'true', run module OVS under grid control; if 'false' run locally.
useGridRED                              If 'true', run module RED under grid control; if 'false' run locally.
useGridUTGMHAP                          If 'true', run module UTGMHAP under grid control; if 'false' run locally.
useGridUTGMMAP                          If 'true', run module UTGMMAP under grid control; if 'false' run locally.
useGridUTGOVL                          If 'true', run module UTGOVL under grid control; if 'false' run locally.
utgBubbleDeviation                      Overlaps this much above mean of contig will be used to identify bubbles
utgChimeraType                          When to filter reads for contig construction: none, chimera (missing middle), uncovered (missing middle or ends), deadend (missing middle or end or no neighbor) (default)
utgErrorRate                            Overlaps at or below this error rate are used to construct contigs
utgGraphDeviation                      Overlaps this much above median will not be used for initial graph construction
utgMMapBlockSize                        Number of reads per 1GB; memory * blockSize = the size of the block loaded into memory per job
utgMMapMerSize                          K-mer size for seeds in minimap
utgMhapBlockSize                        Number of reads per GB of memory allowed (mhapMemory)
utgMhapFilterThreshold                  Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
utgMhapFilterUnique                    Expert option: True or false, suppress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
utgMhapMerSize                          K-mer size for seeds in mhap
utgMhapNoTf                            Expert option: True or false, do not use tf weighting, only idf of tf-idf.
utgMhapOptions                          Expert option: free-form parameters to pass to MHAP.
utgMhapOrderedMerSize                  K-mer size for second-stage filter in mhap
utgMhapSensitivity                      Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
utgMhapVersion                          Version of the MHAP jar file to use
utgOverlapper                          Which overlap algorithm to use for unitig construction
utgOvlErrorRate                        Overlaps at or below this error rate are used to trim reads
utgOvlFilter                            Filter overlaps based on expected kmers vs observed kmers
utgOvlFrequentMers                      Do not seed overlaps with these kmers
utgOvlHashBits                          Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per utgOvlHashBlockLength
utgOvlHashBlockLength                  Amount of sequence (bp) to load into the overlap hash table
utgOvlHashLoad                          Maximum hash table load.  If set too high, table lookups are inefficient; if too low, search overhead dominates run time; default 0.75
utgOvlMerDistinct                      K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
utgOvlMerSize                          K-mer size for seeds in overlaps
utgOvlMerThreshold                      K-mer frequency threshold; mers more frequent than this count are ignored
utgOvlRefBlockLength                    Amount of sequence (bp) to search against the hash table per batch
utgReAlign                              Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses utgOvlErrorRate
utgRepeatConfusedBP                    Repeats where the next best edge is at least this many bp shorter will not be split
utgRepeatConfusedPC                    Repeats where the next best edge is at least this many percent shorter will not be split
utgRepeatDeviation                      Overlaps this much above mean unitig error rate will not be used for repeat splitting
utgmhapConcurrency                      If grid not enabled, number of mhap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgmhapMemory                          Amount of memory, in gigabytes, to use for mhap overlaps for unitig construction jobs
utgmhapStageSpace                      Amount of local disk space needed to stage data for mhap overlaps for unitig construction jobs
utgmhapThreads                          Number of threads to use for mhap overlaps for unitig construction jobs
utgmmapConcurrency                      If grid not enabled, number of mmap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgmmapMemory                          Amount of memory, in gigabytes, to use for mmap overlaps for unitig construction jobs
utgmmapStageSpace                      Amount of local disk space needed to stage data for mmap overlaps for unitig construction jobs
utgmmapThreads                          Number of threads to use for mmap overlaps for unitig construction jobs
utgovlConcurrency                      If grid not enabled, number of overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgovlMemory                            Amount of memory, in gigabytes, to use for overlaps for unitig construction jobs
utgovlStageSpace                        Amount of local disk space needed to stage data for overlaps for unitig construction jobs
utgovlThreads                          Number of threads to use for overlaps for unitig construction jobs


<pre class="gcommand">
[cft07037@d2-13 canu]$ ml canu/2.2-GCCcore-11.2.0
[cft07037@d2-13 canu]$ canu -options
batConcurrency                    Unused, only one process supported
batMemory                          Approximate maximum memory usage, in gigabytes, default is the maxMemory limit
batOptions                        Advanced options to bogart
batStageSpace                      Amount of local disk space needed to stage data for unitig construction jobs
batThreads                        Number of threads to use; default is the maxThreads limit
cnsConcurrency                    If grid not enabled, number of unitig consensus jobs to run at the same time; default is n_proc / n_threads
cnsConsensus                      Which consensus algorithm to use; 'pbdagcon' (fast, reliable); 'utgcns' (multialignment output); 'quick' (single read mosaic); default 'pbdagcon'
cnsErrorRate                      Consensus expects alignments at about this error rate
cnsMaxCoverage                    Limit unitig consensus to at most this coverage; default '40' = unlimited
cnsMemory                          Amount of memory, in gigabytes, to use for unitig consensus jobs
cnsPartitions                      Attempt to create this many consensus jobs; default '0' = based on the largest tig
cnsStageSpace                      Amount of local disk space needed to stage data for unitig consensus jobs
cnsThreads                        Number of threads to use for unitig consensus jobs
contigFilter                      Parameters to filter out 'unassembled' unitigs.  Five values: minReads minLength singleReadSpan lowCovFraction lowCovDepth
corConcurrency                    If grid not enabled, number of read correction jobs to run at the same time; default is n_proc / n_threads
corConsensus                      Which consensus algorithm to use; only 'falcon' is supported; default 'falcon'
corErrorRate                      Only use raw alignments below this error rate to construct corrected reads
corFilter                          Method to filter short reads from correction; 'quick' or 'expensive'; default 'expensive'
corMaxEvidenceCoverageGlobal      Limit reads used for correction to supporting at most this coverage; default: '1.0x' = 1.0 * estimated coverage
corMaxEvidenceCoverageLocal        Limit reads being corrected to at most this much evidence coverage; default: '2.0x' = 2.0 * estimated coverage
corMaxEvidenceErate                Limit read correction to only overlaps at or below this fraction error; default: unlimited
corMemory                          Amount of memory, in gigabytes, to use for read correction jobs
corMhapBlockSize                  Number of reads per GB of memory allowed (mhapMemory)
cormhapConcurrency                If grid not enabled, number of mhap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
corMhapFilterThreshold            Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
corMhapFilterUnique                Expert option: True or false, suppress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
cormhapMemory                      Amount of memory, in gigabytes, to use for mhap overlaps for correction jobs
corMhapMerSize                    K-mer size for seeds in mhap
corMhapNoTf                        Expert option: True or false, do not use tf weighting, only idf of tf-idf.
corMhapOptions                    Expert option: free-form parameters to pass to MHAP.
corMhapOrderedMerSize              K-mer size for second-stage filter in mhap
corMhapPipe                        Report results to a pipe instead of *large* files.
corMhapSensitivity                Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
cormhapStageSpace                  Amount of local disk space needed to stage data for mhap overlaps for correction jobs
cormhapThreads                    Number of threads to use for mhap overlaps for correction jobs
corMhapVersion                    Version of the MHAP jar file to use
corMinCoverage                    Minimum number of bases supporting each corrected base, if less than this sequences are split; default based on input read coverage: 0 <= 30x < 4 < 60x <= 4
corMinEvidenceLength              Limit read correction to only overlaps longer than this; default: unlimited
corMMapBlockSize                  Number of reads per 1GB; memory * blockSize = the size of the block loaded into memory per job
cormmapConcurrency                If grid not enabled, number of mmap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
cormmapMemory                      Amount of memory, in gigabytes, to use for mmap overlaps for correction jobs
corMMapMerSize                    K-mer size for seeds in minimap
cormmapStageSpace                  Amount of local disk space needed to stage data for mmap overlaps for correction jobs
cormmapThreads                    Number of threads to use for mmap overlaps for correction jobs
corOutCoverage                    Only correct the longest reads up to this coverage; default 40
corOverlapper                      Which overlap algorithm to use for correction
corovlConcurrency                  If grid not enabled, number of overlaps for correction jobs to run at the same time; default is n_proc / n_threads
corOvlErrorRate                    Overlaps above this error rate are not computed
corOvlFilter                      Filter overlaps based on expected kmers vs observed kmers
corOvlFrequentMers                Do not seed overlaps with these kmers
corOvlHashBits                    Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per corOvlHashBlockLength
corOvlHashBlockLength              Amount of sequence (bp) to load into the overlap hash table
corOvlHashLoad                    Maximum hash table load.  If set too high, table lookups are inefficient; if too low, search overhead dominates run time; default 0.75
corovlMemory                      Amount of memory, in gigabytes, to use for overlaps for correction jobs
corOvlMerDistinct                  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
corOvlMerSize                      K-mer size for seeds in overlaps
corOvlMerThreshold                K-mer frequency threshold; mers more frequent than this count are ignored
corOvlRefBlockLength              Amount of sequence (bp) to search against the hash table per batch
corovlStageSpace                  Amount of local disk space needed to stage data for overlaps for correction jobs
corovlThreads                      Number of threads to use for overlaps for correction jobs
corPartitionMin                    Don't make a read correction partition with fewer than N reads
corPartitions                      Partition read correction into N jobs
corReAlign                        Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses corOvlErrorRate
correctedErrorRate                Expected fraction error in an alignment of two corrected reads
corStageSpace                      Amount of local disk space needed to stage data for read correction jobs
corThreads                        Number of threads to use for read correction jobs
enableOEA                          Do overlap error adjustment - comprises two steps: read error detection (RED) and overlap error adjustment (OEA); default 'true'
executiveMemory                    Amount of memory, in GB, to reserve for the Canu executive process
executiveThreads                  Number of threads to reserve for the Canu executive process
genomeSize                        An estimate of the size of the genome
gnuplot                            Path to the gnuplot executable
gnuplotImageFormat                Image format that gnuplot will generate.  Default: based on gnuplot, 'png', 'svg' or 'gif'
gridEngine                        Grid engine configuration, not documented
gridEngineArrayMaxJobs            Grid engine configuration, not documented
gridEngineArrayName                Grid engine configuration, not documented
gridEngineArrayOption              Grid engine configuration, not documented
gridEngineArraySubmitID            Grid engine configuration, not documented
gridEngineJobID                    Grid engine configuration, not documented
gridEngineMemoryOption            Grid engine configuration, not documented
gridEngineMemoryPerJob            Grid engine configuration, not documented
gridEngineMemoryUnits              Grid engine configuration, not documented
gridEngineNameOption              Grid engine configuration, not documented
gridEngineNameToJobIDCommand      Grid engine configuration, not documented
gridEngineNameToJobIDCommandNoArray  Grid engine configuration, not documented
gridEngineOutputOption            Grid engine configuration, not documented
gridEngineResourceOption          Grid engine configuration, not documented
gridEngineStageOption              Grid engine configuration, not documented
gridEngineSubmitCommand            Grid engine configuration, not documented
gridEngineTaskID                  Grid engine configuration, not documented
gridEngineThreadsOption            Grid engine configuration, not documented
gridOptions                        Grid engine options applied to all jobs
gridOptionsbat                    Grid engine options applied to unitig construction jobs
gridOptionscns                    Grid engine options applied to unitig consensus jobs
gridOptionscor                    Grid engine options applied to read correction jobs
gridOptionscormhap                Grid engine options applied to mhap overlaps for correction jobs
gridOptionscormmap                Grid engine options applied to mmap overlaps for correction jobs
gridOptionscorovl                  Grid engine options applied to overlaps for correction jobs
gridOptionsExecutive              Grid engine options applied to the canu executive script
gridOptionshap                    Grid engine options applied to haplotype assignment jobs
gridOptionsJobName                Grid jobs job-name suffix
gridOptionsmeryl                  Grid engine options applied to mer counting jobs
gridOptionsobtmhap                Grid engine options applied to mhap overlaps for trimming jobs
gridOptionsobtmmap                Grid engine options applied to mmap overlaps for trimming jobs
gridOptionsobtovl                  Grid engine options applied to overlaps for trimming jobs
gridOptionsoea                    Grid engine options applied to overlap error adjustment jobs
gridOptionsovb                    Grid engine options applied to overlap store bucketizing jobs
gridOptionsovs                    Grid engine options applied to overlap store sorting jobs
gridOptionsred                    Grid engine options applied to read error detection jobs
gridOptionsutgmhap                Grid engine options applied to mhap overlaps for unitig construction jobs
gridOptionsutgmmap                Grid engine options applied to mmap overlaps for unitig construction jobs
gridOptionsutgovl                  Grid engine options applied to overlaps for unitig construction jobs
hapConcurrency                    Unused, there is only one process
hapMemory                          Amount of memory, in gigabytes, to use for haplotype assignment
hapStageSpace                      Amount of local disk space needed to stage data for haplotype assignment jobs
hapThreads                        Number of threads to use for haplotype assignment
hapUnknownFraction                Fraction of allowed unknown bases before they are included in the assembly, between 0-1; default 0.05
homoPolyCompress                  Compute everything but consensus sequences using homopolymer compressed reads
java                              Java interpreter to use; at least version 1.8; default 'java'
javaUse64Bit                      Java interpreter supports the -d64 or -d32 flags; default auto
maxInputCoverage                  If input coverage is high, downsample to something reasonable; default 200
maxMemory                          Maximum memory to use by any component of the assembler
maxThreads                        Maximum number of compute threads to use by any component of the assembler
merylConcurrency                  Unused, there is only one process
merylMemory                        Amount of memory, in gigabytes, to use for mer counting
merylStageSpace                    Amount of local disk space needed to stage data for mer counting jobs
merylThreads                      Number of threads to use for mer counting
minimap                            Path to minimap2; default 'minimap2'
minInputCoverage                  Stop if input coverage is too low; default 10
minMemory                          Minimum amount of memory needed to compute the assembly (do not set unless prompted!)
minOverlapLength                  Overlaps shorter than this length are not computed; default 500
minReadLength                      Reads shorter than this length are not loaded into the assembler; default 1000
minThreads                        Minimum number of compute threads suggested to compute the assembly
objectStore                        Type of object storage used; not ready for production yet
objectStoreClient                  Path to the command line client used to access the object storage
objectStoreClientDA                Path to the command line client used to download files from object storage
objectStoreClientUA                Path to the command line client used to upload files to object storage
objectStoreNameSpace              Object store parameters; specific to the type of objectStore used
objectStoreProject                Object store project; specific to the type of objectStore used
obtErrorRate                      Stringency of overlaps to use for trimming
obtMhapBlockSize                  Number of reads per GB of memory allowed (mhapMemory)
obtmhapConcurrency                If grid not enabled, number of mhap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtMhapFilterThreshold            Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
obtMhapFilterUnique                Expert option: True or false, suppress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
obtmhapMemory                      Amount of memory, in gigabytes, to use for mhap overlaps for trimming jobs
obtMhapMerSize                    K-mer size for seeds in mhap
obtMhapNoTf                        Expert option: True or false, do not use tf weighting, only idf of tf-idf.
obtMhapOptions                    Expert option: free-form parameters to pass to MHAP.
obtMhapOrderedMerSize              K-mer size for second-stage filter in mhap
obtMhapPipe                        Report results to a pipe instead of *large* files.
obtMhapSensitivity                Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
obtmhapStageSpace                  Amount of local disk space needed to stage data for mhap overlaps for trimming jobs
obtmhapThreads                    Number of threads to use for mhap overlaps for trimming jobs
obtMhapVersion                    Version of the MHAP jar file to use
obtMMapBlockSize                  Number of reads per 1GB; memory * blockSize = the size of the block loaded into memory per job
obtmmapConcurrency                If grid not enabled, number of mmap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtmmapMemory                      Amount of memory, in gigabytes, to use for mmap overlaps for trimming jobs
obtMMapMerSize                    K-mer size for seeds in minimap
obtmmapStageSpace                  Amount of local disk space needed to stage data for mmap overlaps for trimming jobs
obtmmapThreads                    Number of threads to use for mmap overlaps for trimming jobs
obtOverlapper                      Which overlap algorithm to use for overlap based trimming
obtovlConcurrency                  If grid not enabled, number of overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtOvlErrorRate                    Overlaps at or below this error rate are used to trim reads
obtOvlFilter                      Filter overlaps based on expected kmers vs observed kmers
obtOvlFrequentMers                Do not seed overlaps with these kmers
obtOvlHashBits                    Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per obtOvlHashBlockLength
obtOvlHashBlockLength              Amount of sequence (bp) to load into the overlap hash table
obtOvlHashLoad                    Maximum hash table load.  If set too high, table lookups are inefficient; if too low, search overhead dominates run time; default 0.75
obtovlMemory                      Amount of memory, in gigabytes, to use for overlaps for trimming jobs
obtOvlMerDistinct                  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
obtOvlMerSize                      K-mer size for seeds in overlaps
obtOvlMerThreshold                K-mer frequency threshold; mers more frequent than this count are ignored
obtOvlRefBlockLength              Amount of sequence (bp) to search against the hash table per batch
obtovlStageSpace                  Amount of local disk space needed to stage data for overlaps for trimming jobs
obtovlThreads                      Number of threads to use for overlaps for trimming jobs
obtReAlign                        Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses obtOvlErrorRate
oeaBatchLength                    Number of bases per overlap error correction batch
oeaBatchSize                      Number of reads per overlap error correction batch
oeaConcurrency                    If grid not enabled, number of overlap error adjustment jobs to run at the same time; default is n_proc / n_threads
oeaErrorRate                      Only use overlaps with at most this much fraction error to find errors in reads; default utgOvlErrorRate, 0.003 for HiFi reads
oeaHaploConfirm                    This many or more reads will confirm a true haplotype difference; default 5
oeaMaskTrivial                    Mask trivial DNA in Overlap Error Adjustment; default off; on for HiFi reads
oeaMemory                          Amount of memory, in gigabytes, to use for overlap error adjustment jobs
oeaStageSpace                      Amount of local disk space needed to stage data for overlap error adjustment jobs
oeaThreads                        Number of threads to use for overlap error adjustment jobs
onFailure                          Full path to command to run on failure
onSuccess                          Full path to command to run on successful completion
ovbConcurrency                    If grid not enabled, number of overlap store bucketizing jobs to run at the same time; default is n_proc / n_threads
ovbMemory                          Amount of memory, in gigabytes, to use for overlap store bucketizing jobs
ovbStageSpace                      Amount of local disk space needed to stage data for overlap store bucketizing jobs
ovbThreads                        Number of threads to use for overlap store bucketizing jobs
ovsConcurrency                    If grid not enabled, number of overlap store sorting jobs to run at the same time; default is n_proc / n_threads
ovsMemory                          Amount of memory, in gigabytes, to use for overlap store sorting jobs
ovsStageSpace                      Amount of local disk space needed to stage data for overlap store sorting jobs
ovsThreads                        Number of threads to use for overlap store sorting jobs
preExec                            A command line to run at the start of Canu execution scripts
purgeOverlaps                      When to delete intermediate overlap files: never, normal (default), aggressive, dangerous
rawErrorRate                      Expected fraction error in an alignment of two uncorrected reads
readSamplingBias                  Score reads as 'random * length^bias', keep the highest scoring reads
redBatchLength                    Number of bases per fragment error detection batch
redBatchSize                      Number of reads per fragment error detection batch
redConcurrency                    If grid not enabled, number of read error detection jobs to run at the same time; default is n_proc / n_threads
redMemory                          Amount of memory, in gigabytes, to use for read error detection jobs
redStageSpace                      Amount of local disk space needed to stage data for read error detection jobs
redThreads                        Number of threads to use for read error detection jobs
saveMerCounts                      Save full mer counting results, sometimes useful
saveOverlaps                      Do not remove the overlap stores.  Default: false = remove overlap stores when they're no longer needed
saveReadCorrections                Save intermediate read correction files, almost never a good idea
saveReadHaplotypes                Save intermediate read haplotype files, almost never a good idea
saveReads                          Save intermediate corrected and trimmed reads to asm.correctedReads.fasta.gz and asm.trimmedReads.fasta.gz
shell                              Command interpreter to use; sh-compatible (e.g., bash), NOT C-shell (csh or tcsh); default '/bin/sh'
showNext                          Don't run any commands, just report what would run
stageDirectory                    If set, copy heavily used data to this node-local location
stopAfter                          Stop after a specific algorithm step is completed
stopOnLowCoverage                  Stop if raw, corrected or trimmed read coverage is low
trimReadsCoverage                  Minimum depth of evidence to retain bases; default '2'
trimReadsOverlap                  Minimum overlap between evidence to make contiguous trim; default '500'
unitigger                          Which unitig algorithm to use; only 'bogart' supported; default 'bogart'
useGrid                            If 'true', enable grid-based execution; if 'false', run all jobs on the local machine; if 'remote', create jobs for grid execution but do not submit; default 'true'
useGridbat                        If 'true', run module unitig construction under grid control; if 'false' run locally.
useGridcns                        If 'true', run module unitig consensus under grid control; if 'false' run locally.
useGridcor                        If 'true', run module read correction under grid control; if 'false' run locally.
useGridcormhap                    If 'true', run module mhap overlaps for correction under grid control; if 'false' run locally.
useGridcormmap                    If 'true', run module mmap overlaps for correction under grid control; if 'false' run locally.
useGridcorovl                      If 'true', run module overlaps for correction under grid control; if 'false' run locally.
useGridhap                        If 'true', run module haplotype assignment under grid control; if 'false' run locally.
useGridmeryl                      If 'true', run module mer counting under grid control; if 'false' run locally.
useGridobtmhap                    If 'true', run module mhap overlaps for trimming under grid control; if 'false' run locally.
useGridobtmmap                    If 'true', run module mmap overlaps for trimming under grid control; if 'false' run locally.
useGridobtovl                      If 'true', run module overlaps for trimming under grid control; if 'false' run locally.
useGridoea                        If 'true', run module overlap error adjustment under grid control; if 'false' run locally.
useGridovb                        If 'true', run module overlap store bucketizing under grid control; if 'false' run locally.
useGridovs                        If 'true', run module overlap store sorting under grid control; if 'false' run locally.
useGridred                        If 'true', run module read error detection under grid control; if 'false' run locally.
useGridutgmhap                    If 'true', run module mhap overlaps for unitig construction under grid control; if 'false' run locally.
useGridutgmmap                    If 'true', run module mmap overlaps for unitig construction under grid control; if 'false' run locally.
useGridutgovl                      If 'true', run module overlaps for unitig construction under grid control; if 'false' run locally.
utgBubbleDeviation                Overlaps this much above mean of contig will be used to identify bubbles
utgChimeraType                    When to filter reads for contig construction: none, chimera (missing middle), uncovered (missing middle or ends), deadend (missing middle or end or no neighbor) (default)
utgErrorRate                      Overlaps at or below this error rate are used to construct contigs
utgGraphDeviation                  Overlaps this much above median will not be used for initial graph construction
utgMhapBlockSize                  Number of reads per GB of memory allowed (mhapMemory)
utgmhapConcurrency                If grid not enabled, number of mhap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgMhapFilterThreshold            Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
utgMhapFilterUnique                Expert option: True or false, suppress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
utgmhapMemory                      Amount of memory, in gigabytes, to use for mhap overlaps for unitig construction jobs
utgMhapMerSize                    K-mer size for seeds in mhap
utgMhapNoTf                        Expert option: True or false, do not use tf weighting, only idf of tf-idf.
utgMhapOptions                    Expert option: free-form parameters to pass to MHAP.
utgMhapOrderedMerSize              K-mer size for second-stage filter in mhap
utgMhapPipe                        Report results to a pipe instead of *large* files.
utgMhapSensitivity                Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
utgmhapStageSpace                  Amount of local disk space needed to stage data for mhap overlaps for unitig construction jobs
utgmhapThreads                    Number of threads to use for mhap overlaps for unitig construction jobs
utgMhapVersion                    Version of the MHAP jar file to use
utgMMapBlockSize                  Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
utgmmapConcurrency                If grid not enabled, number of mmap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgmmapMemory                      Amount of memory, in gigabytes, to use for mmap overlaps for unitig construction jobs
utgMMapMerSize                    K-mer size for seeds in minmap
utgmmapStageSpace                  Amount of local disk space needed to stage data for mmap overlaps for unitig construction jobs
utgmmapThreads                    Number of threads to use for mmap overlaps for unitig construction jobs
utgOverlapper                      Which overlap algorithm to use for unitig construction
utgovlConcurrency                  If grid not enabled, number of overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgOvlErrorRate                    Overlaps at or below this error rate are used to trim reads
utgOvlFilter                      Filter overlaps based on expected kmers vs observed kmers
utgOvlFrequentMers                Do not seed overlaps with these kmers
utgOvlHashBits                    Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per utgOvlHashBlockLength
utgOvlHashBlockLength              Amount of sequence (bp) to load into the overlap hash table
utgOvlHashLoad                    Maximum hash table load.  If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
utgovlMemory                      Amount of memory, in gigabytes, to use for overlaps for unitig construction jobs
utgOvlMerDistinct                  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
utgOvlMerSize                      K-mer size for seeds in overlaps
utgOvlMerThreshold                K-mer frequency threshold; mers more frequent than this count are ignored
utgOvlRefBlockLength              Amount of sequence (bp) to search against the hash table per batch
utgovlStageSpace                  Amount of local disk space needed to stage data for overlaps for unitig construction jobs
utgovlThreads                      Number of threads to use for overlaps for unitig construction jobs
utgReAlign                        Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses utgOvlErrorRate
utgRepeatConfusedBP                Repeats where the next best edge is at least this many bp shorter will not be split
utgRepeatConfusedPC                Repeats where the next best edge is at least this many percent shorter will not be split
utgRepeatDeviation                Overlaps this much above mean unitig error rate will not be used for repeat splitting
</pre>
</pre>
[[#top|Back to Top]]
[[#top|Back to Top]]
Line 446: Line 430:
=== Installation ===
=== Installation ===
   
   
Source code obtained from https://github.com/marbl/canu/releases/download/v2.1.1/
Source code obtained from https://github.com/marbl/canu/releases/download/v2.2
 
=== System ===
=== System ===
64-bit Linux
64-bit Linux

Latest revision as of 09:30, 9 May 2024

Category

Bioinformatics

Program On

Sapelo2

Version

2.2

Author / Distributor

Canu

Description

"Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION). " More details are at Canu's documentation.

Running Program

Version 2.2

To use this version, please load the module with

ml canu/2.2-GCCcore-11.2.0

or with

ml canu/2.2-GCCcore-11.3.0

When you invoke canu, use gridOptions to pass queueing system options for the jobs that the Canu pipeline submits. At a minimum, specify a partition, the number of tasks, and the walltime. For example, use gridOptions="--partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00". The pipeline scripts add the --mem-per-cpu option automatically, but you can also add it yourself if the pipeline does not estimate the required memory correctly.
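
As a minimal sketch of the optional memory override mentioned above, the gridOptions string can be assembled with or without --mem-per-cpu. The grid_opts helper and the 8G value are illustrative assumptions, not part of Canu or of the Sapelo2 configuration. Canu's own help text advises leaving memory limits to Canu, so treat the override as a fallback for jobs that fail on memory.

```shell
# Print a gridOptions string; an optional first argument adds --mem-per-cpu.
# grid_opts is a hypothetical helper, used here for illustration only.
grid_opts() {
    opts="--partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00"
    if [ -n "${1:-}" ]; then
        # Manual override, only if the pipeline's own estimate proves too low
        opts="$opts --mem-per-cpu=$1"
    fi
    printf '%s\n' "$opts"
}

grid_opts        # default: let the pipeline add --mem-per-cpu itself
grid_opts 8G     # assumed example value for a manual override
```

The resulting string is what goes inside the quotes of gridOptions="..." on the canu command line.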


Here is an example of a shell script, sub.sh, to run Canu on the batch queue:

#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=canujobname
#SBATCH --ntasks=1
#SBATCH --time=1:00:00
#SBATCH --mem=10G

cd "$SLURM_SUBMIT_DIR"

ml canu/2.2-GCCcore-11.2.0

canu gridOptions=" --partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00 " [options]

where [options] must be replaced by the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, maximum memory, the number of cores per node, and the job name, need to be modified appropriately as well. Note that the Slurm headers (#SBATCH lines) apply only to Canu's initial job. The resource limits of all the jobs that Canu spawns are determined by what is defined in gridOptions.
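
For illustration, the sketch below shows one way the [options] placeholder might be filled in for a Nanopore data set. The prefix, directory, genome size, and read file name are all hypothetical values; the flags themselves (-p, -d, genomeSize, gridOptions, -nanopore) are the ones listed in the canu usage text. The snippet only prints the command so that it can be reviewed before it is placed in sub.sh.

```shell
# Hypothetical project settings; replace with your own.
PREFIX="asm"              # output file prefix (-p)
OUTDIR="canu_run1"        # assembly directory (-d), created if needed
GENOME_SIZE="4.7m"        # best guess of the haploid genome size
READS="reads.fastq.gz"    # assumed Nanopore read file

# Slurm options for the jobs Canu submits
GRID_OPTS="--partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00"

# Print the full canu command line without running it.
canu_cmd() {
    printf 'canu -p %s -d %s genomeSize=%s gridOptions="%s" -nanopore %s\n' \
        "$PREFIX" "$OUTDIR" "$GENOME_SIZE" "$GRID_OPTS" "$READS"
}

canu_cmd
```

Printing the command first makes it easier to spot quoting mistakes in gridOptions, which must reach canu as a single argument.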


To submit the job, use the command:

sbatch ./sub.sh 

Documentation

[cft07037@d2-13 canu]$ ml canu/2.2-GCCcore-11.2.0 
[cft07037@d2-13 canu]$ canu --help

usage:   canu [-version] [-citation] \
              [-haplotype | -correct | -trim | -assemble | -trim-assemble] \
              [-s <assembly-specifications-file>] \
               -p <assembly-prefix> \
               -d <assembly-directory> \
               genomeSize=<number>[g|m|k] \
              [other-options] \
              [-haplotype{NAME} illumina.fastq.gz] \
              [-corrected] \
              [-trimmed] \
              [-pacbio |
               -nanopore |
               -pacbio-hifi] file1 file2 ...

example: canu -d run1 -p godzilla genomeSize=1g -nanopore-raw reads/*.fasta.gz 


  To restrict canu to only a specific stage, use:
    -haplotype     - generate haplotype-specific reads
    -correct       - generate corrected reads
    -trim          - generate trimmed reads
    -assemble      - generate an assembly
    -trim-assemble - generate trimmed reads and then assemble them

  The assembly is computed in the -d <assembly-directory>, with output files named
  using the -p <assembly-prefix>.  This directory is created if needed.  It is not
  possible to run multiple assemblies in the same directory.

  The genome size should be your best guess of the haploid genome size of what is being
  assembled.  It is used primarily to estimate coverage in reads, NOT as the desired
  assembly size.  Fractional values are allowed: '4.7m' equals '4700k' equals '4700000'

  Some common options:
    useGrid=string
      - Run under grid control (true), locally (false), or set up for grid control
        but don't submit any jobs (remote)
    rawErrorRate=fraction-error
      - The allowed difference in an overlap between two raw uncorrected reads.  For lower
        quality reads, use a higher number.  The defaults are 0.300 for PacBio reads and
        0.500 for Nanopore reads.
    correctedErrorRate=fraction-error
      - The allowed difference in an overlap between two corrected reads.  Assemblies of
        low coverage or data with biological differences will benefit from a slight increase
        in this.  Defaults are 0.045 for PacBio reads and 0.144 for Nanopore reads.
    gridOptions=string
      - Pass string to the command used to submit jobs to the grid.  Can be used to set
        maximum run time limits.  Should NOT be used to set memory limits; Canu will do
        that for you.
    minReadLength=number
      - Ignore reads shorter than 'number' bases long.  Default: 1000.
    minOverlapLength=number
      - Ignore read-to-read overlaps shorter than 'number' bases long.  Default: 500.
  A full list of options can be printed with '-options'.  All options can be supplied in
  an optional sepc file with the -s option.

  For TrioCanu, haplotypes are specified with the -haplotype{NAME} option, with any
  number of haplotype-specific Illumina read files after.  The {NAME} of each haplotype
  is free text (but only letters and numbers, please).  For example:
    -haplotypeNANNY nanny/*gz
    -haplotypeBILLY billy1.fasta.gz billy2.fasta.gz

  Reads can be either FASTA or FASTQ format, uncompressed, or compressed with gz, bz2 or xz.

  Reads are specified by the technology they were generated with, and any processing performed.

  [processing]
    -corrected
    -trimmed

  [technology]
    -pacbio      <files>
    -nanopore    <files>
    -pacbio-hifi <files>

Complete documentation at http://canu.readthedocs.org/en/latest/


[cft07037@d2-13 canu]$ ml canu/2.2-GCCcore-11.2.0 
[cft07037@d2-13 canu]$ canu -options
batConcurrency                     Unused, only one process supported
batMemory                          Approximate maximum memory usage, in gigabytes, default is the maxMemory limit
batOptions                         Advanced options to bogart
batStageSpace                      Amount of local disk space needed to stage data for unitig construction jobs
batThreads                         Number of threads to use; default is the maxThreads limit
cnsConcurrency                     If grid not enabled, number of unitig consensus jobs to run at the same time; default is n_proc / n_threads
cnsConsensus                       Which consensus algorithm to use; 'pbdagcon' (fast, reliable); 'utgcns' (multialignment output); 'quick' (single read mosaic); default 'pbdagcon'
cnsErrorRate                       Consensus expects alignments at about this error rate
cnsMaxCoverage                     Limit unitig consensus to at most this coverage; default '40' = unlimited
cnsMemory                          Amount of memory, in gigabytes, to use for unitig consensus jobs
cnsPartitions                      Attempt to create this many consensus jobs; default '0' = based on the largest tig
cnsStageSpace                      Amount of local disk space needed to stage data for unitig consensus jobs
cnsThreads                         Number of threads to use for unitig consensus jobs
contigFilter                       Parameters to filter out 'unassembled' unitigs.  Five values: minReads minLength singleReadSpan lowCovFraction lowCovDepth
corConcurrency                     If grid not enabled, number of read correction jobs to run at the same time; default is n_proc / n_threads
corConsensus                       Which consensus algorithm to use; only 'falcon' is supported; default 'falcon'
corErrorRate                       Only use raw alignments below this error rate to construct corrected reads
corFilter                          Method to filter short reads from correction; 'quick' or 'expensive'; default 'expensive'
corMaxEvidenceCoverageGlobal       Limit reads used for correction to supporting at most this coverage; default: '1.0x' = 1.0 * estimated coverage
corMaxEvidenceCoverageLocal        Limit reads being corrected to at most this much evidence coverage; default: '2.0x' = 2.0 * estimated coverage
corMaxEvidenceErate                Limit read correction to only overlaps at or below this fraction error; default: unlimited
corMemory                          Amount of memory, in gigabytes, to use for read correction jobs
corMhapBlockSize                   Number of reads per GB of memory allowed (mhapMemory)
cormhapConcurrency                 If grid not enabled, number of mhap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
corMhapFilterThreshold             Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
corMhapFilterUnique                Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
cormhapMemory                      Amount of memory, in gigabytes, to use for mhap overlaps for correction jobs
corMhapMerSize                     K-mer size for seeds in mhap
corMhapNoTf                        Expert option: True or false, do not use tf weighting, only idf of tf-idf.
corMhapOptions                     Expert option: free-form parameters to pass to MHAP.
corMhapOrderedMerSize              K-mer size for second-stage filter in mhap
corMhapPipe                        Report results to a pipe instead of *large* files.
corMhapSensitivity                 Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
cormhapStageSpace                  Amount of local disk space needed to stage data for mhap overlaps for correction jobs
cormhapThreads                     Number of threads to use for mhap overlaps for correction jobs
corMhapVersion                     Version of the MHAP jar file to use
corMinCoverage                     Minimum number of bases supporting each corrected base, if less than this sequences are split; default based on input read coverage: 0 <= 30x < 4 < 60x <= 4
corMinEvidenceLength               Limit read correction to only overlaps longer than this; default: unlimited
corMMapBlockSize                   Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
cormmapConcurrency                 If grid not enabled, number of mmap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
cormmapMemory                      Amount of memory, in gigabytes, to use for mmap overlaps for correction jobs
corMMapMerSize                     K-mer size for seeds in minmap
cormmapStageSpace                  Amount of local disk space needed to stage data for mmap overlaps for correction jobs
cormmapThreads                     Number of threads to use for mmap overlaps for correction jobs
corOutCoverage                     Only correct the longest reads up to this coverage; default 40
corOverlapper                      Which overlap algorithm to use for correction
corovlConcurrency                  If grid not enabled, number of overlaps for correction jobs to run at the same time; default is n_proc / n_threads
corOvlErrorRate                    Overlaps above this error rate are not computed
corOvlFilter                       Filter overlaps based on expected kmers vs observed kmers
corOvlFrequentMers                 Do not seed overlaps with these kmers
corOvlHashBits                     Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per corOvlHashBlockLength
corOvlHashBlockLength              Amount of sequence (bp) to load into the overlap hash table
corOvlHashLoad                     Maximum hash table load.  If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
corovlMemory                       Amount of memory, in gigabytes, to use for overlaps for correction jobs
corOvlMerDistinct                  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
corOvlMerSize                      K-mer size for seeds in overlaps
corOvlMerThreshold                 K-mer frequency threshold; mers more frequent than this count are ignored
corOvlRefBlockLength               Amount of sequence (bp) to search against the hash table per batch
corovlStageSpace                   Amount of local disk space needed to stage data for overlaps for correction jobs
corovlThreads                      Number of threads to use for overlaps for correction jobs
corPartitionMin                    Don't make a read correction partition with fewer than N reads
corPartitions                      Partition read correction into N jobs
corReAlign                         Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses corOvlErrorRate
correctedErrorRate                 Expected fraction error in an alignment of two corrected reads
corStageSpace                      Amount of local disk space needed to stage data for read correction jobs
corThreads                         Number of threads to use for read correction jobs
enableOEA                          Do overlap error adjustment - comprises two steps: read error detection (RED) and overlap error adjustment (OEA); default 'true'
executiveMemory                    Amount of memory, in GB, to reserve for the Canu exective process
executiveThreads                   Number of threads to reserve for the Canu exective process
genomeSize                         An estimate of the size of the genome
gnuplot                            Path to the gnuplot executable
gnuplotImageFormat                 Image format that gnuplot will generate.  Default: based on gnuplot, 'png', 'svg' or 'gif'
gridEngine                         Grid engine configuration, not documented
gridEngineArrayMaxJobs             Grid engine configuration, not documented
gridEngineArrayName                Grid engine configuration, not documented
gridEngineArrayOption              Grid engine configuration, not documented
gridEngineArraySubmitID            Grid engine configuration, not documented
gridEngineJobID                    Grid engine configuration, not documented
gridEngineMemoryOption             Grid engine configuration, not documented
gridEngineMemoryPerJob             Grid engine configuration, not documented
gridEngineMemoryUnits              Grid engine configuration, not documented
gridEngineNameOption               Grid engine configuration, not documented
gridEngineNameToJobIDCommand       Grid engine configuration, not documented
gridEngineNameToJobIDCommandNoArrayGrid engine configuration, not documented
gridEngineOutputOption             Grid engine configuration, not documented
gridEngineResourceOption           Grid engine configuration, not documented
gridEngineStageOption              Grid engine configuration, not documented
gridEngineSubmitCommand            Grid engine configuration, not documented
gridEngineTaskID                   Grid engine configuration, not documented
gridEngineThreadsOption            Grid engine configuration, not documented
gridOptions                        Grid engine options applied to all jobs
gridOptionsbat                     Grid engine options applied to unitig construction jobs
gridOptionscns                     Grid engine options applied to unitig consensus jobs
gridOptionscor                     Grid engine options applied to read correction jobs
gridOptionscormhap                 Grid engine options applied to mhap overlaps for correction jobs
gridOptionscormmap                 Grid engine options applied to mmap overlaps for correction jobs
gridOptionscorovl                  Grid engine options applied to overlaps for correction jobs
gridOptionsExecutive               Grid engine options applied to the canu executive script
gridOptionshap                     Grid engine options applied to haplotype assignment jobs
gridOptionsJobName                 Grid jobs job-name suffix
gridOptionsmeryl                   Grid engine options applied to mer counting jobs
gridOptionsobtmhap                 Grid engine options applied to mhap overlaps for trimming jobs
gridOptionsobtmmap                 Grid engine options applied to mmap overlaps for trimming jobs
gridOptionsobtovl                  Grid engine options applied to overlaps for trimming jobs
gridOptionsoea                     Grid engine options applied to overlap error adjustment jobs
gridOptionsovb                     Grid engine options applied to overlap store bucketizing jobs
gridOptionsovs                     Grid engine options applied to overlap store sorting jobs
gridOptionsred                     Grid engine options applied to read error detection jobs
gridOptionsutgmhap                 Grid engine options applied to mhap overlaps for unitig construction jobs
gridOptionsutgmmap                 Grid engine options applied to mmap overlaps for unitig construction jobs
gridOptionsutgovl                  Grid engine options applied to overlaps for unitig construction jobs
hapConcurrency                     Unused, there is only one process
hapMemory                          Amount of memory, in gigabytes, to use for haplotype assignment
hapStageSpace                      Amount of local disk space needed to stage data for haplotype assignment jobs
hapThreads                         Number of threads to use for haplotype assignment
hapUnknownFraction                 Fraction of allowed unknown bases before they are included in the assembly, between 0-1; default 0.05
homoPolyCompress                   Compute everything but consensus sequences using homopolymer compressed reads
java                               Java interpreter to use; at least version 1.8; default 'java'
javaUse64Bit                       Java interpreter supports the -d64 or -d32 flags; default auto
maxInputCoverage                   If input coverage is high, downsample to something reasonable; default 200
maxMemory                          Maximum memory to use by any component of the assembler
maxThreads                         Maximum number of compute threads to use by any component of the assembler
merylConcurrency                   Unused, there is only one process
merylMemory                        Amount of memory, in gigabytes, to use for mer counting
merylStageSpace                    Amount of local disk space needed to stage data for mer counting jobs
merylThreads                       Number of threads to use for mer counting
minimap                            Path to minimap2; default 'minimap2'
minInputCoverage                   Stop if input coverage is too low; default 10
minMemory                          Minimum amount of memory needed to compute the assembly (do not set unless prompted!)
minOverlapLength                   Overlaps shorter than this length are not computed; default 500
minReadLength                      Reads shorter than this length are not loaded into the assembler; default 1000
minThreads                         Minimum number of compute threads suggested to compute the assembly
objectStore                        Type of object storage used; not ready for production yet
objectStoreClient                  Path to the command line client used to access the object storage
objectStoreClientDA                Path to the command line client used to download files from object storage
objectStoreClientUA                Path to the command line client used to upload files to object storage
objectStoreNameSpace               Object store parameters; specific to the type of objectStore used
objectStoreProject                 Object store project; specific to the type of objectStore used
obtErrorRate                       Stringency of overlaps to use for trimming
obtMhapBlockSize                   Number of reads per GB of memory allowed (mhapMemory)
obtmhapConcurrency                 If grid not enabled, number of mhap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtMhapFilterThreshold             Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
obtMhapFilterUnique                Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
obtmhapMemory                      Amount of memory, in gigabytes, to use for mhap overlaps for trimming jobs
obtMhapMerSize                     K-mer size for seeds in mhap
obtMhapNoTf                        Expert option: True or false, do not use tf weighting, only idf of tf-idf.
obtMhapOptions                     Expert option: free-form parameters to pass to MHAP.
obtMhapOrderedMerSize              K-mer size for second-stage filter in mhap
obtMhapPipe                        Report results to a pipe instead of *large* files.
obtMhapSensitivity                 Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
obtmhapStageSpace                  Amount of local disk space needed to stage data for mhap overlaps for trimming jobs
obtmhapThreads                     Number of threads to use for mhap overlaps for trimming jobs
obtMhapVersion                     Version of the MHAP jar file to use
obtMMapBlockSize                   Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
obtmmapConcurrency                 If grid not enabled, number of mmap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtmmapMemory                      Amount of memory, in gigabytes, to use for mmap overlaps for trimming jobs
obtMMapMerSize                     K-mer size for seeds in minmap
obtmmapStageSpace                  Amount of local disk space needed to stage data for mmap overlaps for trimming jobs
obtmmapThreads                     Number of threads to use for mmap overlaps for trimming jobs
obtOverlapper                      Which overlap algorithm to use for overlap based trimming
obtovlConcurrency                  If grid not enabled, number of overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtOvlErrorRate                    Overlaps at or below this error rate are used to trim reads
obtOvlFilter                       Filter overlaps based on expected kmers vs observed kmers
obtOvlFrequentMers                 Do not seed overlaps with these kmers
obtOvlHashBits                     Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per obtOvlHashBlockLength
obtOvlHashBlockLength              Amount of sequence (bp) to load into the overlap hash table
obtOvlHashLoad                     Maximum hash table load.  If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
obtovlMemory                       Amount of memory, in gigabytes, to use for overlaps for trimming jobs
obtOvlMerDistinct                  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
obtOvlMerSize                      K-mer size for seeds in overlaps
obtOvlMerThreshold                 K-mer frequency threshold; mers more frequent than this count are ignored
obtOvlRefBlockLength               Amount of sequence (bp) to search against the hash table per batch
obtovlStageSpace                   Amount of local disk space needed to stage data for overlaps for trimming jobs
obtovlThreads                      Number of threads to use for overlaps for trimming jobs
obtReAlign                         Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses obtOvlErrorRate
oeaBatchLength                     Number of bases per overlap error correction batch
oeaBatchSize                       Number of reads per overlap error correction batch
oeaConcurrency                     If grid not enabled, number of overlap error adjustment jobs to run at the same time; default is n_proc / n_threads
oeaErrorRate                       Only use overlaps with at most this much fraction error to find errors in reads; default utgOvlErrorRate, 0.003 for HiFi reads
oeaHaploConfirm                    This many or more reads will confirm a true haplotype difference; default 5
oeaMaskTrivial                     Mask trivial DNA in Overlap Error Adjustment; default off; on for HiFi reads
oeaMemory                          Amount of memory, in gigabytes, to use for overlap error adjustment jobs
oeaStageSpace                      Amount of local disk space needed to stage data for overlap error adjustment jobs
oeaThreads                         Number of threads to use for overlap error adjustment jobs
onFailure                          Full path to command to run on failure
onSuccess                          Full path to command to run on successful completion
ovbConcurrency                     If grid not enabled, number of overlap store bucketizing jobs to run at the same time; default is n_proc / n_threads
ovbMemory                          Amount of memory, in gigabytes, to use for overlap store bucketizing jobs
ovbStageSpace                      Amount of local disk space needed to stage data for overlap store bucketizing jobs
ovbThreads                         Number of threads to use for overlap store bucketizing jobs
ovsConcurrency                     If grid not enabled, number of overlap store sorting jobs to run at the same time; default is n_proc / n_threads
ovsMemory                          Amount of memory, in gigabytes, to use for overlap store sorting jobs
ovsStageSpace                      Amount of local disk space needed to stage data for overlap store sorting jobs
ovsThreads                         Number of threads to use for overlap store sorting jobs
preExec                            A command line to run at the start of Canu execution scripts
purgeOverlaps                      When to delete intermediate overlap files: never, normal (default), aggressive, dangerous
rawErrorRate                       Expected fraction error in an alignment of two uncorrected reads
readSamplingBias                   Score reads as 'random * length^bias', keep the highest scoring reads
redBatchLength                     Number of bases per fragment error detection batch
redBatchSize                       Number of reads per fragment error detection batch
redConcurrency                     If grid not enabled, number of read error detection jobs to run at the same time; default is n_proc / n_threads
redMemory                          Amount of memory, in gigabytes, to use for read error detection jobs
redStageSpace                      Amount of local disk space needed to stage data for read error detection jobs
redThreads                         Number of threads to use for read error detection jobs
saveMerCounts                      Save full mer counting results, sometimes useful
saveOverlaps                       Do not remove the overlap stores.  Default: false = remove overlap stores when they're no longer needed
saveReadCorrections                Save intermediate read correction files, almost never a good idea
saveReadHaplotypes                 Save intermediate read haplotype files, almost never a good idea
saveReads                          Save intermediate corrected and trimmed reads to asm.correctedReads.fasta.gz and asm.trimmedReads.fasta.gz
shell                              Command interpreter to use; sh-compatible (e.g., bash), NOT C-shell (csh or tcsh); default '/bin/sh'
showNext                           Don't run any commands, just report what would run
stageDirectory                     If set, copy heavily used data to this node-local location
stopAfter                          Stop after a specific algorithm step is completed
stopOnLowCoverage                  Stop if raw, corrected or trimmed read coverage is low
trimReadsCoverage                  Minimum depth of evidence to retain bases; default '2
trimReadsOverlap                   Minimum overlap between evidence to make contiguous trim; default '500'
unitigger                          Which unitig algorithm to use; only 'bogart' supported; default 'bogart'
useGrid                            If 'true', enable grid-based execution; if 'false', run all jobs on the local machine; if 'remote', create jobs for grid execution but do not submit; default 'true'
useGridbat                         If 'true', run module unitig construction under grid control; if 'false' run locally.
useGridcns                         If 'true', run module unitig consensus under grid control; if 'false' run locally.
useGridcor                         If 'true', run module read correction under grid control; if 'false' run locally.
useGridcormhap                     If 'true', run module mhap overlaps for correction under grid control; if 'false' run locally.
useGridcormmap                     If 'true', run module mmap overlaps for correction under grid control; if 'false' run locally.
useGridcorovl                      If 'true', run module overlaps for correction under grid control; if 'false' run locally.
useGridhap                         If 'true', run module haplotype assignment under grid control; if 'false' run locally.
useGridmeryl                       If 'true', run module mer counting under grid control; if 'false' run locally.
useGridobtmhap                     If 'true', run module mhap overlaps for trimming under grid control; if 'false' run locally.
useGridobtmmap                     If 'true', run module mmap overlaps for trimming under grid control; if 'false' run locally.
useGridobtovl                      If 'true', run module overlaps for trimming under grid control; if 'false' run locally.
useGridoea                         If 'true', run module overlap error adjustment under grid control; if 'false' run locally.
useGridovb                         If 'true', run module overlap store bucketizing under grid control; if 'false' run locally.
useGridovs                         If 'true', run module overlap store sorting under grid control; if 'false' run locally.
useGridred                         If 'true', run module read error detection under grid control; if 'false' run locally.
useGridutgmhap                     If 'true', run module mhap overlaps for unitig construction under grid control; if 'false' run locally.
useGridutgmmap                     If 'true', run module mmap overlaps for unitig construction under grid control; if 'false' run locally.
useGridutgovl                      If 'true', run module overlaps for unitig construction under grid control; if 'false' run locally.
utgBubbleDeviation                 Overlaps this much above mean of contig will be used to identify bubbles
utgChimeraType                     When to filter reads for contig construction: none, chimera (missing middle), uncovered (missing middle or ends), deadend (missing middle or end or no neighbor) (default)
utgErrorRate                       Overlaps at or below this error rate are used to construct contigs
utgGraphDeviation                  Overlaps this much above median will not be used for initial graph construction
utgMhapBlockSize                   Number of reads per GB of memory allowed (mhapMemory)
utgmhapConcurrency                 If grid not enabled, number of mhap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgMhapFilterThreshold             Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
utgMhapFilterUnique                Expert option: True or false, suppress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
utgmhapMemory                      Amount of memory, in gigabytes, to use for mhap overlaps for unitig construction jobs
utgMhapMerSize                     K-mer size for seeds in mhap
utgMhapNoTf                        Expert option: True or false, do not use tf weighting, only idf of tf-idf.
utgMhapOptions                     Expert option: free-form parameters to pass to MHAP.
utgMhapOrderedMerSize              K-mer size for second-stage filter in mhap
utgMhapPipe                        Report results to a pipe instead of *large* files.
utgMhapSensitivity                 Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
utgmhapStageSpace                  Amount of local disk space needed to stage data for mhap overlaps for unitig construction jobs
utgmhapThreads                     Number of threads to use for mhap overlaps for unitig construction jobs
utgMhapVersion                     Version of the MHAP jar file to use
utgMMapBlockSize                   Number of reads per 1GB; memory * blockSize = the size of the block loaded into memory per job
utgmmapConcurrency                 If grid not enabled, number of mmap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgmmapMemory                      Amount of memory, in gigabytes, to use for mmap overlaps for unitig construction jobs
utgMMapMerSize                     K-mer size for seeds in minimap
utgmmapStageSpace                  Amount of local disk space needed to stage data for mmap overlaps for unitig construction jobs
utgmmapThreads                     Number of threads to use for mmap overlaps for unitig construction jobs
utgOverlapper                      Which overlap algorithm to use for unitig construction
utgovlConcurrency                  If grid not enabled, number of overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgOvlErrorRate                    Overlaps at or below this error rate are used to trim reads
utgOvlFilter                       Filter overlaps based on expected kmers vs observed kmers
utgOvlFrequentMers                 Do not seed overlaps with these kmers
utgOvlHashBits                     Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per utgOvlHashBlockLength
utgOvlHashBlockLength              Amount of sequence (bp) to load into the overlap hash table
utgOvlHashLoad                     Maximum hash table load.  If set too high, table lookups are inefficient; if too low, search overhead dominates run time; default 0.75
utgovlMemory                       Amount of memory, in gigabytes, to use for overlaps for unitig construction jobs
utgOvlMerDistinct                  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
utgOvlMerSize                      K-mer size for seeds in overlaps
utgOvlMerThreshold                 K-mer frequency threshold; mers more frequent than this count are ignored
utgOvlRefBlockLength               Amount of sequence (bp) to search against the hash table per batch
utgovlStageSpace                   Amount of local disk space needed to stage data for overlaps for unitig construction jobs
utgovlThreads                      Number of threads to use for overlaps for unitig construction jobs
utgReAlign                         Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses utgOvlErrorRate
utgRepeatConfusedBP                Repeats where the next best edge is at least this many bp shorter will not be split
utgRepeatConfusedPC                Repeats where the next best edge is at least this many percent shorter will not be split
utgRepeatDeviation                 Overlaps this much above mean unitig error rate will not be used for repeat splitting
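
The options listed above are passed to canu on the command line as name=value pairs. The following is a minimal sketch of how a few of them might be combined, assuming the canu module is already loaded. The prefix, output directory, reads file, genome size, and the thread and memory values are hypothetical placeholders, not recommendations. The script only prints the arguments unless RUN_CANU=1 is set.

```shell
#!/bin/sh
# Sketch only: prefix, output directory, reads file, genome size and the
# numeric resource values below are hypothetical placeholders.
prefix="asm"
outdir="canu_out"
reads="reads.fastq.gz"
genome_size="5m"

# Each option from the list is passed to canu as name=value.
# gridOptions must reach canu as a single argument, so it is kept
# inside one quoted string.
set -- \
  -p "$prefix" -d "$outdir" \
  "genomeSize=$genome_size" \
  "useGrid=true" \
  "gridOptions=--partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00" \
  "utgovlThreads=4" \
  "utgovlMemory=16" \
  "saveReads=true" \
  -nanopore "$reads"

# Show the arguments one per line so the quoting can be checked.
echo "canu arguments:"
printf '  %s\n' "$@"

# Run canu only when explicitly requested and when it is on the PATH.
if [ "${RUN_CANU:-0}" = "1" ] && command -v canu >/dev/null 2>&1; then
  canu "$@"
fi
```

Printing the arguments first makes it easy to confirm that the whole gridOptions value is passed as one argument rather than being split on spaces by the shell.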

Installation

Source code obtained from https://github.com/marbl/canu/releases/download/v2.2

System

64-bit Linux