Canu-Sapelo2: Difference between revisions

From Research Computing Center Wiki
(updated external documentation link, added clarifying note about Slurm headers, updated Slurm header values)
(Updated Sapelo2 Rocky 8 system only has canu version 2.2. Updated page accordingly.)
(2 intermediate revisions by one other user not shown)
Line 9: Line 9:


=== Version ===
=== Version ===
2.1.1
2.2


=== Author / Distributor ===
=== Author / Distributor ===
Line 21: Line 21:
=== Running Program ===
=== Running Program ===


'''Version 2.1.1'''
'''Version 2.2'''


To use this version, please load the module with
To use this version, please load the module with
<pre class="gscript">
<pre class="gscript">
ml canu/2.1.1-GCCcore-8.3.0-Java-11
ml canu/2.2-GCCcore-11.2.0
</pre>
</pre>
When you invoke canu, use the gridOptions parameter to pass queueing system options to the jobs that the canu pipeline submits. At a minimum, specify a partition, the number of tasks, and the walltime. For example, use '''gridOptions="--partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00"'''. The --mem option will be added automatically by the pipeline scripts.
When you invoke canu, use the gridOptions parameter to pass queueing system options to the jobs that the canu pipeline submits. At a minimum, specify a partition, the number of tasks, and the walltime. For example, use '''gridOptions="--partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00"'''. The pipeline scripts add the --mem-per-cpu option automatically, but you can also set it yourself if the pipeline does not estimate the required memory correctly.
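
As a rough sketch of the advice above, the gridOptions string can be assembled in the shell before calling canu. The helper name build_grid_options and the 8G memory value are hypothetical and chosen only for illustration; the partition, task, CPU, and walltime values are the ones quoted above. The script only prints the resulting command line and does not run canu.

```shell
#!/bin/bash
# Hypothetical helper (not part of canu): builds the Slurm option string
# that canu forwards, via gridOptions, to the jobs it submits.
# Arguments: partition, ntasks, cpus-per-task, walltime, [mem-per-cpu]
build_grid_options() (
    partition="$1"
    ntasks="$2"
    cpus="$3"
    walltime="$4"
    mem_per_cpu="${5:-}"   # optional; leave empty to let canu estimate memory

    opts="--partition=${partition} --ntasks=${ntasks} --cpus-per-task=${cpus} --time=${walltime}"
    if [ -n "${mem_per_cpu}" ]; then
        # Only add --mem-per-cpu when the pipeline's own estimate is not suitable.
        opts="${opts} --mem-per-cpu=${mem_per_cpu}"
    fi
    printf '%s\n' "${opts}"
)

# Assumed example values; 8G is an illustrative memory request, not a recommendation.
grid_opts=$(build_grid_options batch 1 4 168:00:00 8G)

# Print the command instead of executing it; [options] stands for the
# remaining canu arguments, as in the example on this page.
echo "canu gridOptions=\"${grid_opts}\" [options]"
```

Omitting the fifth argument yields the minimal option set (partition, tasks, CPUs per task, walltime) and leaves the memory request to the pipeline scripts.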




Line 41: Line 41:
cd $SLURM_SUBMIT_DIR
cd $SLURM_SUBMIT_DIR


ml canu/2.1.1-GCCcore-8.3.0-Java-11
ml canu/2.2-GCCcore-11.2.0


canu gridOptions=" --partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00 " [options]
canu gridOptions=" --partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00 " [options]
Line 56: Line 56:


=== Documentation ===
=== Documentation ===
<pre class="gcommand">
<pre class="gcommand">
[shtsai@b1-24 ~]$ ml canu/2.1.1-GCCcore-8.3.0-Java-11
[cft07037@d2-13 canu]$ ml canu/2.2-GCCcore-11.2.0  
[shtsai@b1-24 ~]$ canu --help
[cft07037@d2-13 canu]$ canu --help


usage:  canu [-version] [-citation] \
usage:  canu [-version] [-citation] \
Line 138: Line 138:
   
   


<pre class="gcommand">
<pre class="gcommand">
[shtsai@b1-24 ~]$ ml canu/2.1.1-GCCcore-8.3.0-Java-11
[cft07037@d2-13 canu]$ ml canu/2.2-GCCcore-11.2.0  
[shtsai@b1-24 ~]$ canu -options
[cft07037@d2-13 canu]$ canu -options
MMapBlockSize                          Number of reads per 1GB; memory * blockSize = the size of block loaded into memory per job
batConcurrency                    Unused, only one process supported
MMapMerSize                            K-mer size for seeds in minmap
batMemory                          Approximate maximum memory usage, in gigabytes, default is the maxMemory limit
MhapBlockSize                          Number of reads per GB of memory allowed (mhapMemory)
batOptions                        Advanced options to bogart
MhapFilterThreshold                     Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
batStageSpace                      Amount of local disk space needed to stage data for unitig construction jobs
MhapFilterUnique                        Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
batThreads                        Number of threads to use; default is the maxThreads limit
MhapMerSize                            K-mer size for seeds in mhap
cnsConcurrency                     If grid not enabled, number of unitig consensus jobs to run at the same time; default is n_proc / n_threads
MhapNoTf                                Expert option: True or false, do not use tf weighting, only idf of tf-idf.
cnsConsensus                      Which consensus algorithm to use; 'pbdagcon' (fast, reliable); 'utgcns' (multialignment output); 'quick' (single read mosaic); default 'pbdagcon'
MhapOptions                            Expert option: free-form parameters to pass to MHAP.
cnsErrorRate                      Consensus expects alignments at about this error rate
MhapOrderedMerSize                     K-mer size for second-stage filter in mhap
cnsMaxCoverage                    Limit unitig consensus to at most this coverage; default '40' = unlimited
MhapSensitivity                         Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
cnsMemory                          Amount of memory, in gigabytes, to use for unitig consensus jobs
MhapVersion                            Version of the MHAP jar file to use
cnsPartitions                      Attempt to create this many consensus jobs; default '0' = based on the largest tig
Overlapper                              Which overlap algorithm to use for unitig construction
cnsStageSpace                     Amount of local disk space needed to stage data for unitig consensus jobs
OvlFilter                              Filter overlaps based on expected kmers vs observed kmers
cnsThreads                         Number of threads to use for unitig consensus jobs
OvlFrequentMers                        Do not seed overlaps with these kmers
contigFilter                      Parameters to filter out 'unassembled' unitigs.  Five values: minReads minLength singleReadSpan lowCovFraction lowCovDepth
OvlHashBits                            Width of the kmer hash. Width 22=1gb, 23=2gb, 24=4gb, 25=8gb. Plus 10b per utgOvlHashBlockLength
corConcurrency                    If grid not enabled, number of read correction jobs to run at the same time; default is n_proc / n_threads
OvlHashBlockLength                      Amount of sequence (bp) to load into the overlap hash table
corConsensus                      Which consensus algorithm to use; only 'falcon' is supported; default 'falcon'
OvlHashLoad                            Maximum hash table load. If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
corErrorRate                      Only use raw alignments below this error rate to construct corrected reads
OvlMerDistinct                         K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
corFilter                          Method to filter short reads from correction; 'quick' or 'expensive'; default 'expensive'
OvlMerSize                              K-mer size for seeds in overlaps
corMaxEvidenceCoverageGlobal      Limit reads used for correction to supporting at most this coverage; default: '1.0x' = 1.0 * estimated coverage
OvlMerThreshold                        K-mer frequency threshold; mers more frequent than this count are ignored
corMaxEvidenceCoverageLocal        Limit reads being corrected to at most this much evidence coverage; default: '2.0x' = 2.0 * estimated coverage
OvlRefBlockLength                      Amount of sequence (bp) to search against the hash table per batch
corMaxEvidenceErate                Limit read correction to only overlaps at or below this fraction error; default: unlimited
ReAlign                                Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses utgOvlErrorRate
corMemory                         Amount of memory, in gigabytes, to use for read correction jobs
batConcurrency                          Unused, only one process supported
corMhapBlockSize                  Number of reads per GB of memory allowed (mhapMemory)
batMemory                              Approximate maximum memory usage, in gigabytes, default is the maxMemory limit
cormhapConcurrency                If grid not enabled, number of mhap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
batOptions                              Advanced options to bogart
corMhapFilterThreshold            Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
batStageSpace                          Amount of local disk space needed to stage data for unitig construction jobs
corMhapFilterUnique                Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
batThreads                              Number of threads to use; default is the maxThreads limit
cormhapMemory                      Amount of memory, in gigabytes, to use for mhap overlaps for correction jobs
cnsConcurrency                          If grid not enabled, number of unitig consensus jobs to run at the same time; default is n_proc / n_threads
corMhapMerSize                    K-mer size for seeds in mhap
cnsConsensus                            Which consensus algorithm to use; 'pbdagcon' (fast, reliable); 'utgcns' (multialignment output); 'quick' (single read mosaic); default 'pbdagcon'
corMhapNoTf                        Expert option: True or false, do not use tf weighting, only idf of tf-idf.
cnsErrorRate                            Consensus expects alignments at about this error rate
corMhapOptions                    Expert option: free-form parameters to pass to MHAP.
cnsMaxCoverage                          Limit unitig consensus to at most this coverage; default '40' = unlimited
corMhapOrderedMerSize              K-mer size for second-stage filter in mhap
cnsMemory                              Amount of memory, in gigabytes, to use for unitig consensus jobs
corMhapPipe                        Report results to a pipe instead of *large* files.
cnsPartitions                          Attempt to create this many consensus jobs; default '0' = based on the largest tig
corMhapSensitivity                Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
cnsStageSpace                          Amount of local disk space needed to stage data for unitig consensus jobs
cormhapStageSpace                  Amount of local disk space needed to stage data for mhap overlaps for correction jobs
cnsThreads                              Number of threads to use for unitig consensus jobs
cormhapThreads                    Number of threads to use for mhap overlaps for correction jobs
contigFilter                            Parameters to filter out 'unassembled' unitigs. Five values: minReads minLength singleReadSpan lowCovFraction lowCovDepth
corMhapVersion                    Version of the MHAP jar file to use
corConcurrency                          If grid not enabled, number of read correction jobs to run at the same time; default is n_proc / n_threads
corMinCoverage                    Minimum number of bases supporting each corrected base, if less than this sequences are split; default based on input read coverage: 0 <= 30x < 4 < 60x <= 4
corConsensus                            Which consensus algorithm to use; only 'falcon' is supported; default 'falcon'
corMinEvidenceLength              Limit read correction to only overlaps longer than this; default: unlimited
corErrorRate                            Only use raw alignments below this error rate to construct corrected reads
corMMapBlockSize                  Number of reads per 1GB; memory * blockSize = the size of block loaded into memory per job
corFilter                              Method to filter short reads from correction; 'quick' or 'expensive'; default 'expensive'
cormmapConcurrency                If grid not enabled, number of mmap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
corMMapBlockSize                        Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
cormmapMemory                      Amount of memory, in gigabytes, to use for mmap overlaps for correction jobs
corMMapMerSize                          K-mer size for seeds in minmap
corMMapMerSize                    K-mer size for seeds in minmap
corMaxEvidenceCoverageGlobal            Limit reads used for correction to supporting at most this coverage; default: '1.0x' = 1.0 * estimated coverage
cormmapStageSpace                  Amount of local disk space needed to stage data for mmap overlaps for correction jobs
corMaxEvidenceCoverageLocal            Limit reads being corrected to at most this much evidence coverage; default: '2.0x' = 2.0 * estimated coverage
cormmapThreads                    Number of threads to use for mmap overlaps for correction jobs
corMaxEvidenceErate                    Limit read correction to only overlaps at or below this fraction error; default: unlimited
corOutCoverage                    Only correct the longest reads up to this coverage; default 40
corMemory                              Amount of memory, in gigabytes, to use for read correction jobs
corOverlapper                      Which overlap algorithm to use for correction
corMhapBlockSize                        Number of reads per GB of memory allowed (mhapMemory)
corovlConcurrency                  If grid not enabled, number of overlaps for correction jobs to run at the same time; default is n_proc / n_threads
corMhapFilterThreshold                  Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
corOvlErrorRate                    Overlaps above this error rate are not computed
corMhapFilterUnique                    Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
corOvlFilter                      Filter overlaps based on expected kmers vs observed kmers
corMhapMerSize                          K-mer size for seeds in mhap
corOvlFrequentMers                Do not seed overlaps with these kmers
corMhapNoTf                            Expert option: True or false, do not use tf weighting, only idf of tf-idf.
corOvlHashBits                    Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per corOvlHashBlockLength
corMhapOptions                          Expert option: free-form parameters to pass to MHAP.
corOvlHashBlockLength              Amount of sequence (bp) to load into the overlap hash table
corMhapOrderedMerSize                   K-mer size for second-stage filter in mhap
corOvlHashLoad                    Maximum hash table load.  If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
corMhapSensitivity                     Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
corovlMemory                      Amount of memory, in gigabytes, to use for overlaps for correction jobs
corMhapVersion                          Version of the MHAP jar file to use
corOvlMerDistinct                  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
corMinCoverage                          Minimum number of bases supporting each corrected base, if less than this sequences are split; default based on input read coverage: 0 <= 30x < 4 < 60x <= 4
corOvlMerSize                      K-mer size for seeds in overlaps
corMinEvidenceLength                    Limit read correction to only overlaps longer than this; default: unlimited
corOvlMerThreshold                K-mer frequency threshold; mers more frequent than this count are ignored
corOutCoverage                         Only correct the longest reads up to this coverage; default 40
corOvlRefBlockLength              Amount of sequence (bp) to search against the hash table per batch
corOverlapper                          Which overlap algorithm to use for correction
corovlStageSpace                   Amount of local disk space needed to stage data for overlaps for correction jobs
corOvlErrorRate                        Overlaps above this error rate are not computed
corovlThreads                     Number of threads to use for overlaps for correction jobs
corOvlFilter                            Filter overlaps based on expected kmers vs observed kmers
corPartitionMin                    Don't make a read correction partition with fewer than N reads
corOvlFrequentMers                      Do not seed overlaps with these kmers
corPartitions                      Partition read correction into N jobs
corOvlHashBits                          Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per corOvlHashBlockLength
corReAlign                        Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses corOvlErrorRate
corOvlHashBlockLength                  Amount of sequence (bp) to load into the overlap hash table
correctedErrorRate                Expected fraction error in an alignment of two corrected reads
corOvlHashLoad                          Maximum hash table load.  If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
corStageSpace                      Amount of local disk space needed to stage data for read correction jobs
corOvlMerDistinct                      K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
corThreads                        Number of threads to use for read correction jobs
corOvlMerSize                          K-mer size for seeds in overlaps
enableOEA                         Do overlap error adjustment - comprises two steps: read error detection (RED) and overlap error adjustment (OEA); default 'true'
corOvlMerThreshold                      K-mer frequency threshold; mers more frequent than this count are ignored
executiveMemory                    Amount of memory, in GB, to reserve for the Canu exective process
corOvlRefBlockLength                    Amount of sequence (bp) to search against the hash table per batch
executiveThreads                  Number of threads to reserve for the Canu exective process
corPartitionMin                        Don't make a read correction partition with fewer than N reads
genomeSize                        An estimate of the size of the genome
corPartitions                          Partition read correction into N jobs
gnuplot                            Path to the gnuplot executable
corReAlign                              Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses corOvlErrorRate
gnuplotImageFormat                Image format that gnuplot will generate.  Default: based on gnuplot, 'png', 'svg' or 'gif'
corStageSpace                          Amount of local disk space needed to stage data for read correction jobs
gridEngine                        Grid engine configuration, not documented
corThreads                              Number of threads to use for read correction jobs
gridEngineArrayMaxJobs            Grid engine configuration, not documented
cormhapConcurrency                      If grid not enabled, number of mhap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
gridEngineArrayName                Grid engine configuration, not documented
cormhapMemory                          Amount of memory, in gigabytes, to use for mhap overlaps for correction jobs
gridEngineArrayOption              Grid engine configuration, not documented
cormhapStageSpace                      Amount of local disk space needed to stage data for mhap overlaps for correction jobs
gridEngineArraySubmitID            Grid engine configuration, not documented
cormhapThreads                          Number of threads to use for mhap overlaps for correction jobs
gridEngineJobID                    Grid engine configuration, not documented
cormmapConcurrency                      If grid not enabled, number of mmap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
gridEngineMemoryOption            Grid engine configuration, not documented
cormmapMemory                          Amount of memory, in gigabytes, to use for mmap overlaps for correction jobs
gridEngineMemoryPerJob            Grid engine configuration, not documented
cormmapStageSpace                      Amount of local disk space needed to stage data for mmap overlaps for correction jobs
gridEngineMemoryUnits              Grid engine configuration, not documented
cormmapThreads                          Number of threads to use for mmap overlaps for correction jobs
gridEngineNameOption              Grid engine configuration, not documented
corovlConcurrency                      If grid not enabled, number of overlaps for correction jobs to run at the same time; default is n_proc / n_threads
gridEngineNameToJobIDCommand      Grid engine configuration, not documented
corovlMemory                            Amount of memory, in gigabytes, to use for overlaps for correction jobs
gridEngineNameToJobIDCommandNoArray  Grid engine configuration, not documented
corovlStageSpace                        Amount of local disk space needed to stage data for overlaps for correction jobs
gridEngineOutputOption            Grid engine configuration, not documented
corovlThreads                          Number of threads to use for overlaps for correction jobs
gridEngineResourceOption          Grid engine configuration, not documented
correctedErrorRate                      Expected fraction error in an alignment of two corrected reads
gridEngineStageOption              Grid engine configuration, not documented
enableOEA                              Do overlap error adjustment - comprises two steps: read error detection (RED) and overlap error adjustment (OEA); default 'true'
gridEngineSubmitCommand            Grid engine configuration, not documented
executiveMemory                        Amount of memory, in GB, to reserve for the Canu exective process
gridEngineTaskID                  Grid engine configuration, not documented
executiveThreads                        Number of threads to reserve for the Canu exective process
gridEngineThreadsOption            Grid engine configuration, not documented
genomeSize                              An estimate of the size of the genome
gridOptions                        Grid engine options applied to all jobs
gnuplot                                Path to the gnuplot executable
gridOptionsbat                    Grid engine options applied to unitig construction jobs
gnuplotImageFormat                      Image format that gnuplot will generate.  Default: based on gnuplot, 'png', 'svg' or 'gif'
gridOptionscns                    Grid engine options applied to unitig consensus jobs
gridEngine                              Grid engine configuration, not documented
gridOptionscor                    Grid engine options applied to read correction jobs
gridEngineArrayMaxJobs                  Grid engine configuration, not documented
gridOptionscormhap                Grid engine options applied to mhap overlaps for correction jobs
gridEngineArrayName                    Grid engine configuration, not documented
gridOptionscormmap                Grid engine options applied to mmap overlaps for correction jobs
gridEngineArrayOption                   Grid engine configuration, not documented
gridOptionscorovl                  Grid engine options applied to overlaps for correction jobs
gridEngineArraySubmitID                Grid engine configuration, not documented
gridOptionsExecutive              Grid engine options applied to the canu executive script
gridEngineJobID                        Grid engine configuration, not documented
gridOptionshap                    Grid engine options applied to haplotype assignment jobs
gridEngineMemoryOption                  Grid engine configuration, not documented
gridOptionsJobName                Grid jobs job-name suffix
gridEngineMemoryPerJob                  Grid engine configuration, not documented
gridOptionsmeryl                  Grid engine options applied to mer counting jobs
gridEngineMemoryUnits                  Grid engine configuration, not documented
gridOptionsobtmhap                Grid engine options applied to mhap overlaps for trimming jobs
gridEngineNameOption                    Grid engine configuration, not documented
gridOptionsobtmmap                Grid engine options applied to mmap overlaps for trimming jobs
gridEngineNameToJobIDCommand            Grid engine configuration, not documented
gridOptionsobtovl                  Grid engine options applied to overlaps for trimming jobs
gridEngineNameToJobIDCommandNoArray    Grid engine configuration, not documented
gridOptionsoea                    Grid engine options applied to overlap error adjustment jobs
gridEngineOutputOption                  Grid engine configuration, not documented
gridOptionsovb                    Grid engine options applied to overlap store bucketizing jobs
gridEngineResourceOption                Grid engine configuration, not documented
gridOptionsovs                    Grid engine options applied to overlap store sorting jobs
gridEngineStageOption                   Grid engine configuration, not documented
gridOptionsred                    Grid engine options applied to read error detection jobs
gridEngineSubmitCommand                Grid engine configuration, not documented
gridOptionsutgmhap                Grid engine options applied to mhap overlaps for unitig construction jobs
gridEngineTaskID                        Grid engine configuration, not documented
gridOptionsutgmmap                Grid engine options applied to mmap overlaps for unitig construction jobs
gridEngineThreadsOption                Grid engine configuration, not documented
gridOptionsutgovl                  Grid engine options applied to overlaps for unitig construction jobs
gridOptions                            Grid engine options applied to all jobs
hapConcurrency                    Unused, there is only one process
gridOptionsExecutive                    Grid engine options applied to the canu executive script
hapMemory                          Amount of memory, in gigabytes, to use for haplotype assignment
gridOptionsJobName                      Grid jobs job-name suffix
hapStageSpace                      Amount of local disk space needed to stage data for haplotype assignment jobs
gridOptionsbat                          Grid engine options applied to unitig construction jobs
hapThreads                        Number of threads to use for haplotype assignment
gridOptionscns                          Grid engine options applied to unitig consensus jobs
hapUnknownFraction                Fraction of allowed unknown bases before they are included in the assembly, between 0-1; default 0.05
gridOptionscor                          Grid engine options applied to read correction jobs
homoPolyCompress                   Compute everything but consensus sequences using homopolymer compressed reads
gridOptionscormhap                      Grid engine options applied to mhap overlaps for correction jobs
java                              Java interpreter to use; at least version 1.8; default 'java'
gridOptionscormmap                      Grid engine options applied to mmap overlaps for correction jobs
javaUse64Bit                      Java interpreter supports the -d64 or -d32 flags; default auto
gridOptionscorovl                       Grid engine options applied to overlaps for correction jobs
maxInputCoverage                  If input coverage is high, downsample to something reasonable; default 200
gridOptionshap                          Grid engine options applied to haplotype assignment jobs
maxMemory                          Maximum memory to use by any component of the assembler
gridOptionsmeryl                        Grid engine options applied to mer counting jobs
maxThreads                        Maximum number of compute threads to use by any component of the assembler
gridOptionsobtmhap                      Grid engine options applied to mhap overlaps for trimming jobs
merylConcurrency                  Unused, there is only one process
gridOptionsobtmmap                      Grid engine options applied to mmap overlaps for trimming jobs
merylMemory                        Amount of memory, in gigabytes, to use for mer counting
gridOptionsobtovl                      Grid engine options applied to overlaps for trimming jobs
merylStageSpace                    Amount of local disk space needed to stage data for mer counting jobs
gridOptionsoea                          Grid engine options applied to overlap error adjustment jobs
merylThreads                      Number of threads to use for mer counting
gridOptionsovb                          Grid engine options applied to overlap store bucketizing jobs
minimap                            Path to minimap2; default 'minimap2'
gridOptionsovs                          Grid engine options applied to overlap store sorting jobs
minInputCoverage                   Stop if input coverage is too low; default 10
gridOptionsred                          Grid engine options applied to read error detection jobs
minMemory                          Minimum amount of memory needed to compute the assembly (do not set unless prompted!)
gridOptionsutgmhap                      Grid engine options applied to mhap overlaps for unitig construction jobs
minOverlapLength                  Overlaps shorter than this length are not computed; default 500
gridOptionsutgmmap                      Grid engine options applied to mmap overlaps for unitig construction jobs
minReadLength                      Reads shorter than this length are not loaded into the assembler; default 1000
gridOptionsutgovl                      Grid engine options applied to overlaps for unitig construction jobs
minThreads                        Minimum number of compute threads suggested to compute the assembly
hapConcurrency                          Unused, there is only one process
objectStore                        Type of object storage used; not ready for production yet
hapMemory                              Amount of memory, in gigabytes, to use for haplotype assignment
objectStoreClient                  Path to the command line client used to access the object storage
hapStageSpace                          Amount of local disk space needed to stage data for haplotype assignment jobs
objectStoreClientDA                Path to the command line client used to download files from object storage
hapThreads                              Number of threads to use for haplotype assignment
objectStoreClientUA                Path to the command line client used to upload files to object storage
hapUnknownFraction                      Fraction of allowed unknown bases before they are included in the assembly, between 0-1; default 0.05
objectStoreNameSpace              Object store parameters; specific to the type of objectStore used
homoPolyCompress                        Compute everything but consensus sequences using homopolymer compressed reads
objectStoreProject                Object store project; specific to the type of objectStore used
java                                    Java interpreter to use; at least version 1.8; default 'java'
obtErrorRate                       Stringency of overlaps to use for trimming
javaUse64Bit                            Java interpreter supports the -d64 or -d32 flags; default auto
obtMhapBlockSize                  Number of reads per GB of memory allowed (mhapMemory)
maxInputCoverage                        If input coverage is high, downsample to something reasonable; default 200
obtmhapConcurrency                If grid not enabled, number of mhap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
maxMemory                              Maximum memory to use by any component of the assembler
obtMhapFilterThreshold            Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
maxThreads                              Maximum number of compute threads to use by any component of the assembler
obtMhapFilterUnique                Expert option: True or false, suppress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
merylConcurrency                        Unused, there is only one process
obtmhapMemory                      Amount of memory, in gigabytes, to use for mhap overlaps for trimming jobs
merylMemory                            Amount of memory, in gigabytes, to use for mer counting
obtMhapMerSize                    K-mer size for seeds in mhap
merylStageSpace                        Amount of local disk space needed to stage data for mer counting jobs
obtMhapNoTf                        Expert option: True or false, do not use tf weighting, only idf of tf-idf.
merylThreads                            Number of threads to use for mer counting
obtMhapOptions                    Expert option: free-form parameters to pass to MHAP.
minInputCoverage                        Stop if input coverage is too low; default 10
obtMhapOrderedMerSize              K-mer size for second-stage filter in mhap
minMemory                              Minimum amount of memory needed to compute the assembly (do not set unless prompted!)
obtMhapPipe                        Report results to a pipe instead of *large* files.
minOverlapLength                        Overlaps shorter than this length are not computed; default 500
obtMhapSensitivity                Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
minReadLength                          Reads shorter than this length are not loaded into the assembler; default 1000
obtmhapStageSpace                  Amount of local disk space needed to stage data for mhap overlaps for trimming jobs
minThreads                              Minimum number of compute threads suggested to compute the assembly
obtmhapThreads                    Number of threads to use for mhap overlaps for trimming jobs
minimap                                Path to minimap2; default 'minimap2'
obtMhapVersion                    Version of the MHAP jar file to use
objectStore                            Type of object storage used; not ready for production yet
obtMMapBlockSize                  Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
objectStoreClient                      Path to the command line client used to access the object storage
obtmmapConcurrency                If grid not enabled, number of mmap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
objectStoreClientDA                    Path to the command line client used to download files from object storage
obtmmapMemory                      Amount of memory, in gigabytes, to use for mmap overlaps for trimming jobs
objectStoreClientUA                    Path to the command line client used to upload files to object storage
obtMMapMerSize                    K-mer size for seeds in minmap
objectStoreNameSpace                    Object store parameters; specific to the type of objectStore used
obtmmapStageSpace                  Amount of local disk space needed to stage data for mmap overlaps for trimming jobs
objectStoreProject                      Object store project; specific to the type of objectStore used
obtmmapThreads                    Number of threads to use for mmap overlaps for trimming jobs
obtErrorRate                            Stringency of overlaps to use for trimming
obtOverlapper                      Which overlap algorithm to use for overlap based trimming
obtMMapBlockSize                        Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
obtovlConcurrency                  If grid not enabled, number of overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtMMapMerSize                          K-mer size for seeds in minmap
obtOvlErrorRate                    Overlaps at or below this error rate are used to trim reads
obtMhapBlockSize                        Number of reads per GB of memory allowed (mhapMemory)
obtOvlFilter                      Filter overlaps based on expected kmers vs observed kmers
obtMhapFilterThreshold                  Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
obtOvlFrequentMers                Do not seed overlaps with these kmers
obtMhapFilterUnique                     Expert option: True or false, suppress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
obtOvlHashBits                    Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per obtOvlHashBlockLength
obtMhapMerSize                          K-mer size for seeds in mhap
obtOvlHashBlockLength              Amount of sequence (bp) to load into the overlap hash table
obtMhapNoTf                            Expert option: True or false, do not use tf weighting, only idf of tf-idf.
obtOvlHashLoad                    Maximum hash table load.  If set too high, table lookups are inefficient; if too low, search overhead dominates run time; default 0.75
obtMhapOptions                         Expert option: free-form parameters to pass to MHAP.
obtovlMemory                      Amount of memory, in gigabytes, to use for overlaps for trimming jobs
obtMhapOrderedMerSize                  K-mer size for second-stage filter in mhap
obtOvlMerDistinct                  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
obtMhapSensitivity                      Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
obtOvlMerSize                      K-mer size for seeds in overlaps
obtMhapVersion                         Version of the MHAP jar file to use
obtOvlMerThreshold                K-mer frequency threshold; mers more frequent than this count are ignored
obtOverlapper                          Which overlap algorithm to use for overlap based trimming
obtOvlRefBlockLength              Amount of sequence (bp) to search against the hash table per batch
obtOvlErrorRate                        Overlaps at or below this error rate are used to trim reads
obtovlStageSpace                  Amount of local disk space needed to stage data for overlaps for trimming jobs
obtOvlFilter                            Filter overlaps based on expected kmers vs observed kmers
obtovlThreads                      Number of threads to use for overlaps for trimming jobs
obtOvlFrequentMers                     Do not seed overlaps with these kmers
obtReAlign                        Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses obtOvlErrorRate
obtOvlHashBits                          Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per obtOvlHashBlockLength
oeaBatchLength                    Number of bases per overlap error correction batch
obtOvlHashBlockLength                  Amount of sequence (bp) to load into the overlap hash table
oeaBatchSize                      Number of reads per overlap error correction batch
obtOvlHashLoad                          Maximum hash table load.  If set too high, table lookups are inefficient; if too low, search overhead dominates run time; default 0.75
oeaConcurrency                    If grid not enabled, number of overlap error adjustment jobs to run at the same time; default is n_proc / n_threads
obtOvlMerDistinct                       K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
oeaErrorRate                      Only use overlaps with at most this much fraction error to find errors in reads; default utgOvlErrorRate, 0.003 for HiFi reads
obtOvlMerSize                          K-mer size for seeds in overlaps
oeaHaploConfirm                    This many or more reads will confirm a true haplotype difference; default 5
obtOvlMerThreshold                      K-mer frequency threshold; mers more frequent than this count are ignored
oeaMaskTrivial                     Mask trivial DNA in Overlap Error Adjustment; default off; on for HiFi reads
obtOvlRefBlockLength                    Amount of sequence (bp) to search against the hash table per batch
oeaMemory                          Amount of memory, in gigabytes, to use for overlap error adjustment jobs
obtReAlign                              Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses obtOvlErrorRate
oeaStageSpace                      Amount of local disk space needed to stage data for overlap error adjustment jobs
obtmhapConcurrency                     If grid not enabled, number of mhap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
oeaThreads                        Number of threads to use for overlap error adjustment jobs
obtmhapMemory                          Amount of memory, in gigabytes, to use for mhap overlaps for trimming jobs
onFailure                         Full path to command to run on failure
obtmhapStageSpace                       Amount of local disk space needed to stage data for mhap overlaps for trimming jobs
onSuccess                          Full path to command to run on successful completion
obtmhapThreads                          Number of threads to use for mhap overlaps for trimming jobs
ovbConcurrency                    If grid not enabled, number of overlap store bucketizing jobs to run at the same time; default is n_proc / n_threads
obtmmapConcurrency                      If grid not enabled, number of mmap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
ovbMemory                         Amount of memory, in gigabytes, to use for overlap store bucketizing jobs
obtmmapMemory                          Amount of memory, in gigabytes, to use for mmap overlaps for trimming jobs
ovbStageSpace                      Amount of local disk space needed to stage data for overlap store bucketizing jobs
obtmmapStageSpace                      Amount of local disk space needed to stage data for mmap overlaps for trimming jobs
ovbThreads                        Number of threads to use for overlap store bucketizing jobs
obtmmapThreads                          Number of threads to use for mmap overlaps for trimming jobs
ovsConcurrency                    If grid not enabled, number of overlap store sorting jobs to run at the same time; default is n_proc / n_threads
obtovlConcurrency                      If grid not enabled, number of overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
ovsMemory                          Amount of memory, in gigabytes, to use for overlap store sorting jobs
obtovlMemory                            Amount of memory, in gigabytes, to use for overlaps for trimming jobs
ovsStageSpace                     Amount of local disk space needed to stage data for overlap store sorting jobs
obtovlStageSpace                        Amount of local disk space needed to stage data for overlaps for trimming jobs
ovsThreads                        Number of threads to use for overlap store sorting jobs
obtovlThreads                          Number of threads to use for overlaps for trimming jobs
preExec                            A command line to run at the start of Canu execution scripts
oeaBatchLength                         Number of bases per overlap error correction batch
purgeOverlaps                      When to delete intermediate overlap files: never, normal (default), aggressive, dangerous
oeaBatchSize                           Number of reads per overlap error correction batch
rawErrorRate                      Expected fraction error in an alignment of two uncorrected reads
oeaConcurrency                          If grid not enabled, number of overlap error adjustment jobs to run at the same time; default is n_proc / n_threads
readSamplingBias                  Score reads as 'random * length^bias', keep the highest scoring reads
oeaMemory                              Amount of memory, in gigabytes, to use for overlap error adjustment jobs
redBatchLength                    Number of bases per fragment error detection batch
oeaStageSpace                          Amount of local disk space needed to stage data for overlap error adjustment jobs
redBatchSize                       Number of reads per fragment error detection batch
oeaThreads                              Number of threads to use for overlap error adjustment jobs
redConcurrency                    If grid not enabled, number of read error detection jobs to run at the same time; default is n_proc / n_threads
onFailure                              Full path to command to run on failure
redMemory                          Amount of memory, in gigabytes, to use for read error detection jobs
onSuccess                              Full path to command to run on successful completion
redStageSpace                     Amount of local disk space needed to stage data for read error detection jobs
ovbConcurrency                          If grid not enabled, number of overlap store bucketizing jobs to run at the same time; default is n_proc / n_threads
redThreads                        Number of threads to use for read error detection jobs
ovbMemory                              Amount of memory, in gigabytes, to use for overlap store bucketizing jobs
saveMerCounts                      Save full mer counting results, sometimes useful
ovbStageSpace                          Amount of local disk space needed to stage data for overlap store bucketizing jobs
saveOverlaps                       Do not remove the overlap stores.  Default: false = remove overlap stores when they're no longer needed
ovbThreads                              Number of threads to use for overlap store bucketizing jobs
saveReadCorrections                Save intermediate read correction files, almost never a good idea
ovsConcurrency                          If grid not enabled, number of overlap store sorting jobs to run at the same time; default is n_proc / n_threads
saveReadHaplotypes                Save intermediate read haplotype files, almost never a good idea
ovsMemory                              Amount of memory, in gigabytes, to use for overlap store sorting jobs
saveReads                          Save intermediate corrected and trimmed reads to asm.correctedReads.fasta.gz and asm.trimmedReads.fasta.gz
ovsStageSpace                          Amount of local disk space needed to stage data for overlap store sorting jobs
shell                              Command interpreter to use; sh-compatible (e.g., bash), NOT C-shell (csh or tcsh); default '/bin/sh'
ovsThreads                              Number of threads to use for overlap store sorting jobs
showNext                          Don't run any commands, just report what would run
preExec                                A command line to run at the start of Canu execution scripts
stageDirectory                    If set, copy heavily used data to this node-local location
purgeOverlaps                          When to delete intermediate overlap files: never, normal (default), aggressive, dangerous
stopAfter                          Stop after a specific algorithm step is completed
rawErrorRate                            Expected fraction error in an alignment of two uncorrected reads
stopOnLowCoverage                  Stop if raw, corrected or trimmed read coverage is low
readSamplingBias                        Score reads as 'random * length^bias', keep the highest scoring reads
trimReadsCoverage                  Minimum depth of evidence to retain bases; default '2'
readSamplingCoverage                    DEPRECATED; use maxInputCoverage. Discard reads to make the input be of this size
trimReadsOverlap                  Minimum overlap between evidence to make contiguous trim; default '500'
redBatchLength                          Number of bases per fragment error detection batch
unitigger                         Which unitig algorithm to use; only 'bogart' supported; default 'bogart'
redBatchSize                            Number of reads per fragment error detection batch
useGrid                           If 'true', enable grid-based execution; if 'false', run all jobs on the local machine; if 'remote', create jobs for grid execution but do not submit; default 'true'
redConcurrency                          If grid not enabled, number of read error detection jobs to run at the same time; default is n_proc / n_threads
useGridbat                        If 'true', run module unitig construction under grid control; if 'false' run locally.
redMemory                              Amount of memory, in gigabytes, to use for read error detection jobs
useGridcns                        If 'true', run module unitig consensus under grid control; if 'false' run locally.
redStageSpace                          Amount of local disk space needed to stage data for read error detection jobs
useGridcor                        If 'true', run module read correction under grid control; if 'false' run locally.
redThreads                              Number of threads to use for read error detection jobs
useGridcormhap                    If 'true', run module mhap overlaps for correction under grid control; if 'false' run locally.
saveMerCounts                          Save full mer counting results, sometimes useful
useGridcormmap                    If 'true', run module mmap overlaps for correction under grid control; if 'false' run locally.
saveOverlaps                            Do not remove the overlap stores.  Default: false = remove overlap stores when they're no longer needed
useGridcorovl                      If 'true', run module overlaps for correction under grid control; if 'false' run locally.
saveReadCorrections                    Save intermediate read correction files, almost never a good idea
useGridhap                        If 'true', run module haplotype assignment under grid control; if 'false' run locally.
saveReadHaplotypes                      Save intermediate read haplotype files, almost never a good idea
useGridmeryl                      If 'true', run module mer counting under grid control; if 'false' run locally.
saveReads                              Save intermediate corrected and trimmed reads to asm.correctedReads.fasta.gz and asm.trimmedReads.fasta.gz
useGridobtmhap                    If 'true', run module mhap overlaps for trimming under grid control; if 'false' run locally.
shell                                  Command interpreter to use; sh-compatible (e.g., bash), NOT C-shell (csh or tcsh); default '/bin/sh'
useGridobtmmap                    If 'true', run module mmap overlaps for trimming under grid control; if 'false' run locally.
showNext                                Don't run any commands, just report what would run
useGridobtovl                      If 'true', run module overlaps for trimming under grid control; if 'false' run locally.
stageDirectory                          If set, copy heavily used data to this node-local location
useGridoea                        If 'true', run module overlap error adjustment under grid control; if 'false' run locally.
stopAfter                              Stop after a specific algorithm step is completed
useGridovb                        If 'true', run module overlap store bucketizing under grid control; if 'false' run locally.
stopOnLowCoverage                      Stop if raw, corrected or trimmed read coverage is low
useGridovs                        If 'true', run module overlap store sorting under grid control; if 'false' run locally.
trimReadsCoverage                      Minimum depth of evidence to retain bases; default '2'
useGridred                        If 'true', run module read error detection under grid control; if 'false' run locally.
trimReadsOverlap                        Minimum overlap between evidence to make contiguous trim; default '500'
useGridutgmhap                    If 'true', run module mhap overlaps for unitig construction under grid control; if 'false' run locally.
unitigger                              Which unitig algorithm to use; only 'bogart' supported; default 'bogart'
useGridutgmmap                    If 'true', run module mmap overlaps for unitig construction under grid control; if 'false' run locally.
useGrid                                If 'true', enable grid-based execution; if 'false', run all jobs on the local machine; if 'remote', create jobs for grid execution but do not submit; default 'true'
useGridutgovl                      If 'true', run module overlaps for unitig construction under grid control; if 'false' run locally.
useGridBAT                              If 'true', run module BAT under grid control; if 'false' run locally.
utgBubbleDeviation                Overlaps this much above mean of contig will be used to identify bubbles
useGridCNS                              If 'true', run module CNS under grid control; if 'false' run locally.
utgChimeraType                    When to filter reads for contig construction: none, chimera (missing middle), uncovered (missing middle or ends), deadend (missing middle or end or no neighbor) (default)
useGridCOR                              If 'true', run module COR under grid control; if 'false' run locally.
utgErrorRate                      Overlaps at or below this error rate are used to construct contigs
useGridCORMHAP                          If 'true', run module CORMHAP under grid control; if 'false' run locally.
utgGraphDeviation                  Overlaps this much above median will not be used for initial graph construction
useGridCORMMAP                          If 'true', run module CORMMAP under grid control; if 'false' run locally.
utgMhapBlockSize                  Number of reads per GB of memory allowed (mhapMemory)
useGridCOROVL                          If 'true', run module COROVL under grid control; if 'false' run locally.
utgmhapConcurrency                If grid not enabled, number of mhap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
useGridHAP                              If 'true', run module HAP under grid control; if 'false' run locally.
utgMhapFilterThreshold            Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
useGridMERYL                            If 'true', run module MERYL under grid control; if 'false' run locally.
utgMhapFilterUnique                Expert option: True or false, suppress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
useGridOBTMHAP                          If 'true', run module OBTMHAP under grid control; if 'false' run locally.
utgmhapMemory                      Amount of memory, in gigabytes, to use for mhap overlaps for unitig construction jobs
useGridOBTMMAP                          If 'true', run module OBTMMAP under grid control; if 'false' run locally.
utgMhapMerSize                    K-mer size for seeds in mhap
useGridOBTOVL                          If 'true', run module OBTOVL under grid control; if 'false' run locally.
utgMhapNoTf                        Expert option: True or false, do not use tf weighting, only idf of tf-idf.
useGridOEA                              If 'true', run module OEA under grid control; if 'false' run locally.
utgMhapOptions                    Expert option: free-form parameters to pass to MHAP.
useGridOVB                              If 'true', run module OVB under grid control; if 'false' run locally.
utgMhapOrderedMerSize              K-mer size for second-stage filter in mhap
useGridOVS                              If 'true', run module OVS under grid control; if 'false' run locally.
utgMhapPipe                        Report results to a pipe instead of *large* files.
useGridRED                              If 'true', run module RED under grid control; if 'false' run locally.
utgMhapSensitivity                Coarse sensitivity level: 'low', 'normal' or 'high'. Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
useGridUTGMHAP                          If 'true', run module UTGMHAP under grid control; if 'false' run locally.
utgmhapStageSpace                  Amount of local disk space needed to stage data for mhap overlaps for unitig construction jobs
useGridUTGMMAP                          If 'true', run module UTGMMAP under grid control; if 'false' run locally.
utgmhapThreads                    Number of threads to use for mhap overlaps for unitig construction jobs
useGridUTGOVL                          If 'true', run module UTGOVL under grid control; if 'false' run locally.
utgMhapVersion                    Version of the MHAP jar file to use
utgBubbleDeviation                      Overlaps this much above mean of contig will be used to identify bubbles
utgMMapBlockSize                  Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
utgChimeraType                          When to filter reads for contig construction: none, chimera (missing middle), uncovered (missing middle or ends), deadend (missing middle or end or no neighbor) (default)
utgmmapConcurrency                If grid not enabled, number of mmap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgErrorRate                            Overlaps at or below this error rate are used to construct contigs
utgmmapMemory                      Amount of memory, in gigabytes, to use for mmap overlaps for unitig construction jobs
utgGraphDeviation                       Overlaps this much above median will not be used for initial graph construction
utgMMapMerSize                    K-mer size for seeds in minmap
utgMMapBlockSize                        Number of reads per 1GB; memory * blockSize = the size of block loaded into memory per job
utgmmapStageSpace                  Amount of local disk space needed to stage data for mmap overlaps for unitig construction jobs
utgMMapMerSize                          K-mer size for seeds in minmap
utgmmapThreads                    Number of threads to use for mmap overlaps for unitig construction jobs
utgMhapBlockSize                        Number of reads per GB of memory allowed (mhapMemory)
utgOverlapper                      Which overlap algorithm to use for unitig construction
utgMhapFilterThreshold                  Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
utgovlConcurrency                  If grid not enabled, number of overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgMhapFilterUnique                    Expert option: True or false, suppress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
utgOvlErrorRate                    Overlaps at or below this error rate are used to trim reads
utgMhapMerSize                          K-mer size for seeds in mhap
utgOvlFilter                      Filter overlaps based on expected kmers vs observed kmers
utgMhapNoTf                            Expert option: True or false, do not use tf weighting, only idf of tf-idf.
utgOvlFrequentMers                Do not seed overlaps with these kmers
utgMhapOptions                          Expert option: free-form parameters to pass to MHAP.
utgOvlHashBits                    Width of the kmer hash. Width 22=1gb, 23=2gb, 24=4gb, 25=8gb. Plus 10b per utgOvlHashBlockLength
utgMhapOrderedMerSize                  K-mer size for second-stage filter in mhap
utgOvlHashBlockLength              Amount of sequence (bp) to load into the overlap hash table
utgMhapSensitivity                      Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
utgOvlHashLoad                    Maximum hash table load.  If set too high, table lookups are inefficient; if too low, search overhead dominates run time; default 0.75
utgMhapVersion                          Version of the MHAP jar file to use
utgovlMemory                       Amount of memory, in gigabytes, to use for overlaps for unitig construction jobs
utgOverlapper                          Which overlap algorithm to use for unitig construction
utgOvlMerDistinct                  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
utgOvlErrorRate                        Overlaps at or below this error rate are used to trim reads
utgOvlMerSize                      K-mer size for seeds in overlaps
utgOvlFilter                            Filter overlaps based on expected kmers vs observed kmers
utgOvlMerThreshold                K-mer frequency threshold; mers more frequent than this count are ignored
utgOvlFrequentMers                      Do not seed overlaps with these kmers
utgOvlRefBlockLength              Amount of sequence (bp) to search against the hash table per batch
utgOvlHashBits                          Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per utgOvlHashBlockLength
utgovlStageSpace                  Amount of local disk space needed to stage data for overlaps for unitig construction jobs
utgOvlHashBlockLength                  Amount of sequence (bp) to load into the overlap hash table
utgovlThreads                      Number of threads to use for overlaps for unitig construction jobs
utgOvlHashLoad                          Maximum hash table load.  If set too high, table lookups are inefficient; if too low, search overhead dominates run time; default 0.75
utgReAlign                        Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl. Uses utgOvlErrorRate
utgOvlMerDistinct                      K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
utgRepeatConfusedBP                Repeats where the next best edge is at least this many bp shorter will not be split
utgOvlMerSize                          K-mer size for seeds in overlaps
utgRepeatConfusedPC                Repeats where the next best edge is at least this many percent shorter will not be split
utgOvlMerThreshold                      K-mer frequency threshold; mers more frequent than this count are ignored
utgRepeatDeviation                Overlaps this much above mean unitig error rate will not be used for repeat splitting
utgOvlRefBlockLength                    Amount of sequence (bp) to search against the hash table per batch
utgReAlign                              Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses utgOvlErrorRate
utgRepeatConfusedBP                    Repeats where the next best edge is at least this many bp shorter will not be split
utgRepeatConfusedPC                    Repeats where the next best edge is at least this many percent shorter will not be split
utgRepeatDeviation                      Overlaps this much above mean unitig error rate will not be used for repeat splitting
utgmhapConcurrency                      If grid not enabled, number of mhap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgmhapMemory                          Amount of memory, in gigabytes, to use for mhap overlaps for unitig construction jobs
utgmhapStageSpace                      Amount of local disk space needed to stage data for mhap overlaps for unitig construction jobs
utgmhapThreads                          Number of threads to use for mhap overlaps for unitig construction jobs
utgmmapConcurrency                      If grid not enabled, number of mmap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgmmapMemory                          Amount of memory, in gigabytes, to use for mmap overlaps for unitig construction jobs
utgmmapStageSpace                      Amount of local disk space needed to stage data for mmap overlaps for unitig construction jobs
utgmmapThreads                          Number of threads to use for mmap overlaps for unitig construction jobs
utgovlConcurrency                      If grid not enabled, number of overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgovlMemory                            Amount of memory, in gigabytes, to use for overlaps for unitig construction jobs
utgovlStageSpace                        Amount of local disk space needed to stage data for overlaps for unitig construction jobs
utgovlThreads                          Number of threads to use for overlaps for unitig construction jobs
 
 
</pre>
</pre>
[[#top|Back to Top]]
=== Installation ===
Source code obtained from https://github.com/marbl/canu/releases/download/v2.2
 
=== System ===
64-bit Linux

Revision as of 11:16, 6 September 2023

Category

Bioinformatics

Program On

Sapelo2

Version

2.2

Author / Distributor

Canu

Description

"Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION)." More details are available in Canu's documentation.

Running Program

Version 2.2

To use this version, please load the module with

ml canu/2.2-GCCcore-11.2.0

When you invoke canu, please use gridOptions to pass queueing system options to the jobs that the canu pipeline submits. At a minimum, please specify a partition, the number of tasks, and the walltime. For example, use gridOptions=" --partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00 ". The pipeline scripts add the --mem-per-cpu option automatically, but you can also set it yourself in gridOptions if the pipeline does not estimate the required memory correctly.


Here is an example of a shell script, sub.sh, to run Canu on the batch queue:

#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=canujobname
#SBATCH --ntasks=1
#SBATCH --time=1:00:00
#SBATCH --mem=10G

cd $SLURM_SUBMIT_DIR

ml canu/2.2-GCCcore-11.2.0

canu gridOptions=" --partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00 " [options]

where [options] needs to be replaced by the options (command and arguments) you want to use. Other parameters of the job, such as the maximum wall clock time, the maximum memory, the number of cores per node, and the job name, need to be modified appropriately as well. Please note that the Slurm headers (#SBATCH lines) apply only to Canu's initial job. The resource limits of all jobs that Canu spawns are determined by what is defined in gridOptions.
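
As a rough illustration of how the gridOptions string and [options] fit together, the sketch below builds the canu command line and prints it instead of running it. The assembly prefix, directory, genome size, read file, and the MEM_PER_CPU variable are hypothetical placeholders, not site recommendations; only the Slurm flags and canu options already shown on this page are used.

```shell
#!/bin/bash
# Sketch: assemble the gridOptions string and print the canu command (dry run).
# All values below are illustrative assumptions.

PARTITION="batch"        # partition for the jobs canu submits
CPUS=4                   # --cpus-per-task for each spawned job
WALLTIME="168:00:00"     # walltime for each spawned job
MEM_PER_CPU=""           # e.g. "8G"; leave empty to let canu estimate memory

GRID_OPTS="--partition=${PARTITION} --ntasks=1 --cpus-per-task=${CPUS} --time=${WALLTIME}"
if [ -n "${MEM_PER_CPU}" ]; then
    # Only add a memory request when the automatic estimate proved wrong.
    GRID_OPTS="${GRID_OPTS} --mem-per-cpu=${MEM_PER_CPU}"
fi

# Hypothetical assembly: prefix, directory, genome size and read file are placeholders.
CANU_CMD="canu -p ecoli -d ecoli_asm genomeSize=4.7m gridOptions=\"${GRID_OPTS}\" -nanopore reads.fastq.gz"

# Print instead of running, so the command can be checked before it goes into sub.sh.
echo "${CANU_CMD}"
```

Once the printed command looks right, it can replace the canu line in sub.sh. Setting MEM_PER_CPU to a value such as 8G appends --mem-per-cpu to the options passed to every job that canu submits.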


To submit the job, use the command:

sbatch ./sub.sh 

Documentation

[cft07037@d2-13 canu]$ ml canu/2.2-GCCcore-11.2.0 
[cft07037@d2-13 canu]$ canu --help

usage:   canu [-version] [-citation] \
              [-haplotype | -correct | -trim | -assemble | -trim-assemble] \
              [-s <assembly-specifications-file>] \
               -p <assembly-prefix> \
               -d <assembly-directory> \
               genomeSize=<number>[g|m|k] \
              [other-options] \
              [-haplotype{NAME} illumina.fastq.gz] \
              [-corrected] \
              [-trimmed] \
              [-pacbio |
               -nanopore |
               -pacbio-hifi] file1 file2 ...

example: canu -d run1 -p godzilla genomeSize=1g -nanopore-raw reads/*.fasta.gz 


  To restrict canu to only a specific stage, use:
    -haplotype     - generate haplotype-specific reads
    -correct       - generate corrected reads
    -trim          - generate trimmed reads
    -assemble      - generate an assembly
    -trim-assemble - generate trimmed reads and then assemble them

  The assembly is computed in the -d <assembly-directory>, with output files named
  using the -p <assembly-prefix>.  This directory is created if needed.  It is not
  possible to run multiple assemblies in the same directory.

  The genome size should be your best guess of the haploid genome size of what is being
  assembled.  It is used primarily to estimate coverage in reads, NOT as the desired
  assembly size.  Fractional values are allowed: '4.7m' equals '4700k' equals '4700000'

  Some common options:
    useGrid=string
      - Run under grid control (true), locally (false), or set up for grid control
        but don't submit any jobs (remote)
    rawErrorRate=fraction-error
      - The allowed difference in an overlap between two raw uncorrected reads.  For lower
        quality reads, use a higher number.  The defaults are 0.300 for PacBio reads and
        0.500 for Nanopore reads.
    correctedErrorRate=fraction-error
      - The allowed difference in an overlap between two corrected reads.  Assemblies of
        low coverage or data with biological differences will benefit from a slight increase
        in this.  Defaults are 0.045 for PacBio reads and 0.144 for Nanopore reads.
    gridOptions=string
      - Pass string to the command used to submit jobs to the grid.  Can be used to set
        maximum run time limits.  Should NOT be used to set memory limits; Canu will do
        that for you.
    minReadLength=number
      - Ignore reads shorter than 'number' bases long.  Default: 1000.
    minOverlapLength=number
      - Ignore read-to-read overlaps shorter than 'number' bases long.  Default: 500.
  A full list of options can be printed with '-options'.  All options can be supplied in
  an optional sepc file with the -s option.

  For TrioCanu, haplotypes are specified with the -haplotype{NAME} option, with any
  number of haplotype-specific Illumina read files after.  The {NAME} of each haplotype
  is free text (but only letters and numbers, please).  For example:
    -haplotypeNANNY nanny/*gz
    -haplotypeBILLY billy1.fasta.gz billy2.fasta.gz

  Reads can be either FASTA or FASTQ format, uncompressed, or compressed with gz, bz2 or xz.

  Reads are specified by the technology they were generated with, and any processing performed.

  [processing]
    -corrected
    -trimmed

  [technology]
    -pacbio      <files>
    -nanopore    <files>
    -pacbio-hifi <files>

Complete documentation at http://canu.readthedocs.org/en/latest/
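
The help text above notes that all options can be supplied in a spec file passed with -s. The sketch below writes such a file and prints the command that would use it. The one-option-per-line name=value layout with '#' comments is an assumption to check against Canu's documentation, and the prefix, directory, genome size, and read file are hypothetical placeholders.

```shell
#!/bin/bash
# Sketch only: the spec-file layout (one name=value per line, '#' comments) is an assumption.
SPEC_FILE="$(mktemp)"   # hypothetical location; any writable path works

cat > "${SPEC_FILE}" <<'EOF'
# Option names come from the canu help text; values are the documented defaults
useGrid=true
minReadLength=1000
minOverlapLength=500
EOF

# Dry run: print the command that would pass the spec file to canu with -s.
# The prefix, directory, genome size and read file are placeholders.
SPEC_CMD="canu -s ${SPEC_FILE} -p asm -d asm_dir genomeSize=4.7m -pacbio-hifi reads.fastq.gz"
echo "${SPEC_CMD}"
```

Keeping rarely changed settings in a spec file leaves the command line in sub.sh short; options given on the command line can still be added after -s.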


[cft07037@d2-13 canu]$ ml canu/2.2-GCCcore-11.2.0 
[cft07037@d2-13 canu]$ canu -options
batConcurrency                     Unused, only one process supported
batMemory                          Approximate maximum memory usage, in gigabytes, default is the maxMemory limit
batOptions                         Advanced options to bogart
batStageSpace                      Amount of local disk space needed to stage data for unitig construction jobs
batThreads                         Number of threads to use; default is the maxThreads limit
cnsConcurrency                     If grid not enabled, number of unitig consensus jobs to run at the same time; default is n_proc / n_threads
cnsConsensus                       Which consensus algorithm to use; 'pbdagcon' (fast, reliable); 'utgcns' (multialignment output); 'quick' (single read mosaic); default 'pbdagcon'
cnsErrorRate                       Consensus expects alignments at about this error rate
cnsMaxCoverage                     Limit unitig consensus to at most this coverage; default '40' = unlimited
cnsMemory                          Amount of memory, in gigabytes, to use for unitig consensus jobs
cnsPartitions                      Attempt to create this many consensus jobs; default '0' = based on the largest tig
cnsStageSpace                      Amount of local disk space needed to stage data for unitig consensus jobs
cnsThreads                         Number of threads to use for unitig consensus jobs
contigFilter                       Parameters to filter out 'unassembled' unitigs.  Five values: minReads minLength singleReadSpan lowCovFraction lowCovDepth
corConcurrency                     If grid not enabled, number of read correction jobs to run at the same time; default is n_proc / n_threads
corConsensus                       Which consensus algorithm to use; only 'falcon' is supported; default 'falcon'
corErrorRate                       Only use raw alignments below this error rate to construct corrected reads
corFilter                          Method to filter short reads from correction; 'quick' or 'expensive'; default 'expensive'
corMaxEvidenceCoverageGlobal       Limit reads used for correction to supporting at most this coverage; default: '1.0x' = 1.0 * estimated coverage
corMaxEvidenceCoverageLocal        Limit reads being corrected to at most this much evidence coverage; default: '2.0x' = 2.0 * estimated coverage
corMaxEvidenceErate                Limit read correction to only overlaps at or below this fraction error; default: unlimited
corMemory                          Amount of memory, in gigabytes, to use for read correction jobs
corMhapBlockSize                   Number of reads per GB of memory allowed (mhapMemory)
cormhapConcurrency                 If grid not enabled, number of mhap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
corMhapFilterThreshold             Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
corMhapFilterUnique                Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
cormhapMemory                      Amount of memory, in gigabytes, to use for mhap overlaps for correction jobs
corMhapMerSize                     K-mer size for seeds in mhap
corMhapNoTf                        Expert option: True or false, do not use tf weighting, only idf of tf-idf.
corMhapOptions                     Expert option: free-form parameters to pass to MHAP.
corMhapOrderedMerSize              K-mer size for second-stage filter in mhap
corMhapPipe                        Report results to a pipe instead of *large* files.
corMhapSensitivity                 Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
cormhapStageSpace                  Amount of local disk space needed to stage data for mhap overlaps for correction jobs
cormhapThreads                     Number of threads to use for mhap overlaps for correction jobs
corMhapVersion                     Version of the MHAP jar file to use
corMinCoverage                     Minimum number of bases supporting each corrected base, if less than this sequences are split; default based on input read coverage: 0 <= 30x < 4 < 60x <= 4
corMinEvidenceLength               Limit read correction to only overlaps longer than this; default: unlimited
corMMapBlockSize                   Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
cormmapConcurrency                 If grid not enabled, number of mmap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
cormmapMemory                      Amount of memory, in gigabytes, to use for mmap overlaps for correction jobs
corMMapMerSize                     K-mer size for seeds in minmap
cormmapStageSpace                  Amount of local disk space needed to stage data for mmap overlaps for correction jobs
cormmapThreads                     Number of threads to use for mmap overlaps for correction jobs
corOutCoverage                     Only correct the longest reads up to this coverage; default 40
corOverlapper                      Which overlap algorithm to use for correction
corovlConcurrency                  If grid not enabled, number of overlaps for correction jobs to run at the same time; default is n_proc / n_threads
corOvlErrorRate                    Overlaps above this error rate are not computed
corOvlFilter                       Filter overlaps based on expected kmers vs observed kmers
corOvlFrequentMers                 Do not seed overlaps with these kmers
corOvlHashBits                     Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per corOvlHashBlockLength
corOvlHashBlockLength              Amount of sequence (bp) to load into the overlap hash table
corOvlHashLoad                     Maximum hash table load.  If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
corovlMemory                       Amount of memory, in gigabytes, to use for overlaps for correction jobs
corOvlMerDistinct                  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
corOvlMerSize                      K-mer size for seeds in overlaps
corOvlMerThreshold                 K-mer frequency threshold; mers more frequent than this count are ignored
corOvlRefBlockLength               Amount of sequence (bp) to search against the hash table per batch
corovlStageSpace                   Amount of local disk space needed to stage data for overlaps for correction jobs
corovlThreads                      Number of threads to use for overlaps for correction jobs
corPartitionMin                    Don't make a read correction partition with fewer than N reads
corPartitions                      Partition read correction into N jobs
corReAlign                         Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses corOvlErrorRate
correctedErrorRate                 Expected fraction error in an alignment of two corrected reads
corStageSpace                      Amount of local disk space needed to stage data for read correction jobs
corThreads                         Number of threads to use for read correction jobs
enableOEA                          Do overlap error adjustment - comprises two steps: read error detection (RED) and overlap error adjustment (OEA); default 'true'
executiveMemory                    Amount of memory, in GB, to reserve for the Canu exective process
executiveThreads                   Number of threads to reserve for the Canu exective process
genomeSize                         An estimate of the size of the genome
gnuplot                            Path to the gnuplot executable
gnuplotImageFormat                 Image format that gnuplot will generate.  Default: based on gnuplot, 'png', 'svg' or 'gif'
gridEngine                         Grid engine configuration, not documented
gridEngineArrayMaxJobs             Grid engine configuration, not documented
gridEngineArrayName                Grid engine configuration, not documented
gridEngineArrayOption              Grid engine configuration, not documented
gridEngineArraySubmitID            Grid engine configuration, not documented
gridEngineJobID                    Grid engine configuration, not documented
gridEngineMemoryOption             Grid engine configuration, not documented
gridEngineMemoryPerJob             Grid engine configuration, not documented
gridEngineMemoryUnits              Grid engine configuration, not documented
gridEngineNameOption               Grid engine configuration, not documented
gridEngineNameToJobIDCommand       Grid engine configuration, not documented
gridEngineNameToJobIDCommandNoArray Grid engine configuration, not documented
gridEngineOutputOption             Grid engine configuration, not documented
gridEngineResourceOption           Grid engine configuration, not documented
gridEngineStageOption              Grid engine configuration, not documented
gridEngineSubmitCommand            Grid engine configuration, not documented
gridEngineTaskID                   Grid engine configuration, not documented
gridEngineThreadsOption            Grid engine configuration, not documented
gridOptions                        Grid engine options applied to all jobs
gridOptionsbat                     Grid engine options applied to unitig construction jobs
gridOptionscns                     Grid engine options applied to unitig consensus jobs
gridOptionscor                     Grid engine options applied to read correction jobs
gridOptionscormhap                 Grid engine options applied to mhap overlaps for correction jobs
gridOptionscormmap                 Grid engine options applied to mmap overlaps for correction jobs
gridOptionscorovl                  Grid engine options applied to overlaps for correction jobs
gridOptionsExecutive               Grid engine options applied to the canu executive script
gridOptionshap                     Grid engine options applied to haplotype assignment jobs
gridOptionsJobName                 Grid jobs job-name suffix
gridOptionsmeryl                   Grid engine options applied to mer counting jobs
gridOptionsobtmhap                 Grid engine options applied to mhap overlaps for trimming jobs
gridOptionsobtmmap                 Grid engine options applied to mmap overlaps for trimming jobs
gridOptionsobtovl                  Grid engine options applied to overlaps for trimming jobs
gridOptionsoea                     Grid engine options applied to overlap error adjustment jobs
gridOptionsovb                     Grid engine options applied to overlap store bucketizing jobs
gridOptionsovs                     Grid engine options applied to overlap store sorting jobs
gridOptionsred                     Grid engine options applied to read error detection jobs
gridOptionsutgmhap                 Grid engine options applied to mhap overlaps for unitig construction jobs
gridOptionsutgmmap                 Grid engine options applied to mmap overlaps for unitig construction jobs
gridOptionsutgovl                  Grid engine options applied to overlaps for unitig construction jobs
hapConcurrency                     Unused, there is only one process
hapMemory                          Amount of memory, in gigabytes, to use for haplotype assignment
hapStageSpace                      Amount of local disk space needed to stage data for haplotype assignment jobs
hapThreads                         Number of threads to use for haplotype assignment
hapUnknownFraction                 Fraction of allowed unknown bases before they are included in the assembly, between 0-1; default 0.05
homoPolyCompress                   Compute everything but consensus sequences using homopolymer compressed reads
java                               Java interpreter to use; at least version 1.8; default 'java'
javaUse64Bit                       Java interpreter supports the -d64 or -d32 flags; default auto
maxInputCoverage                   If input coverage is high, downsample to something reasonable; default 200
maxMemory                          Maximum memory to use by any component of the assembler
maxThreads                         Maximum number of compute threads to use by any component of the assembler
merylConcurrency                   Unused, there is only one process
merylMemory                        Amount of memory, in gigabytes, to use for mer counting
merylStageSpace                    Amount of local disk space needed to stage data for mer counting jobs
merylThreads                       Number of threads to use for mer counting
minimap                            Path to minimap2; default 'minimap2'
minInputCoverage                   Stop if input coverage is too low; default 10
minMemory                          Minimum amount of memory needed to compute the assembly (do not set unless prompted!)
minOverlapLength                   Overlaps shorter than this length are not computed; default 500
minReadLength                      Reads shorter than this length are not loaded into the assembler; default 1000
minThreads                         Minimum number of compute threads suggested to compute the assembly
objectStore                        Type of object storage used; not ready for production yet
objectStoreClient                  Path to the command line client used to access the object storage
objectStoreClientDA                Path to the command line client used to download files from object storage
objectStoreClientUA                Path to the command line client used to upload files to object storage
objectStoreNameSpace               Object store parameters; specific to the type of objectStore used
objectStoreProject                 Object store project; specific to the type of objectStore used
obtErrorRate                       Stringency of overlaps to use for trimming
obtMhapBlockSize                   Number of reads per GB of memory allowed (mhapMemory)
obtmhapConcurrency                 If grid not enabled, number of mhap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtMhapFilterThreshold             Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
obtMhapFilterUnique                Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
obtmhapMemory                      Amount of memory, in gigabytes, to use for mhap overlaps for trimming jobs
obtMhapMerSize                     K-mer size for seeds in mhap
obtMhapNoTf                        Expert option: True or false, do not use tf weighting, only idf of tf-idf.
obtMhapOptions                     Expert option: free-form parameters to pass to MHAP.
obtMhapOrderedMerSize              K-mer size for second-stage filter in mhap
obtMhapPipe                        Report results to a pipe instead of *large* files.
obtMhapSensitivity                 Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
obtmhapStageSpace                  Amount of local disk space needed to stage data for mhap overlaps for trimming jobs
obtmhapThreads                     Number of threads to use for mhap overlaps for trimming jobs
obtMhapVersion                     Version of the MHAP jar file to use
obtMMapBlockSize                   Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
obtmmapConcurrency                 If grid not enabled, number of mmap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtmmapMemory                      Amount of memory, in gigabytes, to use for mmap overlaps for trimming jobs
obtMMapMerSize                     K-mer size for seeds in minmap
obtmmapStageSpace                  Amount of local disk space needed to stage data for mmap overlaps for trimming jobs
obtmmapThreads                     Number of threads to use for mmap overlaps for trimming jobs
obtOverlapper                      Which overlap algorithm to use for overlap based trimming
obtovlConcurrency                  If grid not enabled, number of overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtOvlErrorRate                    Overlaps at or below this error rate are used to trim reads
obtOvlFilter                       Filter overlaps based on expected kmers vs observed kmers
obtOvlFrequentMers                 Do not seed overlaps with these kmers
obtOvlHashBits                     Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per obtOvlHashBlockLength
obtOvlHashBlockLength              Amount of sequence (bp) to load into the overlap hash table
obtOvlHashLoad                     Maximum hash table load.  If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
obtovlMemory                       Amount of memory, in gigabytes, to use for overlaps for trimming jobs
obtOvlMerDistinct                  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
obtOvlMerSize                      K-mer size for seeds in overlaps
obtOvlMerThreshold                 K-mer frequency threshold; mers more frequent than this count are ignored
obtOvlRefBlockLength               Amount of sequence (bp) to search against the hash table per batch
obtovlStageSpace                   Amount of local disk space needed to stage data for overlaps for trimming jobs
obtovlThreads                      Number of threads to use for overlaps for trimming jobs
obtReAlign                         Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses obtOvlErrorRate
oeaBatchLength                     Number of bases per overlap error correction batch
oeaBatchSize                       Number of reads per overlap error correction batch
oeaConcurrency                     If grid not enabled, number of overlap error adjustment jobs to run at the same time; default is n_proc / n_threads
oeaErrorRate                       Only use overlaps with at most this much fraction error to find errors in reads; default utgOvlErrorRate, 0.003 for HiFi reads
oeaHaploConfirm                    This many or more reads will confirm a true haplotype difference; default 5
oeaMaskTrivial                     Mask trivial DNA in Overlap Error Adjustment; default off; on for HiFi reads
oeaMemory                          Amount of memory, in gigabytes, to use for overlap error adjustment jobs
oeaStageSpace                      Amount of local disk space needed to stage data for overlap error adjustment jobs
oeaThreads                         Number of threads to use for overlap error adjustment jobs
onFailure                          Full path to command to run on failure
onSuccess                          Full path to command to run on successful completion
ovbConcurrency                     If grid not enabled, number of overlap store bucketizing jobs to run at the same time; default is n_proc / n_threads
ovbMemory                          Amount of memory, in gigabytes, to use for overlap store bucketizing jobs
ovbStageSpace                      Amount of local disk space needed to stage data for overlap store bucketizing jobs
ovbThreads                         Number of threads to use for overlap store bucketizing jobs
ovsConcurrency                     If grid not enabled, number of overlap store sorting jobs to run at the same time; default is n_proc / n_threads
ovsMemory                          Amount of memory, in gigabytes, to use for overlap store sorting jobs
ovsStageSpace                      Amount of local disk space needed to stage data for overlap store sorting jobs
ovsThreads                         Number of threads to use for overlap store sorting jobs
preExec                            A command line to run at the start of Canu execution scripts
purgeOverlaps                      When to delete intermediate overlap files: never, normal (default), aggressive, dangerous
rawErrorRate                       Expected fraction error in an alignment of two uncorrected reads
readSamplingBias                   Score reads as 'random * length^bias', keep the highest scoring reads
redBatchLength                     Number of bases per fragment error detection batch
redBatchSize                       Number of reads per fragment error detection batch
redConcurrency                     If grid not enabled, number of read error detection jobs to run at the same time; default is n_proc / n_threads
redMemory                          Amount of memory, in gigabytes, to use for read error detection jobs
redStageSpace                      Amount of local disk space needed to stage data for read error detection jobs
redThreads                         Number of threads to use for read error detection jobs
saveMerCounts                      Save full mer counting results, sometimes useful
saveOverlaps                       Do not remove the overlap stores.  Default: false = remove overlap stores when they're no longer needed
saveReadCorrections                Save intermediate read correction files, almost never a good idea
saveReadHaplotypes                 Save intermediate read haplotype files, almost never a good idea
saveReads                          Save intermediate corrected and trimmed reads to asm.correctedReads.fasta.gz and asm.trimmedReads.fasta.gz
shell                              Command interpreter to use; sh-compatible (e.g., bash), NOT C-shell (csh or tcsh); default '/bin/sh'
showNext                           Don't run any commands, just report what would run
stageDirectory                     If set, copy heavily used data to this node-local location
stopAfter                          Stop after a specific algorithm step is completed
stopOnLowCoverage                  Stop if raw, corrected or trimmed read coverage is low
trimReadsCoverage                  Minimum depth of evidence to retain bases; default '2
trimReadsOverlap                   Minimum overlap between evidence to make contiguous trim; default '500'
unitigger                          Which unitig algorithm to use; only 'bogart' supported; default 'bogart'
useGrid                            If 'true', enable grid-based execution; if 'false', run all jobs on the local machine; if 'remote', create jobs for grid execution but do not submit; default 'true'
useGridbat                         If 'true', run module unitig construction under grid control; if 'false' run locally.
useGridcns                         If 'true', run module unitig consensus under grid control; if 'false' run locally.
useGridcor                         If 'true', run module read correction under grid control; if 'false' run locally.
useGridcormhap                     If 'true', run module mhap overlaps for correction under grid control; if 'false' run locally.
useGridcormmap                     If 'true', run module mmap overlaps for correction under grid control; if 'false' run locally.
useGridcorovl                      If 'true', run module overlaps for correction under grid control; if 'false' run locally.
useGridhap                         If 'true', run module haplotype assignment under grid control; if 'false' run locally.
useGridmeryl                       If 'true', run module mer counting under grid control; if 'false' run locally.
useGridobtmhap                     If 'true', run module mhap overlaps for trimming under grid control; if 'false' run locally.
useGridobtmmap                     If 'true', run module mmap overlaps for trimming under grid control; if 'false' run locally.
useGridobtovl                      If 'true', run module overlaps for trimming under grid control; if 'false' run locally.
useGridoea                         If 'true', run module overlap error adjustment under grid control; if 'false' run locally.
useGridovb                         If 'true', run module overlap store bucketizing under grid control; if 'false' run locally.
useGridovs                         If 'true', run module overlap store sorting under grid control; if 'false' run locally.
useGridred                         If 'true', run module read error detection under grid control; if 'false' run locally.
useGridutgmhap                     If 'true', run module mhap overlaps for unitig construction under grid control; if 'false' run locally.
useGridutgmmap                     If 'true', run module mmap overlaps for unitig construction under grid control; if 'false' run locally.
useGridutgovl                      If 'true', run module overlaps for unitig construction under grid control; if 'false' run locally.
utgBubbleDeviation                 Overlaps this much above mean of contig will be used to identify bubbles
utgChimeraType                     When to filter reads for contig construction: none, chimera (missing middle), uncovered (missing middle or ends), deadend (missing middle or end or no neighbor) (default)
utgErrorRate                       Overlaps at or below this error rate are used to construct contigs
utgGraphDeviation                  Overlaps this much above median will not be used for initial graph construction
utgMhapBlockSize                   Number of reads per GB of memory allowed (mhapMemory)
utgmhapConcurrency                 If grid not enabled, number of mhap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgMhapFilterThreshold             Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
utgMhapFilterUnique                Expert option: True or false, suppress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
utgmhapMemory                      Amount of memory, in gigabytes, to use for mhap overlaps for unitig construction jobs
utgMhapMerSize                     K-mer size for seeds in mhap
utgMhapNoTf                        Expert option: True or false, do not use tf weighting, only idf of tf-idf.
utgMhapOptions                     Expert option: free-form parameters to pass to MHAP.
utgMhapOrderedMerSize              K-mer size for second-stage filter in mhap
utgMhapPipe                        Report results to a pipe instead of *large* files.
utgMhapSensitivity                 Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
utgmhapStageSpace                  Amount of local disk space needed to stage data for mhap overlaps for unitig construction jobs
utgmhapThreads                     Number of threads to use for mhap overlaps for unitig construction jobs
utgMhapVersion                     Version of the MHAP jar file to use
utgMMapBlockSize                   Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
utgmmapConcurrency                 If grid not enabled, number of mmap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgmmapMemory                      Amount of memory, in gigabytes, to use for mmap overlaps for unitig construction jobs
utgMMapMerSize                     K-mer size for seeds in minimap
utgmmapStageSpace                  Amount of local disk space needed to stage data for mmap overlaps for unitig construction jobs
utgmmapThreads                     Number of threads to use for mmap overlaps for unitig construction jobs
utgOverlapper                      Which overlap algorithm to use for unitig construction
utgovlConcurrency                  If grid not enabled, number of overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgOvlErrorRate                    Overlaps at or below this error rate are used to trim reads
utgOvlFilter                       Filter overlaps based on expected kmers vs observed kmers
utgOvlFrequentMers                 Do not seed overlaps with these kmers
utgOvlHashBits                     Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per utgOvlHashBlockLength
utgOvlHashBlockLength              Amount of sequence (bp) to load into the overlap hash table
utgOvlHashLoad                     Maximum hash table load.  If set too high, table lookups are inefficient; if too low, search overhead dominates run time; default 0.75
utgovlMemory                       Amount of memory, in gigabytes, to use for overlaps for unitig construction jobs
utgOvlMerDistinct                  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
utgOvlMerSize                      K-mer size for seeds in overlaps
utgOvlMerThreshold                 K-mer frequency threshold; mers more frequent than this count are ignored
utgOvlRefBlockLength               Amount of sequence (bp) to search against the hash table per batch
utgovlStageSpace                   Amount of local disk space needed to stage data for overlaps for unitig construction jobs
utgovlThreads                      Number of threads to use for overlaps for unitig construction jobs
utgReAlign                         Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses utgOvlErrorRate
utgRepeatConfusedBP                Repeats where the next best edge is at least this many bp shorter will not be split
utgRepeatConfusedPC                Repeats where the next best edge is at least this many percent shorter will not be split
utgRepeatDeviation                 Overlaps this much above mean unitig error rate will not be used for repeat splitting
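
As a minimal sketch of how the useGrid and per-stage resource options listed above combine on the command line, the snippet below assembles a canu call that keeps all stages on the local machine and sets resources for the unitig construction overlap jobs. The prefix, output directory, genome size, read file and all numeric values are hypothetical placeholders, not recommendations. The command is only printed, not executed.

```shell
# Hypothetical placeholders: replace with your own project values.
PREFIX=asm                 # assembly name prefix (canu -p)
OUTDIR=asm_out             # output directory (canu -d)
GENOME_SIZE=5m             # estimated genome size
READS=reads.fastq.gz       # input read file

# Options taken from the list above; the values are illustrative only.
OPTS="useGrid=false"                 # run every job on the local machine
OPTS="$OPTS utgovlThreads=4"         # threads per unitig construction overlap job
OPTS="$OPTS utgovlMemory=16"         # gigabytes per unitig construction overlap job
OPTS="$OPTS utgovlConcurrency=2"     # number of such jobs to run at once when the grid is off

CANU_CMD="canu -p $PREFIX -d $OUTDIR genomeSize=$GENOME_SIZE $OPTS -nanopore $READS"

# Print the command for review instead of running it.
echo "$CANU_CMD"
```

With useGrid=false, canu runs everything inside the current job rather than submitting new ones, so the resources requested in the job's own Slurm header would presumably need to cover the thread and memory values chosen here.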

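The threshold notation in the utgMhapSensitivity entry is terse. Read literally, it maps coverage of 30x or less to 'high', coverage strictly between 30x and 60x to 'normal', and coverage of 60x or more to 'low'. The hypothetical helper below (not part of canu) spells out that reading for integer coverage values.

```shell
# Hypothetical helper, not part of canu: mirrors the rule
# 'high' <= 30x < 'normal' < 60x <= 'low' for an integer coverage value.
mhap_sensitivity() {
    coverage=$1
    if [ "$coverage" -le 30 ]; then
        echo high
    elif [ "$coverage" -lt 60 ]; then
        echo normal
    else
        echo low
    fi
}

# Example calls with three different coverage values.
mhap_sensitivity 25
mhap_sensitivity 45
mhap_sensitivity 80
```

Canu sets this level automatically based on coverage, so the function is only a reading aid for the help text.
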
Installation

Source code obtained from https://github.com/marbl/canu/releases/download/v2.2

System

64-bit Linux