Canu-Sapelo2

From Research Computing Center Wiki

Category

Bioinformatics

Program On

Sapelo2

Version

2.1.1

Author / Distributor

Canu

Description

"Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION)." More details are available on the Canu website.

Running Program

Version 2.1.1

To use this version, please load the module with

ml canu/2.1.1-GCCcore-8.3.0-Java-11

When you invoke canu, please use gridOptions to pass queueing system options for the jobs that the canu pipeline submits. At a minimum, please specify a partition, the number of tasks, and the wall time. For example, use gridOptions="--partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00". Do not include --mem in gridOptions; the pipeline scripts add it automatically.


Here is an example of a job submission script, sub.sh, to run on the batch partition:

#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=canujobname
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=120:00:00
#SBATCH --mem=40G

cd $SLURM_SUBMIT_DIR

ml canu/2.1.1-GCCcore-8.3.0-Java-11

canu gridOptions=" --partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00 " [options]

where [options] must be replaced by the options (command and arguments) you want to use. Other job parameters, such as the maximum wall clock time, the maximum memory, the number of cores per node, and the job name, need to be modified appropriately as well.
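
As an illustration of how [options] might be filled in, the sketch below assembles raw Nanopore reads. The prefix, assembly directory, genome size, read file path, and the DRY_RUN switch are hypothetical placeholders introduced for this example; the canu flags themselves (-p, -d, genomeSize=, gridOptions=, -nanopore) are the ones listed by canu --help. By default the script only prints the command so it can be reviewed first.

```shell
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=canu_example
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=120:00:00
#SBATCH --mem=40G

# Hypothetical inputs: replace every value below with your own.
PREFIX="asm1"                     # -p: prefix for the output file names
OUTDIR="asm1-run"                 # -d: assembly directory (created if needed)
GENOME_SIZE="4.7m"                # best guess of the haploid genome size
READS="reads/sample.fastq.gz"     # raw Nanopore reads (hypothetical path)
GRID_OPTS="--partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00"

# Build the canu command as the positional parameters of this script,
# so that arguments containing spaces (gridOptions) stay intact.
set -- canu -p "$PREFIX" -d "$OUTDIR" \
       "genomeSize=$GENOME_SIZE" \
       "gridOptions=$GRID_OPTS" \
       -nanopore "$READS"

# Show the command; the printed line omits the quotes around gridOptions.
echo "Would run: $*"

# DRY_RUN is a convention of this sketch, not a canu or Slurm feature.
if [ "${DRY_RUN:-1}" = "0" ]; then
    cd "$SLURM_SUBMIT_DIR" || exit 1
    ml canu/2.1.1-GCCcore-8.3.0-Java-11
    "$@"
fi
```

To launch the assembly with this sketch, the variable can be passed at submission time, for example with sbatch --export=ALL,DRY_RUN=0 ./sub.sh.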


To submit the job, use the command:

sbatch ./sub.sh 

Documentation

[shtsai@b1-24 ~]$ ml canu/2.1.1-GCCcore-8.3.0-Java-11
[shtsai@b1-24 ~]$ canu --help

usage:   canu [-version] [-citation] \
              [-haplotype | -correct | -trim | -assemble | -trim-assemble] \
              [-s <assembly-specifications-file>] \
               -p <assembly-prefix> \
               -d <assembly-directory> \
               genomeSize=<number>[g|m|k] \
              [other-options] \
              [-haplotype{NAME} illumina.fastq.gz] \
              [-corrected] \
              [-trimmed] \
              [-pacbio |
               -nanopore |
               -pacbio-hifi] file1 file2 ...

example: canu -d run1 -p godzilla genomeSize=1g -nanopore-raw reads/*.fasta.gz 


  To restrict canu to only a specific stage, use:
    -haplotype     - generate haplotype-specific reads
    -correct       - generate corrected reads
    -trim          - generate trimmed reads
    -assemble      - generate an assembly
    -trim-assemble - generate trimmed reads and then assemble them

  The assembly is computed in the -d <assembly-directory>, with output files named
  using the -p <assembly-prefix>.  This directory is created if needed.  It is not
  possible to run multiple assemblies in the same directory.

  The genome size should be your best guess of the haploid genome size of what is being
  assembled.  It is used primarily to estimate coverage in reads, NOT as the desired
  assembly size.  Fractional values are allowed: '4.7m' equals '4700k' equals '4700000'

  Some common options:
    useGrid=string
      - Run under grid control (true), locally (false), or set up for grid control
        but don't submit any jobs (remote)
    rawErrorRate=fraction-error
      - The allowed difference in an overlap between two raw uncorrected reads.  For lower
        quality reads, use a higher number.  The defaults are 0.300 for PacBio reads and
        0.500 for Nanopore reads.
    correctedErrorRate=fraction-error
      - The allowed difference in an overlap between two corrected reads.  Assemblies of
        low coverage or data with biological differences will benefit from a slight increase
        in this.  Defaults are 0.045 for PacBio reads and 0.144 for Nanopore reads.
    gridOptions=string
      - Pass string to the command used to submit jobs to the grid.  Can be used to set
        maximum run time limits.  Should NOT be used to set memory limits; Canu will do
        that for you.
    minReadLength=number
      - Ignore reads shorter than 'number' bases long.  Default: 1000.
    minOverlapLength=number
      - Ignore read-to-read overlaps shorter than 'number' bases long.  Default: 500.
  A full list of options can be printed with '-options'.  All options can be supplied in
  an optional sepc file with the -s option.

  For TrioCanu, haplotypes are specified with the -haplotype{NAME} option, with any
  number of haplotype-specific Illumina read files after.  The {NAME} of each haplotype
  is free text (but only letters and numbers, please).  For example:
    -haplotypeNANNY nanny/*gz
    -haplotypeBILLY billy1.fasta.gz billy2.fasta.gz

  Reads can be either FASTA or FASTQ format, uncompressed, or compressed with gz, bz2 or xz.

  Reads are specified by the technology they were generated with, and any processing performed.

  [processing]
    -corrected
    -trimmed

  [technology]
    -pacbio      <files>
    -nanopore    <files>
    -pacbio-hifi <files>

Complete documentation at http://canu.readthedocs.org/en/latest/
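
The stage flags in the help text above can be chained so that correction, trimming, and assembly run as separate steps. The following is a minimal sketch under several assumptions: the reads are raw PacBio reads in a hypothetical file, the prefix and directory names are placeholders, and the corrected and trimmed reads are written to the assembly directory as <prefix>.correctedReads.fasta.gz and <prefix>.trimmedReads.fasta.gz (the naming suggested by the saveReads option description). The run helper and the DRY_RUN switch are conventions of this sketch, not part of Canu.

```shell
#!/bin/bash
# Hypothetical inputs: replace with your own values.
PREFIX="asm1"
GENOME_SIZE="4.7m"
RAW_READS="reads/sample.fastq.gz"
GRID_OPTS="--partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00"
DRY_RUN="${DRY_RUN:-1}"

# Assumed locations of the intermediate reads written by the first two stages.
CORRECTED="$PREFIX/$PREFIX.correctedReads.fasta.gz"
TRIMMED="$PREFIX/$PREFIX.trimmedReads.fasta.gz"

# Print a command; execute it only when DRY_RUN=0.
run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Stage 1: generate corrected reads from the raw reads.
run canu -correct -p "$PREFIX" -d "$PREFIX" "genomeSize=$GENOME_SIZE" \
    "gridOptions=$GRID_OPTS" -pacbio "$RAW_READS" || exit 1

# Stage 2: trim the corrected reads (input is flagged with -corrected).
run canu -trim -p "$PREFIX" -d "$PREFIX" "genomeSize=$GENOME_SIZE" \
    "gridOptions=$GRID_OPTS" -corrected -pacbio "$CORRECTED" || exit 1

# Stage 3: assemble the trimmed reads in a separate directory, since
# multiple assemblies cannot share one directory.
run canu -assemble -p "$PREFIX" -d "$PREFIX-assembly" "genomeSize=$GENOME_SIZE" \
    "gridOptions=$GRID_OPTS" -trimmed -corrected -pacbio "$TRIMMED" || exit 1
```

Keeping the assembly stage in its own directory makes it possible to repeat that stage with different parameters (for example, another correctedErrorRate) without redoing correction and trimming.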


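The help text above states that genomeSize accepts g, m, and k suffixes and is used primarily to estimate read coverage. As a rough illustration of that arithmetic (this is not Canu's own code), the sketch below converts a genome size string to bases and divides a total read base count by it. The function names and the total of 235000000 bases are hypothetical; the result can be compared by eye with the minInputCoverage (default 10) and maxInputCoverage (default 200) options listed by canu -options.

```shell
#!/bin/bash
# Convert a genomeSize string such as 4.7m, 4700k, or 1g to a number of bases.
genome_size_to_bases() {
    echo "$1" | awk '{
        s = tolower($0)
        mult = 1
        if (s ~ /g$/)      mult = 1000000000
        else if (s ~ /m$/) mult = 1000000
        else if (s ~ /k$/) mult = 1000
        sub(/[gmk]$/, "", s)          # drop the suffix, keep the number
        printf "%.0f\n", s * mult
    }'
}

# Estimate coverage as (total bases in reads) / (genome size in bases).
estimate_coverage() {
    total_bases="$1"
    genome_bases=$(genome_size_to_bases "$2")
    awk -v t="$total_bases" -v g="$genome_bases" 'BEGIN { printf "%.1f\n", t / g }'
}

# Hypothetical read set totalling 235,000,000 bases for a 4.7 Mbp genome.
genome_size_to_bases 4.7m
estimate_coverage 235000000 4.7m
```

The total number of bases in a read file has to be obtained separately, for example with a sequence statistics tool of your choice.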
[shtsai@b1-24 ~]$ ml canu/2.1.1-GCCcore-8.3.0-Java-11
[shtsai@b1-24 ~]$ canu -options
MMapBlockSize                           Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
MMapMerSize                             K-mer size for seeds in minmap
MhapBlockSize                           Number of reads per GB of memory allowed (mhapMemory)
MhapFilterThreshold                     Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
MhapFilterUnique                        Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
MhapMerSize                             K-mer size for seeds in mhap
MhapNoTf                                Expert option: True or false, do not use tf weighting, only idf of tf-idf.
MhapOptions                             Expert option: free-form parameters to pass to MHAP.
MhapOrderedMerSize                      K-mer size for second-stage filter in mhap
MhapSensitivity                         Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
MhapVersion                             Version of the MHAP jar file to use
Overlapper                              Which overlap algorithm to use for unitig construction
OvlFilter                               Filter overlaps based on expected kmers vs observed kmers
OvlFrequentMers                         Do not seed overlaps with these kmers
OvlHashBits                             Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per utgOvlHashBlockLength
OvlHashBlockLength                      Amount of sequence (bp) to load into the overlap hash table
OvlHashLoad                             Maximum hash table load.  If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
OvlMerDistinct                          K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
OvlMerSize                              K-mer size for seeds in overlaps
OvlMerThreshold                         K-mer frequency threshold; mers more frequent than this count are ignored
OvlRefBlockLength                       Amount of sequence (bp) to search against the hash table per batch
ReAlign                                 Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses utgOvlErrorRate
batConcurrency                          Unused, only one process supported
batMemory                               Approximate maximum memory usage, in gigabytes, default is the maxMemory limit
batOptions                              Advanced options to bogart
batStageSpace                           Amount of local disk space needed to stage data for unitig construction jobs
batThreads                              Number of threads to use; default is the maxThreads limit
cnsConcurrency                          If grid not enabled, number of unitig consensus jobs to run at the same time; default is n_proc / n_threads
cnsConsensus                            Which consensus algorithm to use; 'pbdagcon' (fast, reliable); 'utgcns' (multialignment output); 'quick' (single read mosaic); default 'pbdagcon'
cnsErrorRate                            Consensus expects alignments at about this error rate
cnsMaxCoverage                          Limit unitig consensus to at most this coverage; default '40' = unlimited
cnsMemory                               Amount of memory, in gigabytes, to use for unitig consensus jobs
cnsPartitions                           Attempt to create this many consensus jobs; default '0' = based on the largest tig
cnsStageSpace                           Amount of local disk space needed to stage data for unitig consensus jobs
cnsThreads                              Number of threads to use for unitig consensus jobs
contigFilter                            Parameters to filter out 'unassembled' unitigs.  Five values: minReads minLength singleReadSpan lowCovFraction lowCovDepth
corConcurrency                          If grid not enabled, number of read correction jobs to run at the same time; default is n_proc / n_threads
corConsensus                            Which consensus algorithm to use; only 'falcon' is supported; default 'falcon'
corErrorRate                            Only use raw alignments below this error rate to construct corrected reads
corFilter                               Method to filter short reads from correction; 'quick' or 'expensive'; default 'expensive'
corMMapBlockSize                        Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
corMMapMerSize                          K-mer size for seeds in minmap
corMaxEvidenceCoverageGlobal            Limit reads used for correction to supporting at most this coverage; default: '1.0x' = 1.0 * estimated coverage
corMaxEvidenceCoverageLocal             Limit reads being corrected to at most this much evidence coverage; default: '2.0x' = 2.0 * estimated coverage
corMaxEvidenceErate                     Limit read correction to only overlaps at or below this fraction error; default: unlimited
corMemory                               Amount of memory, in gigabytes, to use for read correction jobs
corMhapBlockSize                        Number of reads per GB of memory allowed (mhapMemory)
corMhapFilterThreshold                  Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
corMhapFilterUnique                     Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
corMhapMerSize                          K-mer size for seeds in mhap
corMhapNoTf                             Expert option: True or false, do not use tf weighting, only idf of tf-idf.
corMhapOptions                          Expert option: free-form parameters to pass to MHAP.
corMhapOrderedMerSize                   K-mer size for second-stage filter in mhap
corMhapSensitivity                      Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
corMhapVersion                          Version of the MHAP jar file to use
corMinCoverage                          Minimum number of bases supporting each corrected base, if less than this sequences are split; default based on input read coverage: 0 <= 30x < 4 < 60x <= 4
corMinEvidenceLength                    Limit read correction to only overlaps longer than this; default: unlimited
corOutCoverage                          Only correct the longest reads up to this coverage; default 40
corOverlapper                           Which overlap algorithm to use for correction
corOvlErrorRate                         Overlaps above this error rate are not computed
corOvlFilter                            Filter overlaps based on expected kmers vs observed kmers
corOvlFrequentMers                      Do not seed overlaps with these kmers
corOvlHashBits                          Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per corOvlHashBlockLength
corOvlHashBlockLength                   Amount of sequence (bp) to load into the overlap hash table
corOvlHashLoad                          Maximum hash table load.  If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
corOvlMerDistinct                       K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
corOvlMerSize                           K-mer size for seeds in overlaps
corOvlMerThreshold                      K-mer frequency threshold; mers more frequent than this count are ignored
corOvlRefBlockLength                    Amount of sequence (bp) to search against the hash table per batch
corPartitionMin                         Don't make a read correction partition with fewer than N reads
corPartitions                           Partition read correction into N jobs
corReAlign                              Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses corOvlErrorRate
corStageSpace                           Amount of local disk space needed to stage data for read correction jobs
corThreads                              Number of threads to use for read correction jobs
cormhapConcurrency                      If grid not enabled, number of mhap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
cormhapMemory                           Amount of memory, in gigabytes, to use for mhap overlaps for correction jobs
cormhapStageSpace                       Amount of local disk space needed to stage data for mhap overlaps for correction jobs
cormhapThreads                          Number of threads to use for mhap overlaps for correction jobs
cormmapConcurrency                      If grid not enabled, number of mmap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
cormmapMemory                           Amount of memory, in gigabytes, to use for mmap overlaps for correction jobs
cormmapStageSpace                       Amount of local disk space needed to stage data for mmap overlaps for correction jobs
cormmapThreads                          Number of threads to use for mmap overlaps for correction jobs
corovlConcurrency                       If grid not enabled, number of overlaps for correction jobs to run at the same time; default is n_proc / n_threads
corovlMemory                            Amount of memory, in gigabytes, to use for overlaps for correction jobs
corovlStageSpace                        Amount of local disk space needed to stage data for overlaps for correction jobs
corovlThreads                           Number of threads to use for overlaps for correction jobs
correctedErrorRate                      Expected fraction error in an alignment of two corrected reads
enableOEA                               Do overlap error adjustment - comprises two steps: read error detection (RED) and overlap error adjustment (OEA); default 'true'
executiveMemory                         Amount of memory, in GB, to reserve for the Canu exective process
executiveThreads                        Number of threads to reserve for the Canu exective process
genomeSize                              An estimate of the size of the genome
gnuplot                                 Path to the gnuplot executable
gnuplotImageFormat                      Image format that gnuplot will generate.  Default: based on gnuplot, 'png', 'svg' or 'gif'
gridEngine                              Grid engine configuration, not documented
gridEngineArrayMaxJobs                  Grid engine configuration, not documented
gridEngineArrayName                     Grid engine configuration, not documented
gridEngineArrayOption                   Grid engine configuration, not documented
gridEngineArraySubmitID                 Grid engine configuration, not documented
gridEngineJobID                         Grid engine configuration, not documented
gridEngineMemoryOption                  Grid engine configuration, not documented
gridEngineMemoryPerJob                  Grid engine configuration, not documented
gridEngineMemoryUnits                   Grid engine configuration, not documented
gridEngineNameOption                    Grid engine configuration, not documented
gridEngineNameToJobIDCommand            Grid engine configuration, not documented
gridEngineNameToJobIDCommandNoArray     Grid engine configuration, not documented
gridEngineOutputOption                  Grid engine configuration, not documented
gridEngineResourceOption                Grid engine configuration, not documented
gridEngineStageOption                   Grid engine configuration, not documented
gridEngineSubmitCommand                 Grid engine configuration, not documented
gridEngineTaskID                        Grid engine configuration, not documented
gridEngineThreadsOption                 Grid engine configuration, not documented
gridOptions                             Grid engine options applied to all jobs
gridOptionsExecutive                    Grid engine options applied to the canu executive script
gridOptionsJobName                      Grid jobs job-name suffix
gridOptionsbat                          Grid engine options applied to unitig construction jobs
gridOptionscns                          Grid engine options applied to unitig consensus jobs
gridOptionscor                          Grid engine options applied to read correction jobs
gridOptionscormhap                      Grid engine options applied to mhap overlaps for correction jobs
gridOptionscormmap                      Grid engine options applied to mmap overlaps for correction jobs
gridOptionscorovl                       Grid engine options applied to overlaps for correction jobs
gridOptionshap                          Grid engine options applied to haplotype assignment jobs
gridOptionsmeryl                        Grid engine options applied to mer counting jobs
gridOptionsobtmhap                      Grid engine options applied to mhap overlaps for trimming jobs
gridOptionsobtmmap                      Grid engine options applied to mmap overlaps for trimming jobs
gridOptionsobtovl                       Grid engine options applied to overlaps for trimming jobs
gridOptionsoea                          Grid engine options applied to overlap error adjustment jobs
gridOptionsovb                          Grid engine options applied to overlap store bucketizing jobs
gridOptionsovs                          Grid engine options applied to overlap store sorting jobs
gridOptionsred                          Grid engine options applied to read error detection jobs
gridOptionsutgmhap                      Grid engine options applied to mhap overlaps for unitig construction jobs
gridOptionsutgmmap                      Grid engine options applied to mmap overlaps for unitig construction jobs
gridOptionsutgovl                       Grid engine options applied to overlaps for unitig construction jobs
hapConcurrency                          Unused, there is only one process
hapMemory                               Amount of memory, in gigabytes, to use for haplotype assignment
hapStageSpace                           Amount of local disk space needed to stage data for haplotype assignment jobs
hapThreads                              Number of threads to use for haplotype assignment
hapUnknownFraction                      Fraction of allowed unknown bases before they are included in the assembly, between 0-1; default 0.05
homoPolyCompress                        Compute everything but consensus sequences using homopolymer compressed reads
java                                    Java interpreter to use; at least version 1.8; default 'java'
javaUse64Bit                            Java interpreter supports the -d64 or -d32 flags; default auto
maxInputCoverage                        If input coverage is high, downsample to something reasonable; default 200
maxMemory                               Maximum memory to use by any component of the assembler
maxThreads                              Maximum number of compute threads to use by any component of the assembler
merylConcurrency                        Unused, there is only one process
merylMemory                             Amount of memory, in gigabytes, to use for mer counting
merylStageSpace                         Amount of local disk space needed to stage data for mer counting jobs
merylThreads                            Number of threads to use for mer counting
minInputCoverage                        Stop if input coverage is too low; default 10
minMemory                               Minimum amount of memory needed to compute the assembly (do not set unless prompted!)
minOverlapLength                        Overlaps shorter than this length are not computed; default 500
minReadLength                           Reads shorter than this length are not loaded into the assembler; default 1000
minThreads                              Minimum number of compute threads suggested to compute the assembly
minimap                                 Path to minimap2; default 'minimap2'
objectStore                             Type of object storage used; not ready for production yet
objectStoreClient                       Path to the command line client used to access the object storage
objectStoreClientDA                     Path to the command line client used to download files from object storage
objectStoreClientUA                     Path to the command line client used to upload files to object storage
objectStoreNameSpace                    Object store parameters; specific to the type of objectStore used
objectStoreProject                      Object store project; specific to the type of objectStore used
obtErrorRate                            Stringency of overlaps to use for trimming
obtMMapBlockSize                        Number of reads per 1GB; memory * blockSize = the size of  block loaded into memory per job
obtMMapMerSize                          K-mer size for seeds in minmap
obtMhapBlockSize                        Number of reads per GB of memory allowed (mhapMemory)
obtMhapFilterThreshold                  Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
obtMhapFilterUnique                     Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
obtMhapMerSize                          K-mer size for seeds in mhap
obtMhapNoTf                             Expert option: True or false, do not use tf weighting, only idf of tf-idf.
obtMhapOptions                          Expert option: free-form parameters to pass to MHAP.
obtMhapOrderedMerSize                   K-mer size for second-stage filter in mhap
obtMhapSensitivity                      Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
obtMhapVersion                          Version of the MHAP jar file to use
obtOverlapper                           Which overlap algorithm to use for overlap based trimming
obtOvlErrorRate                         Overlaps at or below this error rate are used to trim reads
obtOvlFilter                            Filter overlaps based on expected kmers vs observed kmers
obtOvlFrequentMers                      Do not seed overlaps with these kmers
obtOvlHashBits                          Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per obtOvlHashBlockLength
obtOvlHashBlockLength                   Amount of sequence (bp) to load into the overlap hash table
obtOvlHashLoad                          Maximum hash table load.  If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
obtOvlMerDistinct                       K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
obtOvlMerSize                           K-mer size for seeds in overlaps
obtOvlMerThreshold                      K-mer frequency threshold; mers more frequent than this count are ignored
obtOvlRefBlockLength                    Amount of sequence (bp) to search against the hash table per batch
obtReAlign                              Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses obtOvlErrorRate
obtmhapConcurrency                      If grid not enabled, number of mhap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtmhapMemory                           Amount of memory, in gigabytes, to use for mhap overlaps for trimming jobs
obtmhapStageSpace                       Amount of local disk space needed to stage data for mhap overlaps for trimming jobs
obtmhapThreads                          Number of threads to use for mhap overlaps for trimming jobs
obtmmapConcurrency                      If grid not enabled, number of mmap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtmmapMemory                           Amount of memory, in gigabytes, to use for mmap overlaps for trimming jobs
obtmmapStageSpace                       Amount of local disk space needed to stage data for mmap overlaps for trimming jobs
obtmmapThreads                          Number of threads to use for mmap overlaps for trimming jobs
obtovlConcurrency                       If grid not enabled, number of overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtovlMemory                            Amount of memory, in gigabytes, to use for overlaps for trimming jobs
obtovlStageSpace                        Amount of local disk space needed to stage data for overlaps for trimming jobs
obtovlThreads                           Number of threads to use for overlaps for trimming jobs
oeaBatchLength                          Number of bases per overlap error correction batch
oeaBatchSize                            Number of reads per overlap error correction batch
oeaConcurrency                          If grid not enabled, number of overlap error adjustment jobs to run at the same time; default is n_proc / n_threads
oeaMemory                               Amount of memory, in gigabytes, to use for overlap error adjustment jobs
oeaStageSpace                           Amount of local disk space needed to stage data for overlap error adjustment jobs
oeaThreads                              Number of threads to use for overlap error adjustment jobs
onFailure                               Full path to command to run on failure
onSuccess                               Full path to command to run on successful completion
ovbConcurrency                          If grid not enabled, number of overlap store bucketizing jobs to run at the same time; default is n_proc / n_threads
ovbMemory                               Amount of memory, in gigabytes, to use for overlap store bucketizing jobs
ovbStageSpace                           Amount of local disk space needed to stage data for overlap store bucketizing jobs
ovbThreads                              Number of threads to use for overlap store bucketizing jobs
ovsConcurrency                          If grid not enabled, number of overlap store sorting jobs to run at the same time; default is n_proc / n_threads
ovsMemory                               Amount of memory, in gigabytes, to use for overlap store sorting jobs
ovsStageSpace                           Amount of local disk space needed to stage data for overlap store sorting jobs
ovsThreads                              Number of threads to use for overlap store sorting jobs
preExec                                 A command line to run at the start of Canu execution scripts
purgeOverlaps                           When to delete intermediate overlap files: never, normal (default), aggressive, dangerous
rawErrorRate                            Expected fraction error in an alignment of two uncorrected reads
readSamplingBias                        Score reads as 'random * length^bias', keep the highest scoring reads
readSamplingCoverage                    DEPRECATED; use maxInputCoverage.  Discard reads to make the input be of this size
redBatchLength                          Number of bases per fragment error detection batch
redBatchSize                            Number of reads per fragment error detection batch
redConcurrency                          If grid not enabled, number of read error detection jobs to run at the same time; default is n_proc / n_threads
redMemory                               Amount of memory, in gigabytes, to use for read error detection jobs
redStageSpace                           Amount of local disk space needed to stage data for read error detection jobs
redThreads                              Number of threads to use for read error detection jobs
saveMerCounts                           Save full mer counting results, sometimes useful
saveOverlaps                            Do not remove the overlap stores.  Default: false = remove overlap stores when they're no longer needed
saveReadCorrections                     Save intermediate read correction files, almost never a good idea
saveReadHaplotypes                      Save intermediate read haplotype files, almost never a good idea
saveReads                               Save intermediate corrected and trimmed reads to asm.correctedReads.fasta.gz and asm.trimmedReads.fasta.gz
shell                                   Command interpreter to use; sh-compatible (e.g., bash), NOT C-shell (csh or tcsh); default '/bin/sh'
showNext                                Don't run any commands, just report what would run
stageDirectory                          If set, copy heavily used data to this node-local location
stopAfter                               Stop after a specific algorithm step is completed
stopOnLowCoverage                       Stop if raw, corrected or trimmed read coverage is low
trimReadsCoverage                       Minimum depth of evidence to retain bases; default '2
trimReadsOverlap                        Minimum overlap between evidence to make contiguous trim; default '500'
unitigger                               Which unitig algorithm to use; only 'bogart' supported; default 'bogart'
useGrid                                 If 'true', enable grid-based execution; if 'false', run all jobs on the local machine; if 'remote', create jobs for grid execution but do not submit; default 'true'
useGridBAT                              If 'true', run module BAT under grid control; if 'false' run locally.
useGridCNS                              If 'true', run module CNS under grid control; if 'false' run locally.
useGridCOR                              If 'true', run module COR under grid control; if 'false' run locally.
useGridCORMHAP                          If 'true', run module CORMHAP under grid control; if 'false' run locally.
useGridCORMMAP                          If 'true', run module CORMMAP under grid control; if 'false' run locally.
useGridCOROVL                           If 'true', run module COROVL under grid control; if 'false' run locally.
useGridHAP                              If 'true', run module HAP under grid control; if 'false' run locally.
useGridMERYL                            If 'true', run module MERYL under grid control; if 'false' run locally.
useGridOBTMHAP                          If 'true', run module OBTMHAP under grid control; if 'false' run locally.
useGridOBTMMAP                          If 'true', run module OBTMMAP under grid control; if 'false' run locally.
useGridOBTOVL                           If 'true', run module OBTOVL under grid control; if 'false' run locally.
useGridOEA                              If 'true', run module OEA under grid control; if 'false' run locally.
useGridOVB                              If 'true', run module OVB under grid control; if 'false' run locally.
useGridOVS                              If 'true', run module OVS under grid control; if 'false' run locally.
useGridRED                              If 'true', run module RED under grid control; if 'false' run locally.
useGridUTGMHAP                          If 'true', run module UTGMHAP under grid control; if 'false' run locally.
useGridUTGMMAP                          If 'true', run module UTGMMAP under grid control; if 'false' run locally.
useGridUTGOVL                           If 'true', run module UTGOVL under grid control; if 'false' run locally.
utgBubbleDeviation                      Overlaps this much above mean of contig will be used to identify bubbles
utgChimeraType                          When to filter reads for contig construction: none, chimera (missing middle), uncovered (missing middle or ends), deadend (missing middle or end or no neighbor) (default)
utgErrorRate                            Overlaps at or below this error rate are used to construct contigs
utgGraphDeviation                       Overlaps this much above median will not be used for initial graph construction
utgMMapBlockSize                        Number of reads per 1GB; memory * blockSize = the size of the block loaded into memory per job
utgMMapMerSize                          K-mer size for seeds in minimap
utgMhapBlockSize                        Number of reads per GB of memory allowed (mhapMemory)
utgMhapFilterThreshold                  Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
utgMhapFilterUnique                     Expert option: True or false, suppress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
utgMhapMerSize                          K-mer size for seeds in mhap
utgMhapNoTf                             Expert option: True or false, do not use tf weighting, only idf of tf-idf.
utgMhapOptions                          Expert option: free-form parameters to pass to MHAP.
utgMhapOrderedMerSize                   K-mer size for second-stage filter in mhap
utgMhapSensitivity                      Coarse sensitivity level: 'low', 'normal' or 'high'.  Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
utgMhapVersion                          Version of the MHAP jar file to use
utgOverlapper                           Which overlap algorithm to use for unitig construction
utgOvlErrorRate                         Overlaps at or below this error rate are used to trim reads
utgOvlFilter                            Filter overlaps based on expected kmers vs observed kmers
utgOvlFrequentMers                      Do not seed overlaps with these kmers
utgOvlHashBits                          Width of the kmer hash.  Width 22=1gb, 23=2gb, 24=4gb, 25=8gb.  Plus 10b per utgOvlHashBlockLength
utgOvlHashBlockLength                   Amount of sequence (bp) to load into the overlap hash table
utgOvlHashLoad                          Maximum hash table load.  If set too high, table lookups are inefficient; if too low, search overhead dominates run time; default 0.75
utgOvlMerDistinct                       K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
utgOvlMerSize                           K-mer size for seeds in overlaps
utgOvlMerThreshold                      K-mer frequency threshold; mers more frequent than this count are ignored
utgOvlRefBlockLength                    Amount of sequence (bp) to search against the hash table per batch
utgReAlign                              Refine overlaps by computing the actual alignment: 'true' or 'false'.  Not useful for overlapper=ovl.  Uses utgOvlErrorRate
utgRepeatConfusedBP                     Repeats where the next best edge is at least this many bp shorter will not be split
utgRepeatConfusedPC                     Repeats where the next best edge is at least this many percent shorter will not be split
utgRepeatDeviation                      Overlaps this much above mean unitig error rate will not be used for repeat splitting
utgmhapConcurrency                      If grid not enabled, number of mhap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgmhapMemory                           Amount of memory, in gigabytes, to use for mhap overlaps for unitig construction jobs
utgmhapStageSpace                       Amount of local disk space needed to stage data for mhap overlaps for unitig construction jobs
utgmhapThreads                          Number of threads to use for mhap overlaps for unitig construction jobs
utgmmapConcurrency                      If grid not enabled, number of mmap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgmmapMemory                           Amount of memory, in gigabytes, to use for mmap overlaps for unitig construction jobs
utgmmapStageSpace                       Amount of local disk space needed to stage data for mmap overlaps for unitig construction jobs
utgmmapThreads                          Number of threads to use for mmap overlaps for unitig construction jobs
utgovlConcurrency                       If grid not enabled, number of overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgovlMemory                            Amount of memory, in gigabytes, to use for overlaps for unitig construction jobs
utgovlStageSpace                        Amount of local disk space needed to stage data for overlaps for unitig construction jobs
utgovlThreads                           Number of threads to use for overlaps for unitig construction jobs
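
The options above are passed to canu as name=value pairs on the command line. As a minimal sketch of how a few of them combine, the script below assembles (but does not run) a canu command that sets useGrid=false, so that all stages run inside the current job allocation instead of being submitted as separate grid jobs. The helper name build_canu_cmd, the prefix asm, the directory canu_out, the genome size 5m, the read file reads.fastq.gz, and the thread and memory values are all hypothetical placeholders chosen for illustration, not recommended settings.

```shell
#!/bin/bash
# Sketch only: print a canu command line for inspection; nothing is executed.
# All values passed in below are hypothetical placeholders.

build_canu_cmd() {
    # Arguments: prefix, output directory, genome size, read file,
    # then optional thread count and memory (GB) for the unitig overlap stage.
    local prefix="$1" outdir="$2" genome_size="$3" reads="$4"
    local threads="${5:-4}" mem_gb="${6:-16}"

    # useGrid=false     run all jobs on the local machine
    # saveReads=true    keep corrected and trimmed reads
    # utgovlThreads / utgovlMemory   resources for unitig overlap jobs
    printf '%s ' canu -p "$prefix" -d "$outdir" "genomeSize=$genome_size" \
        useGrid=false saveReads=true \
        "utgovlThreads=$threads" "utgovlMemory=$mem_gb" \
        -nanopore
    printf '%s\n' "$reads"
}

# Show the command that would be used for a hypothetical Nanopore data set.
build_canu_cmd asm canu_out 5m reads.fastq.gz
```

If a command like this is used inside a job script, the thread and memory values would presumably need to fit within the --cpus-per-task and --mem requested in the #SBATCH header, since with useGrid=false no separate jobs are submitted.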



Installation

Source code was obtained from https://github.com/marbl/canu/releases/download/v2.1.1/

System

64-bit Linux