Canu-Sapelo2
Category
Bioinformatics
Program On
Sapelo2
Version
2.1.1
Author / Distributor
Description
"Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION)." More details are available on the Canu project page.
Running Program
Version 2.1.1
To use this version, please load the module with:
ml canu/2.1.1-GCCcore-8.3.0-Java-11
When you invoke canu, please use the gridOptions option to pass queueing-system options to the jobs that the canu pipeline submits. At a minimum, please specify a partition, the number of tasks and the walltime. For example, use gridOptions="--partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00" (a single key=value argument, with no spaces around the = sign and the value in quotes). Do not add --mem; the pipeline scripts add it automatically.
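Because canu reads each option as a single key=value argument, the whole Slurm string has to reach canu as one word. The minimal bash sketch below keeps the options in a variable and builds the argument list in an array, so the quoting can be checked before anything is submitted. The prefix (asm), directory (asm-run), genome size and read file (reads.fastq.gz) are hypothetical placeholders, not values from this page.

```shell
#!/bin/bash
# Sketch: keep the Slurm options for the jobs Canu submits in one variable,
# then pass them to canu as a single gridOptions=... argument.
GRID_OPTS="--partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00"

# Placeholder values (hypothetical): prefix, directory, genome size, read file.
canu_args=(
  -p asm
  -d asm-run
  genomeSize=4.7m
  "gridOptions=${GRID_OPTS}"
  -nanopore reads.fastq.gz
)

# Dry run: print one argument per line to confirm gridOptions stays intact.
printf '%s\n' "${canu_args[@]}"

# In a job script you would then call:
#   canu "${canu_args[@]}"
```

Printing the array is only a dry run; the gridOptions line should appear as one line containing all four Slurm options. Replace the placeholders with your own values before calling canu.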
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=canujobname
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=120:00:00
#SBATCH --mem=40G

cd "$SLURM_SUBMIT_DIR"

ml canu/2.1.1-GCCcore-8.3.0-Java-11

canu gridOptions="--partition=batch --ntasks=1 --cpus-per-task=4 --time=168:00:00" [options]
To submit the job, use the command:
sbatch ./sub.sh
Documentation
[shtsai@b1-24 ~]$ ml canu/2.1.1-GCCcore-8.3.0-Java-11
[shtsai@b1-24 ~]$ canu --help

usage: canu [-version] [-citation] \
            [-haplotype | -correct | -trim | -assemble | -trim-assemble] \
            [-s <assembly-specifications-file>] \
             -p <assembly-prefix> \
             -d <assembly-directory> \
             genomeSize=<number>[g|m|k] \
            [other-options] \
            [-haplotype{NAME} illumina.fastq.gz] \
            [-corrected] \
            [-trimmed] \
            [-pacbio | -nanopore | -pacbio-hifi] file1 file2 ...

example: canu -d run1 -p godzilla genomeSize=1g -nanopore-raw reads/*.fasta.gz

  To restrict canu to only a specific stage, use:
    -haplotype     - generate haplotype-specific reads
    -correct       - generate corrected reads
    -trim          - generate trimmed reads
    -assemble      - generate an assembly
    -trim-assemble - generate trimmed reads and then assemble them

  The assembly is computed in the -d <assembly-directory>, with output files named using the -p <assembly-prefix>. This directory is created if needed. It is not possible to run multiple assemblies in the same directory.

  The genome size should be your best guess of the haploid genome size of what is being assembled. It is used primarily to estimate coverage in reads, NOT as the desired assembly size. Fractional values are allowed: '4.7m' equals '4700k' equals '4700000'

  Some common options:
    useGrid=string
      - Run under grid control (true), locally (false), or set up for grid control but don't submit any jobs (remote)
    rawErrorRate=fraction-error
      - The allowed difference in an overlap between two raw uncorrected reads. For lower quality reads, use a higher number. The defaults are 0.300 for PacBio reads and 0.500 for Nanopore reads.
    correctedErrorRate=fraction-error
      - The allowed difference in an overlap between two corrected reads. Assemblies of low coverage or data with biological differences will benefit from a slight increase in this. Defaults are 0.045 for PacBio reads and 0.144 for Nanopore reads.
    gridOptions=string
      - Pass string to the command used to submit jobs to the grid. Can be used to set maximum run time limits. Should NOT be used to set memory limits; Canu will do that for you.
    minReadLength=number
      - Ignore reads shorter than 'number' bases long. Default: 1000.
    minOverlapLength=number
      - Ignore read-to-read overlaps shorter than 'number' bases long. Default: 500.
  A full list of options can be printed with '-options'. All options can be supplied in an optional sepc file with the -s option.

  For TrioCanu, haplotypes are specified with the -haplotype{NAME} option, with any number of haplotype-specific Illumina read files after. The {NAME} of each haplotype is free text (but only letters and numbers, please). For example:
    -haplotypeNANNY nanny/*gz
    -haplotypeBILLY billy1.fasta.gz billy2.fasta.gz

  Reads can be either FASTA or FASTQ format, uncompressed, or compressed with gz, bz2 or xz.

  Reads are specified by the technology they were generated with, and any processing performed.

  [processing]
    -corrected
    -trimmed

  [technology]
    -pacbio      <files>
    -nanopore    <files>
    -pacbio-hifi <files>

Complete documentation at http://canu.readthedocs.org/en/latest/
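The help text says genomeSize accepts k/m/g suffixes and fractional values, and that it is used mainly to estimate read coverage. The bash sketch below illustrates both points under stated assumptions: genome_size_bp and estimate_coverage are hypothetical helpers, not part of Canu; coverage is approximated as total read bases divided by genome size; and only uncompressed FASTA input is handled (Canu itself also reads FASTQ and gz/bz2/xz files).

```shell
#!/bin/bash
# Hypothetical helper (not part of Canu): convert a genomeSize value such as
# 4.7m, 4700k, 1g or 4700000 into a plain number of bases.
genome_size_bp() {
  awk -v s="$1" 'BEGIN {
    s = tolower(s)
    mult = 1
    suffix = substr(s, length(s), 1)
    if (suffix == "k") mult = 1e3
    else if (suffix == "m") mult = 1e6
    else if (suffix == "g") mult = 1e9
    # Strip the suffix before doing arithmetic on the number itself.
    if (mult != 1) s = substr(s, 1, length(s) - 1)
    printf "%.0f\n", s * mult
  }'
}

# Hypothetical helper: approximate fold coverage as total read bases divided
# by the genome size. Only uncompressed FASTA is handled in this sketch.
estimate_coverage() {
  local fasta=$1 gsize
  gsize=$(genome_size_bp "$2")
  awk -v g="$gsize" '
    !/^>/ { n += length($0) }     # count bases on sequence lines only
    END   { printf "%.1f\n", n / g }
  ' "$fasta"
}
```

For example, genome_size_bp 4.7m and genome_size_bp 4700k both print 4700000, and estimate_coverage reads.fasta 4.7m (reads.fasta is a placeholder) prints the approximate fold coverage, which can be compared with the minInputCoverage (default 10) and maxInputCoverage (default 200) thresholds listed by canu -options.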
[shtsai@b1-24 ~]$ ml canu/2.1.1-GCCcore-8.3.0-Java-11
[shtsai@b1-24 ~]$ canu -options

MMapBlockSize  Number of reads per 1GB; memory * blockSize = the size of block loaded into memory per job
MMapMerSize  K-mer size for seeds in minmap
MhapBlockSize  Number of reads per GB of memory allowed (mhapMemory)
MhapFilterThreshold  Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
MhapFilterUnique  Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
MhapMerSize  K-mer size for seeds in mhap
MhapNoTf  Expert option: True or false, do not use tf weighting, only idf of tf-idf.
MhapOptions  Expert option: free-form parameters to pass to MHAP.
MhapOrderedMerSize  K-mer size for second-stage filter in mhap
MhapSensitivity  Coarse sensitivity level: 'low', 'normal' or 'high'. Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
MhapVersion  Version of the MHAP jar file to use
Overlapper  Which overlap algorithm to use for unitig construction
OvlFilter  Filter overlaps based on expected kmers vs observed kmers
OvlFrequentMers  Do not seed overlaps with these kmers
OvlHashBits  Width of the kmer hash. Width 22=1gb, 23=2gb, 24=4gb, 25=8gb. Plus 10b per utgOvlHashBlockLength
OvlHashBlockLength  Amount of sequence (bp) to load into the overlap hash table
OvlHashLoad  Maximum hash table load. If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
OvlMerDistinct  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
OvlMerSize  K-mer size for seeds in overlaps
OvlMerThreshold  K-mer frequency threshold; mers more frequent than this count are ignored
OvlRefBlockLength  Amount of sequence (bp) to search against the hash table per batch
ReAlign  Refine overlaps by computing the actual alignment: 'true' or 'false'. Not useful for overlapper=ovl. Uses utgOvlErrorRate
batConcurrency  Unused, only one process supported
batMemory  Approximate maximum memory usage, in gigabytes, default is the maxMemory limit
batOptions  Advanced options to bogart
batStageSpace  Amount of local disk space needed to stage data for unitig construction jobs
batThreads  Number of threads to use; default is the maxThreads limit
cnsConcurrency  If grid not enabled, number of unitig consensus jobs to run at the same time; default is n_proc / n_threads
cnsConsensus  Which consensus algorithm to use; 'pbdagcon' (fast, reliable); 'utgcns' (multialignment output); 'quick' (single read mosaic); default 'pbdagcon'
cnsErrorRate  Consensus expects alignments at about this error rate
cnsMaxCoverage  Limit unitig consensus to at most this coverage; default '40' = unlimited
cnsMemory  Amount of memory, in gigabytes, to use for unitig consensus jobs
cnsPartitions  Attempt to create this many consensus jobs; default '0' = based on the largest tig
cnsStageSpace  Amount of local disk space needed to stage data for unitig consensus jobs
cnsThreads  Number of threads to use for unitig consensus jobs
contigFilter  Parameters to filter out 'unassembled' unitigs. Five values: minReads minLength singleReadSpan lowCovFraction lowCovDepth
corConcurrency  If grid not enabled, number of read correction jobs to run at the same time; default is n_proc / n_threads
corConsensus  Which consensus algorithm to use; only 'falcon' is supported; default 'falcon'
corErrorRate  Only use raw alignments below this error rate to construct corrected reads
corFilter  Method to filter short reads from correction; 'quick' or 'expensive'; default 'expensive'
corMMapBlockSize  Number of reads per 1GB; memory * blockSize = the size of block loaded into memory per job
corMMapMerSize  K-mer size for seeds in minmap
corMaxEvidenceCoverageGlobal  Limit reads used for correction to supporting at most this coverage; default: '1.0x' = 1.0 * estimated coverage
corMaxEvidenceCoverageLocal  Limit reads being corrected to at most this much evidence coverage; default: '2.0x' = 2.0 * estimated coverage
corMaxEvidenceErate  Limit read correction to only overlaps at or below this fraction error; default: unlimited
corMemory  Amount of memory, in gigabytes, to use for read correction jobs
corMhapBlockSize  Number of reads per GB of memory allowed (mhapMemory)
corMhapFilterThreshold  Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
corMhapFilterUnique  Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
corMhapMerSize  K-mer size for seeds in mhap
corMhapNoTf  Expert option: True or false, do not use tf weighting, only idf of tf-idf.
corMhapOptions  Expert option: free-form parameters to pass to MHAP.
corMhapOrderedMerSize  K-mer size for second-stage filter in mhap
corMhapSensitivity  Coarse sensitivity level: 'low', 'normal' or 'high'. Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
corMhapVersion  Version of the MHAP jar file to use
corMinCoverage  Minimum number of bases supporting each corrected base, if less than this sequences are split; default based on input read coverage: 0 <= 30x < 4 < 60x <= 4
corMinEvidenceLength  Limit read correction to only overlaps longer than this; default: unlimited
corOutCoverage  Only correct the longest reads up to this coverage; default 40
corOverlapper  Which overlap algorithm to use for correction
corOvlErrorRate  Overlaps above this error rate are not computed
corOvlFilter  Filter overlaps based on expected kmers vs observed kmers
corOvlFrequentMers  Do not seed overlaps with these kmers
corOvlHashBits  Width of the kmer hash. Width 22=1gb, 23=2gb, 24=4gb, 25=8gb. Plus 10b per corOvlHashBlockLength
corOvlHashBlockLength  Amount of sequence (bp) to load into the overlap hash table
corOvlHashLoad  Maximum hash table load. If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
corOvlMerDistinct  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
corOvlMerSize  K-mer size for seeds in overlaps
corOvlMerThreshold  K-mer frequency threshold; mers more frequent than this count are ignored
corOvlRefBlockLength  Amount of sequence (bp) to search against the hash table per batch
corPartitionMin  Don't make a read correction partition with fewer than N reads
corPartitions  Partition read correction into N jobs
corReAlign  Refine overlaps by computing the actual alignment: 'true' or 'false'. Not useful for overlapper=ovl. Uses corOvlErrorRate
corStageSpace  Amount of local disk space needed to stage data for read correction jobs
corThreads  Number of threads to use for read correction jobs
cormhapConcurrency  If grid not enabled, number of mhap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
cormhapMemory  Amount of memory, in gigabytes, to use for mhap overlaps for correction jobs
cormhapStageSpace  Amount of local disk space needed to stage data for mhap overlaps for correction jobs
cormhapThreads  Number of threads to use for mhap overlaps for correction jobs
cormmapConcurrency  If grid not enabled, number of mmap overlaps for correction jobs to run at the same time; default is n_proc / n_threads
cormmapMemory  Amount of memory, in gigabytes, to use for mmap overlaps for correction jobs
cormmapStageSpace  Amount of local disk space needed to stage data for mmap overlaps for correction jobs
cormmapThreads  Number of threads to use for mmap overlaps for correction jobs
corovlConcurrency  If grid not enabled, number of overlaps for correction jobs to run at the same time; default is n_proc / n_threads
corovlMemory  Amount of memory, in gigabytes, to use for overlaps for correction jobs
corovlStageSpace  Amount of local disk space needed to stage data for overlaps for correction jobs
corovlThreads  Number of threads to use for overlaps for correction jobs
correctedErrorRate  Expected fraction error in an alignment of two corrected reads
enableOEA  Do overlap error adjustment - comprises two steps: read error detection (RED) and overlap error adjustment (OEA); default 'true'
executiveMemory  Amount of memory, in GB, to reserve for the Canu exective process
executiveThreads  Number of threads to reserve for the Canu exective process
genomeSize  An estimate of the size of the genome
gnuplot  Path to the gnuplot executable
gnuplotImageFormat  Image format that gnuplot will generate. Default: based on gnuplot, 'png', 'svg' or 'gif'
gridEngine  Grid engine configuration, not documented
gridEngineArrayMaxJobs  Grid engine configuration, not documented
gridEngineArrayName  Grid engine configuration, not documented
gridEngineArrayOption  Grid engine configuration, not documented
gridEngineArraySubmitID  Grid engine configuration, not documented
gridEngineJobID  Grid engine configuration, not documented
gridEngineMemoryOption  Grid engine configuration, not documented
gridEngineMemoryPerJob  Grid engine configuration, not documented
gridEngineMemoryUnits  Grid engine configuration, not documented
gridEngineNameOption  Grid engine configuration, not documented
gridEngineNameToJobIDCommand  Grid engine configuration, not documented
gridEngineNameToJobIDCommandNoArray  Grid engine configuration, not documented
gridEngineOutputOption  Grid engine configuration, not documented
gridEngineResourceOption  Grid engine configuration, not documented
gridEngineStageOption  Grid engine configuration, not documented
gridEngineSubmitCommand  Grid engine configuration, not documented
gridEngineTaskID  Grid engine configuration, not documented
gridEngineThreadsOption  Grid engine configuration, not documented
gridOptions  Grid engine options applied to all jobs
gridOptionsExecutive  Grid engine options applied to the canu executive script
gridOptionsJobName  Grid jobs job-name suffix
gridOptionsbat  Grid engine options applied to unitig construction jobs
gridOptionscns  Grid engine options applied to unitig consensus jobs
gridOptionscor  Grid engine options applied to read correction jobs
gridOptionscormhap  Grid engine options applied to mhap overlaps for correction jobs
gridOptionscormmap  Grid engine options applied to mmap overlaps for correction jobs
gridOptionscorovl  Grid engine options applied to overlaps for correction jobs
gridOptionshap  Grid engine options applied to haplotype assignment jobs
gridOptionsmeryl  Grid engine options applied to mer counting jobs
gridOptionsobtmhap  Grid engine options applied to mhap overlaps for trimming jobs
gridOptionsobtmmap  Grid engine options applied to mmap overlaps for trimming jobs
gridOptionsobtovl  Grid engine options applied to overlaps for trimming jobs
gridOptionsoea  Grid engine options applied to overlap error adjustment jobs
gridOptionsovb  Grid engine options applied to overlap store bucketizing jobs
gridOptionsovs  Grid engine options applied to overlap store sorting jobs
gridOptionsred  Grid engine options applied to read error detection jobs
gridOptionsutgmhap  Grid engine options applied to mhap overlaps for unitig construction jobs
gridOptionsutgmmap  Grid engine options applied to mmap overlaps for unitig construction jobs
gridOptionsutgovl  Grid engine options applied to overlaps for unitig construction jobs
hapConcurrency  Unused, there is only one process
hapMemory  Amount of memory, in gigabytes, to use for haplotype assignment
hapStageSpace  Amount of local disk space needed to stage data for haplotype assignment jobs
hapThreads  Number of threads to use for haplotype assignment
hapUnknownFraction  Fraction of allowed unknown bases before they are included in the assembly, between 0-1; default 0.05
homoPolyCompress  Compute everything but consensus sequences using homopolymer compressed reads
java  Java interpreter to use; at least version 1.8; default 'java'
javaUse64Bit  Java interpreter supports the -d64 or -d32 flags; default auto
maxInputCoverage  If input coverage is high, downsample to something reasonable; default 200
maxMemory  Maximum memory to use by any component of the assembler
maxThreads  Maximum number of compute threads to use by any component of the assembler
merylConcurrency  Unused, there is only one process
merylMemory  Amount of memory, in gigabytes, to use for mer counting
merylStageSpace  Amount of local disk space needed to stage data for mer counting jobs
merylThreads  Number of threads to use for mer counting
minInputCoverage  Stop if input coverage is too low; default 10
minMemory  Minimum amount of memory needed to compute the assembly (do not set unless prompted!)
minOverlapLength  Overlaps shorter than this length are not computed; default 500
minReadLength  Reads shorter than this length are not loaded into the assembler; default 1000
minThreads  Minimum number of compute threads suggested to compute the assembly
minimap  Path to minimap2; default 'minimap2'
objectStore  Type of object storage used; not ready for production yet
objectStoreClient  Path to the command line client used to access the object storage
objectStoreClientDA  Path to the command line client used to download files from object storage
objectStoreClientUA  Path to the command line client used to upload files to object storage
objectStoreNameSpace  Object store parameters; specific to the type of objectStore used
objectStoreProject  Object store project; specific to the type of objectStore used
obtErrorRate  Stringency of overlaps to use for trimming
obtMMapBlockSize  Number of reads per 1GB; memory * blockSize = the size of block loaded into memory per job
obtMMapMerSize  K-mer size for seeds in minmap
obtMhapBlockSize  Number of reads per GB of memory allowed (mhapMemory)
obtMhapFilterThreshold  Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
obtMhapFilterUnique  Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
obtMhapMerSize  K-mer size for seeds in mhap
obtMhapNoTf  Expert option: True or false, do not use tf weighting, only idf of tf-idf.
obtMhapOptions  Expert option: free-form parameters to pass to MHAP.
obtMhapOrderedMerSize  K-mer size for second-stage filter in mhap
obtMhapSensitivity  Coarse sensitivity level: 'low', 'normal' or 'high'. Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
obtMhapVersion  Version of the MHAP jar file to use
obtOverlapper  Which overlap algorithm to use for overlap based trimming
obtOvlErrorRate  Overlaps at or below this error rate are used to trim reads
obtOvlFilter  Filter overlaps based on expected kmers vs observed kmers
obtOvlFrequentMers  Do not seed overlaps with these kmers
obtOvlHashBits  Width of the kmer hash. Width 22=1gb, 23=2gb, 24=4gb, 25=8gb. Plus 10b per obtOvlHashBlockLength
obtOvlHashBlockLength  Amount of sequence (bp) to load into the overlap hash table
obtOvlHashLoad  Maximum hash table load. If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
obtOvlMerDistinct  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
obtOvlMerSize  K-mer size for seeds in overlaps
obtOvlMerThreshold  K-mer frequency threshold; mers more frequent than this count are ignored
obtOvlRefBlockLength  Amount of sequence (bp) to search against the hash table per batch
obtReAlign  Refine overlaps by computing the actual alignment: 'true' or 'false'. Not useful for overlapper=ovl. Uses obtOvlErrorRate
obtmhapConcurrency  If grid not enabled, number of mhap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtmhapMemory  Amount of memory, in gigabytes, to use for mhap overlaps for trimming jobs
obtmhapStageSpace  Amount of local disk space needed to stage data for mhap overlaps for trimming jobs
obtmhapThreads  Number of threads to use for mhap overlaps for trimming jobs
obtmmapConcurrency  If grid not enabled, number of mmap overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtmmapMemory  Amount of memory, in gigabytes, to use for mmap overlaps for trimming jobs
obtmmapStageSpace  Amount of local disk space needed to stage data for mmap overlaps for trimming jobs
obtmmapThreads  Number of threads to use for mmap overlaps for trimming jobs
obtovlConcurrency  If grid not enabled, number of overlaps for trimming jobs to run at the same time; default is n_proc / n_threads
obtovlMemory  Amount of memory, in gigabytes, to use for overlaps for trimming jobs
obtovlStageSpace  Amount of local disk space needed to stage data for overlaps for trimming jobs
obtovlThreads  Number of threads to use for overlaps for trimming jobs
oeaBatchLength  Number of bases per overlap error correction batch
oeaBatchSize  Number of reads per overlap error correction batch
oeaConcurrency  If grid not enabled, number of overlap error adjustment jobs to run at the same time; default is n_proc / n_threads
oeaMemory  Amount of memory, in gigabytes, to use for overlap error adjustment jobs
oeaStageSpace  Amount of local disk space needed to stage data for overlap error adjustment jobs
oeaThreads  Number of threads to use for overlap error adjustment jobs
onFailure  Full path to command to run on failure
onSuccess  Full path to command to run on successful completion
ovbConcurrency  If grid not enabled, number of overlap store bucketizing jobs to run at the same time; default is n_proc / n_threads
ovbMemory  Amount of memory, in gigabytes, to use for overlap store bucketizing jobs
ovbStageSpace  Amount of local disk space needed to stage data for overlap store bucketizing jobs
ovbThreads  Number of threads to use for overlap store bucketizing jobs
ovsConcurrency  If grid not enabled, number of overlap store sorting jobs to run at the same time; default is n_proc / n_threads
ovsMemory  Amount of memory, in gigabytes, to use for overlap store sorting jobs
ovsStageSpace  Amount of local disk space needed to stage data for overlap store sorting jobs
ovsThreads  Number of threads to use for overlap store sorting jobs
preExec  A command line to run at the start of Canu execution scripts
purgeOverlaps  When to delete intermediate overlap files: never, normal (default), aggressive, dangerous
rawErrorRate  Expected fraction error in an alignment of two uncorrected reads
readSamplingBias  Score reads as 'random * length^bias', keep the highest scoring reads
readSamplingCoverage  DEPRECATED; use maxInputCoverage. Discard reads to make the input be of this size
redBatchLength  Number of bases per fragment error detection batch
redBatchSize  Number of reads per fragment error detection batch
redConcurrency  If grid not enabled, number of read error detection jobs to run at the same time; default is n_proc / n_threads
redMemory  Amount of memory, in gigabytes, to use for read error detection jobs
redStageSpace  Amount of local disk space needed to stage data for read error detection jobs
redThreads  Number of threads to use for read error detection jobs
saveMerCounts  Save full mer counting results, sometimes useful
saveOverlaps  Do not remove the overlap stores. Default: false = remove overlap stores when they're no longer needed
saveReadCorrections  Save intermediate read correction files, almost never a good idea
saveReadHaplotypes  Save intermediate read haplotype files, almost never a good idea
saveReads  Save intermediate corrected and trimmed reads to asm.correctedReads.fasta.gz and asm.trimmedReads.fasta.gz
shell  Command interpreter to use; sh-compatible (e.g., bash), NOT C-shell (csh or tcsh); default '/bin/sh'
showNext  Don't run any commands, just report what would run
stageDirectory  If set, copy heavily used data to this node-local location
stopAfter  Stop after a specific algorithm step is completed
stopOnLowCoverage  Stop if raw, corrected or trimmed read coverage is low
trimReadsCoverage  Minimum depth of evidence to retain bases; default '2
trimReadsOverlap  Minimum overlap between evidence to make contiguous trim; default '500'
unitigger  Which unitig algorithm to use; only 'bogart' supported; default 'bogart'
useGrid  If 'true', enable grid-based execution; if 'false', run all jobs on the local machine; if 'remote', create jobs for grid execution but do not submit; default 'true'
useGridBAT  If 'true', run module BAT under grid control; if 'false' run locally.
useGridCNS  If 'true', run module CNS under grid control; if 'false' run locally.
useGridCOR  If 'true', run module COR under grid control; if 'false' run locally.
useGridCORMHAP  If 'true', run module CORMHAP under grid control; if 'false' run locally.
useGridCORMMAP  If 'true', run module CORMMAP under grid control; if 'false' run locally.
useGridCOROVL  If 'true', run module COROVL under grid control; if 'false' run locally.
useGridHAP  If 'true', run module HAP under grid control; if 'false' run locally.
useGridMERYL  If 'true', run module MERYL under grid control; if 'false' run locally.
useGridOBTMHAP  If 'true', run module OBTMHAP under grid control; if 'false' run locally.
useGridOBTMMAP  If 'true', run module OBTMMAP under grid control; if 'false' run locally.
useGridOBTOVL  If 'true', run module OBTOVL under grid control; if 'false' run locally.
useGridOEA  If 'true', run module OEA under grid control; if 'false' run locally.
useGridOVB  If 'true', run module OVB under grid control; if 'false' run locally.
useGridOVS  If 'true', run module OVS under grid control; if 'false' run locally.
useGridRED  If 'true', run module RED under grid control; if 'false' run locally.
useGridUTGMHAP  If 'true', run module UTGMHAP under grid control; if 'false' run locally.
useGridUTGMMAP  If 'true', run module UTGMMAP under grid control; if 'false' run locally.
useGridUTGOVL  If 'true', run module UTGOVL under grid control; if 'false' run locally.
utgBubbleDeviation  Overlaps this much above mean of contig will be used to identify bubbles
utgChimeraType  When to filter reads for contig construction: none, chimera (missing middle), uncovered (missing middle or ends), deadend (missing middle or end or no neighbor) (default)
utgErrorRate  Overlaps at or below this error rate are used to construct contigs
utgGraphDeviation  Overlaps this much above median will not be used for initial graph construction
utgMMapBlockSize  Number of reads per 1GB; memory * blockSize = the size of block loaded into memory per job
utgMMapMerSize  K-mer size for seeds in minmap
utgMhapBlockSize  Number of reads per GB of memory allowed (mhapMemory)
utgMhapFilterThreshold  Value between 0 and 1. kmers which comprise more than this percentage of the input are downweighted
utgMhapFilterUnique  Expert option: True or false, supress the low-frequency k-mer distribution based on them being likely noise and not true overlaps. Threshold auto-computed based on error rate and coverage.
utgMhapMerSize  K-mer size for seeds in mhap
utgMhapNoTf  Expert option: True or false, do not use tf weighting, only idf of tf-idf.
utgMhapOptions  Expert option: free-form parameters to pass to MHAP.
utgMhapOrderedMerSize  K-mer size for second-stage filter in mhap
utgMhapSensitivity  Coarse sensitivity level: 'low', 'normal' or 'high'. Set automatically based on coverage; 'high' <= 30x < 'normal' < 60x <= 'low'
utgMhapVersion  Version of the MHAP jar file to use
utgOverlapper  Which overlap algorithm to use for unitig construction
utgOvlErrorRate  Overlaps at or below this error rate are used to trim reads
utgOvlFilter  Filter overlaps based on expected kmers vs observed kmers
utgOvlFrequentMers  Do not seed overlaps with these kmers
utgOvlHashBits  Width of the kmer hash. Width 22=1gb, 23=2gb, 24=4gb, 25=8gb. Plus 10b per utgOvlHashBlockLength
utgOvlHashBlockLength  Amount of sequence (bp) to load into the overlap hash table
utgOvlHashLoad  Maximum hash table load. If set too high, table lookups are inefficent; if too low, search overhead dominates run time; default 0.75
utgOvlMerDistinct  K-mer frequency threshold; the least frequent fraction of distinct mers can seed overlaps
utgOvlMerSize  K-mer size for seeds in overlaps
utgOvlMerThreshold  K-mer frequency threshold; mers more frequent than this count are ignored
utgOvlRefBlockLength  Amount of sequence (bp) to search against the hash table per batch
utgReAlign  Refine overlaps by computing the actual alignment: 'true' or 'false'. Not useful for overlapper=ovl. Uses utgOvlErrorRate
utgRepeatConfusedBP  Repeats where the next best edge is at least this many bp shorter will not be split
utgRepeatConfusedPC  Repeats where the next best edge is at least this many percent shorter will not be split
utgRepeatDeviation  Overlaps this much above mean unitig error rate will not be used for repeat splitting
utgmhapConcurrency  If grid not enabled, number of mhap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgmhapMemory  Amount of memory, in gigabytes, to use for mhap overlaps for unitig construction jobs
utgmhapStageSpace  Amount of local disk space needed to stage data for mhap overlaps for unitig construction jobs
utgmhapThreads  Number of threads to use for mhap overlaps for unitig construction jobs
utgmmapConcurrency  If grid not enabled, number of mmap overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgmmapMemory  Amount of memory, in gigabytes, to use for mmap overlaps for unitig construction jobs
utgmmapStageSpace  Amount of local disk space needed to stage data for mmap overlaps for unitig construction jobs
utgmmapThreads  Number of threads to use for mmap overlaps for unitig construction jobs
utgovlConcurrency  If grid not enabled, number of overlaps for unitig construction jobs to run at the same time; default is n_proc / n_threads
utgovlMemory  Amount of memory, in gigabytes, to use for overlaps for unitig construction jobs
utgovlStageSpace  Amount of local disk space needed to stage data for overlaps for unitig construction jobs
utgovlThreads  Number of threads to use for overlaps for unitig construction jobs
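The help output notes that all options can also be supplied in a spec file passed with -s. The minimal bash sketch below writes such a file, assuming the format is one option=value pair per line with lines starting with # treated as comments (check the Canu documentation before relying on this). The file name my-canu.spec is hypothetical, and the values are simply the defaults reported by canu -options above, included as a starting point to edit.

```shell
#!/bin/bash
# Sketch: write a Canu spec file to pass with -s. The file name is hypothetical;
# the values are the defaults printed by "canu -options", meant to be edited.
spec=my-canu.spec

cat > "$spec" <<'EOF'
# Assumed format: one option=value per line; lines starting with # are comments.
useGrid=true
minReadLength=1000
minOverlapLength=500
maxInputCoverage=200
corOutCoverage=40
EOF

echo "wrote $spec"
# Example use (not run here; prefix, directory and reads are placeholders):
#   canu -s my-canu.spec -p asm -d asm-run genomeSize=4.7m -nanopore reads.fastq.gz
```

Keeping shared settings in a spec file leaves only the run-specific arguments (prefix, directory, genome size, reads, gridOptions) on the canu command line in sub.sh.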
Installation
Source code obtained from https://github.com/marbl/canu/releases/download/v2.1.1/
System
64-bit Linux