Canu-Teaching
Revision as of 13:41, 10 August 2018
Category
Bioinformatics
Program On
Teaching
Version
1.5
Author / Distributor
Description
"Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II or Oxford Nanopore MinION). Canu is a hierarchical assembly pipeline which runs in four steps:
Detect overlaps in high-noise sequences using MHAP
Generate corrected sequence consensus
Trim corrected sequences
Assemble trimmed corrected sequences"
More details are at canu
Running Program
The latest version of this application is at /usr/local/apps/eb/canu/1.5-foss-2016b
To use this version, please load the module with
ml canu/1.5-foss-2016b
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_canu
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=canu.%j.out
cd $SLURM_SUBMIT_DIR
ml canu/1.5-foss-2016b
canu [options]
In a real submission script, the values shown above, such as the mail address and the canu [options], need to be reviewed and replaced with proper values.
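As a rough illustration of what could replace the canu [options] placeholder, the sketch below assembles a command line from the flags listed in the canu usage text (-p, -d, genomeSize=, -pacbio-raw). The prefix, directory, genome size, and read file name are hypothetical placeholders, not values from this page, and the RUN_CANU switch is only a convenience of this sketch. By default it prints the command for review instead of running it.

```shell
#!/bin/bash
# Sketch only: every value below is a hypothetical placeholder.
prefix="ecoli"                    # assumed assembly prefix (-p)
outdir="ecoli-canu"               # assumed assembly directory (-d)
genome_size="4.7m"                # assumed best-guess genome size
reads="sample_reads.fastq.gz"     # assumed PacBio raw read file

# Build the command as an array so each argument stays a single word
canu_cmd=(canu -p "$prefix" -d "$outdir" "genomeSize=$genome_size" -pacbio-raw "$reads")

# Print the command for review
echo "${canu_cmd[@]}"

# RUN_CANU is a switch invented for this sketch; set RUN_CANU=1 to execute
if [ "${RUN_CANU:-0}" = "1" ]; then
    "${canu_cmd[@]}"
fi
```

In sub.sh, the printed command would take the place of the canu [options] line, after the ml canu/1.5-foss-2016b line.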
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the Teaching cluster.
Here is an example of a job submission command:
sbatch ./sub.sh
Documentation
ml canu/1.5-foss-2016b
canu -h
usage: canu [-version] \
            [-correct | -trim | -assemble | -trim-assemble] \
            [-s <assembly-specifications-file>] \
             -p <assembly-prefix> \
             -d <assembly-directory> \
             genomeSize=<number>[g|m|k] \
            [other-options] \
            [-pacbio-raw | -pacbio-corrected | -nanopore-raw | -nanopore-corrected] *fastq

  By default, all three stages (correct, trim, assemble) are computed.
  To compute only a single stage, use:
    -correct       - generate corrected reads
    -trim          - generate trimmed reads
    -assemble      - generate an assembly
    -trim-assemble - generate trimmed reads and then assemble them

  The assembly is computed in the (created) -d <assembly-directory>, with most
  files named using the -p <assembly-prefix>.

  The genome size is your best guess of the genome size of what is being assembled.
  It is used mostly to compute coverage in reads. Fractional values are allowed: '4.7m'
  is the same as '4700k' and '4700000'

  A full list of options can be printed with '-options'. All options
  can be supplied in an optional sepc file.

  Reads can be either FASTA or FASTQ format, uncompressed, or compressed
  with gz, bz2 or xz. Reads are specified by the technology they were
  generated with:
    -pacbio-raw         <files>
    -pacbio-corrected   <files>
    -nanopore-raw       <files>
    -nanopore-corrected <files>

Complete documentation at http://canu.readthedocs.org/en/latest/
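The genomeSize suffixes in the help text can be checked with a small conversion. The sketch below is a hypothetical helper, not part of canu: it expands a value with a g, m, or k suffix into a plain base count, assuming the suffixes mean 10^9, 10^6, and 10^3 as the '4.7m' example in the help text implies.

```shell
#!/bin/bash
# Hypothetical helper (not part of canu): expand a genomeSize value such as
# 4.7m into a plain number of bases, using the g/m/k suffixes from the usage text.
genome_size_to_bases() {
    local value="$1"
    local number="${value%[gmk]}"       # strip one trailing suffix, if present
    local suffix="${value#"$number"}"   # the suffix that was stripped, or empty
    local factor=1
    case "$suffix" in
        k) factor=1000 ;;
        m) factor=1000000 ;;
        g) factor=1000000000 ;;
    esac
    # awk does the arithmetic so fractional inputs such as 4.7 work
    awk -v n="$number" -v f="$factor" 'BEGIN { printf "%.0f\n", n * f }'
}

# The three spellings from the help text should give the same count
genome_size_to_bases 4.7m
genome_size_to_bases 4700k
genome_size_to_bases 4700000
```

All three calls print 4700000, matching the statement that '4.7m' is the same as '4700k' and '4700000'.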
Installation
Source code is obtained from canu
System
64-bit Linux