Racon-Teaching

From Research Computing Center Wiki

Category

Bioinformatics

Program On

Teaching

Version

1.4.7

Author / Distributor

Please see https://github.com/isovic/racon

Description

From https://github.com/isovic/racon: Racon is a "Consensus module for raw de novo DNA assembly of long uncorrected reads."

Running Program

Also refer to Running Jobs on the teaching cluster


  • Version 1.4.7, compiled with GCCcore/8.2.0 toolchain, installed in /usr/local/apps/eb/Racon/1.4.7-GCCcore-8.2.0

To use this version of racon, please first load the module with

ml Racon/1.4.7-GCCcore-8.2.0
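A quick way to confirm that the module load worked is to check that the racon executable is now on your PATH. The helper below is only a sketch; the function name check_tool is made up for this example and is not part of racon or the module system.

```shell
# check_tool is a hypothetical helper: it reports whether a command is
# reachable through PATH in the current shell.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found; load the module first"
        return 1
    fi
}

# After "ml Racon/1.4.7-GCCcore-8.2.0" this should report the install path.
check_tool racon || true
```

If racon is reported as not found, load the module again in the same shell before running it.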


Sample job submission script (sub.sh) to run racon 1.4.7 in a batch job:

#!/bin/bash
#SBATCH --job-name=j_racon
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=2gb
#SBATCH --time=08:00:00
#SBATCH --output=Racon.%j.out
#SBATCH --error=Racon.%j.err

cd $SLURM_SUBMIT_DIR
ml Racon/1.4.7-GCCcore-8.2.0
racon -t 4 [options] <sequences> <overlaps> <target sequences>

In a real submission script, at least the placeholder values above (for example the e-mail address, the resource requests, and [options]) need to be reviewed and replaced with values appropriate for your job. Note that if you use the racon option -t to run with multiple threads, you should request the same number of cores with --cpus-per-task. In the example above, --cpus-per-task=4 is requested and racon is invoked with -t 4.
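One way to keep the two numbers in sync is to read the thread count from the Slurm allocation instead of typing it twice. The sketch below assumes the job runs under Slurm, which sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested; the function name racon_threads is hypothetical.

```shell
# Derive the racon thread count from the Slurm allocation so that -t and
# --cpus-per-task cannot drift apart. Falls back to 1 (racon's default
# thread count) when the variable is not set, e.g. outside a batch job.
racon_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# In sub.sh this would replace the hard-coded "-t 4":
#   racon -t "$(racon_threads)" [options] <sequences> <overlaps> <target sequences>
echo "racon would run with $(racon_threads) thread(s)"
```

With this in place, changing --cpus-per-task in the #SBATCH header is enough; the racon command line follows automatically.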


Please refer to Running Jobs on the teaching cluster, Run X window Jobs, and Run interactive Jobs for more details on running jobs on the teaching cluster.


Here is an example of a job submission command:

sbatch ./sub.sh 

Documentation

Please see the links at https://github.com/isovic/racon. The built-in usage message shown below can be printed after loading the module:

ml Racon/1.4.7-GCCcore-8.2.0 

racon -h

usage: racon [options ...] <sequences> <overlaps> <target sequences>

    <sequences>
        input file in FASTA/FASTQ format (can be compressed with gzip)
        containing sequences used for correction
    <overlaps>
        input file in MHAP/PAF/SAM format (can be compressed with gzip)
        containing overlaps between sequences and target sequences
    <target sequences>
        input file in FASTA/FASTQ format (can be compressed with gzip)
        containing sequences which will be corrected

    options:
        -u, --include-unpolished
            output unpolished target sequences
        -f, --fragment-correction
            perform fragment correction instead of contig polishing
            (overlaps file should contain dual/self overlaps!)
        -w, --window-length <int>
            default: 500
            size of window on which POA is performed
        -q, --quality-threshold <float>
            default: 10.0
            threshold for average base quality of windows used in POA
        -e, --error-threshold <float>
            default: 0.3
            maximum allowed error rate used for filtering overlaps
        --no-trimming
            disables consensus trimming at window ends
        -m, --match <int>
            default: 3
            score for matching bases
        -x, --mismatch <int>
            default: -5
            score for mismatching bases
        -g, --gap <int>
            default: -4
            gap penalty (must be negative)
        -t, --threads <int>
            default: 1
            number of threads
        --version
            prints the version number
        -h, --help
            prints the usage
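To make the argument order in the usage message concrete, here is a minimal sketch of a contig-polishing command line. The file names (reads.fastq.gz, overlaps.paf, draft.fasta, polished.fasta) and the function name racon_cmd are hypothetical, and the option values simply restate the defaults listed above. The function only prints the command so it can be reviewed before it is run.

```shell
# racon_cmd is a hypothetical helper that prints a racon command line.
#   $1  <sequences>         reads used for correction (FASTA/FASTQ)
#   $2  <overlaps>          reads mapped to the draft (MHAP/PAF/SAM)
#   $3  <target sequences>  draft assembly to be polished (FASTA/FASTQ)
# -w, -q and -e are set to the defaults from "racon -h" to show where
# they go; they can be omitted or tuned.
racon_cmd() {
    echo "racon -t ${SLURM_CPUS_PER_TASK:-1} -w 500 -q 10.0 -e 0.3 $1 $2 $3"
}

# Print the command for review.
racon_cmd reads.fastq.gz overlaps.paf draft.fasta
```

racon writes the polished sequences to standard output, so in a job script the printed command would normally be run with a redirect such as > polished.fasta. The overlaps file must be produced beforehand by a separate overlapper or aligner in one of the listed formats.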

Installation

System

64-bit Linux