SSPACE-longread-Teaching

From Research Computing Center Wiki

Category

Bioinformatics

Program On

Teaching

Version

1-1

Author / Distributor

SSPACE-longread

Description

"SSPACE-LongRead is a stand-alone program for scaffolding pre-assembled contigs using long reads (e.g. PacBio RS reads). Using the long read information, contigs (or scaffolds) are placed in the right order and orientation in so-called super-scaffolds." More details are available at SSPACE-longread.

Running Program

The latest version of this application is installed at /usr/local/apps/gb/sspace-longread/1-1

To use this version, please load the module with

ml sspace-longread/1-1 

Here is an example of a shell script, sub.sh, to run on the batch queue:

#!/bin/bash
#SBATCH --job-name=j_SSPACE-longread
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=SSPACE-longread.%j.out
#SBATCH --error=SSPACE-longread.%j.err

cd $SLURM_SUBMIT_DIR
ml sspace-longread/1-1
perl /usr/local/apps/gb/sspace-longread/1-1/SSPACE-LongRead.pl [options]

In a real submission script, at least the placeholder values above (such as the e-mail address, the resource requests, and [options]) need to be reviewed and replaced with appropriate values.

Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.
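As an illustration of what [options] might become, the following sketch builds a command line from the -c, -p, -b and -t options listed in the program's help text. The file and folder names are hypothetical placeholders. The use of SLURM_CPUS_PER_TASK assumes you add a line such as #SBATCH --cpus-per-task=4 to the script; without it the sketch falls back to one thread.

```shell
#!/bin/bash
# Sketch only: assemble an SSPACE-LongRead command from named arguments.
SSPACE=/usr/local/apps/gb/sspace-longread/1-1/SSPACE-LongRead.pl

sspace_cmd() {
    # $1 = contig FASTA, $2 = PacBio reads, $3 = output folder, $4 = threads
    printf 'perl %s -c %s -p %s -b %s -t %s\n' "$SSPACE" "$1" "$2" "$3" "$4"
}

# Hypothetical input names; replace them with your own files.
# The thread count follows the CPUs granted by Slurm, defaulting to 1.
CMD=$(sspace_cmd contigs.fasta pacbio_reads.fasta scaffold_out "${SLURM_CPUS_PER_TASK:-1}")

# Print the command so it can be checked in the job's .out file.
echo "$CMD"

# Set RUN=1 to actually execute it inside the job.
if [ "${RUN:-0}" = "1" ]; then
    $CMD   # unquoted on purpose: assumes file names without spaces
fi
```

Printing the command before running it makes it easy to confirm from the job log which options were actually used.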


Here is an example of a job submission command:

sbatch ./sub.sh 
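After submission, sbatch normally prints a line of the form "Submitted batch job <id>". The sketch below, which is a convenience wrapper and not part of the cluster's documented workflow, extracts that job ID and derives the log file names implied by the --output and --error patterns in sub.sh. The helper names job_id_from_output and log_names are made up for this example.

```shell
#!/bin/bash
# sbatch normally prints "Submitted batch job <id>"; keep only the last field.
job_id_from_output() {
    echo "$1" | awk '{print $NF}'
}

# Log names follow the --output/--error patterns in sub.sh (%j = job ID).
log_names() {
    echo "SSPACE-longread.$1.out SSPACE-longread.$1.err"
}

# Submit only when sbatch and the script are both available.
if command -v sbatch >/dev/null 2>&1 && [ -f ./sub.sh ]; then
    out=$(sbatch ./sub.sh)
    jid=$(job_id_from_output "$out")
    echo "Job $jid submitted; logs: $(log_names "$jid")"
    squeue -j "$jid"   # show the job's current state in the queue
fi
```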

Documentation

ml sspace-longread/1-1 
perl /usr/local/apps/gb/sspace-longread/1-1/SSPACE-LongRead.pl  -h
Usage SSPACE-LongRead scaffolder version 1-1

perl SSPACE-LongRead.pl -c <contig-sequences> -p <pacbio-reads>

General options:
-c  Fasta file containing contig sequences used for scaffolding (REQUIRED)
-p  File containing PacBio CLR sequences to be used scaffolding (REQUIRED)
-b  Output folder name where the results are stored (optional, default -b 'PacBio_scaffolder_results')

Alignment options:
-a  Minimum alignment length to allow a contig to be included for scaffolding (default -a 0, optional)
-i  Minimum identity of the alignment of the PacBio reads to the contig sequences. Alignment below this value will be filtered out (default -i 70, optional)
-t  The number of threads to run BLASR with
-g  Minimmum gap between two contigs

Scaffolding options:
-l  Minimum number of links (PacBio reads) to allow contig-pairs for scaffolding (default -k 3, optional)
-r  Maximum link ratio between two best contig pairs *higher values lead to least accurate scaffolding* (default -r 0.3, optional)
-o  Minimum overlap length to merge two contigs (default -o 10, optional)

Other options:
-k  Store inner-scaffold sequences in a file. These are the long-read sequences spanning over a contig-link (default no output, set '-k 1' to store inner-scaffold sequences. If set, a folder is generated named 'inner-scaffold-sequences'
-s  Skip the alignment step and use a previous alignment file. Note that the results of a previous run will be overwritten. Set '-s 1' to skip the alignment.
-h  Prints this help message

ERROR: Please insert a file with contig sequences. You've inserted '' which either does not exist or is not filled in
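The error at the end of the help output appears when the contig file is missing or empty. A small pre-flight check in the submission script can catch this before the scaffolder starts. The following is a minimal sketch; the function name check_fasta is hypothetical, and the header count assumes the contigs are in FASTA format, as the -c option requires.

```shell
#!/bin/bash
# Return 0 if the file exists, is non-empty, and has at least one FASTA header.
check_fasta() {
    if [ ! -s "$1" ]; then
        echo "ERROR: '$1' does not exist or is empty" >&2
        return 1
    fi
    # grep -c exits non-zero when nothing matches, so tolerate that here.
    n=$(grep -c '^>' "$1" || true)
    if [ "$n" -eq 0 ]; then
        echo "ERROR: '$1' has no FASTA header lines" >&2
        return 1
    fi
    echo "$1: $n sequences"
}
```

In sub.sh this could be called as check_fasta contigs.fasta || exit 1 just before the perl command, where contigs.fasta stands for your own contig file.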


Installation

Source code was obtained from SSPACE-longread.

System

64-bit Linux