DBG2OLC-Teaching

From Research Computing Center Wiki
Revision as of 14:12, 10 August 2018 by Yhuang (talk | contribs)

Category

Bioinformatics

Program On

Teaching

Version

20170208

Author / Distributor

DBG2OLC

Description

"DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies". More details are at DBG2OLC.

Running Program

The latest version of this application is installed at /usr/local/apps/eb/DBG2OLC/20170208-foss-2016b

To use this version, please load the module with:

ml DBG2OLC/20170208-foss-2016b 

Here is an example of a shell script, sub.sh, to run on the batch queue:

#!/bin/bash
#SBATCH --job-name=j_DBG2OLC
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=DBG2OLC.%j.out
#SBATCH --error=DBG2OLC.%j.err

cd "$SLURM_SUBMIT_DIR"
ml DBG2OLC/20170208-foss-2016b
DBG2OLC [options]

In a real submission script, review at least the underlined values above and replace them with values appropriate for your job.

Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.


Here is an example job submission command:

sbatch ./sub.sh 
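
When submitting from another script, it can be useful to capture the job ID so that the job can be checked later. The following is a minimal sketch, assuming the cluster's Slurm provides the sbatch --parsable option; the function name submit_and_report is hypothetical and not part of this wiki or of DBG2OLC.

```shell
#!/bin/bash
# Hypothetical helper: submit a batch script and print the job ID.
submit_and_report() {
    local script="$1"
    # Stop early if sbatch is not available (e.g. not on a cluster node).
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found" >&2
        return 1
    fi
    local out jobid
    # With --parsable, sbatch prints only "jobid" or "jobid;cluster".
    out=$(sbatch --parsable "$script") || return 1
    # Keep the part before the first ';' (the job ID itself).
    jobid=${out%%;*}
    echo "Submitted job $jobid"
}
```

For example, submit_and_report ./sub.sh prints the job ID, which can then be passed to squeue -j to check the job's state.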

Documentation

ml DBG2OLC/20170208-foss-2016b 
DBG2OLC --help
 Example command: 
For third-gen sequencing: DBG2OLC LD1 0 Contigs contig.fa k 17 KmerCovTh 2 MinOverlap 20 AdaptiveTh 0.005 f reads_file1.fq/fa f reads_file2.fq/fa
For sec-gen sequencing: DBG2OLC LD1 0 Contigs contig.fa k 31 KmerCovTh 0 MinOverlap 50 PathCovTh 1 f reads_file1.fq/fa f reads_file2.fq/fa
Parameters:
MinLen: min read length for a read to be used.
Contigs:  contig file to be used.
k: k-mer size.
LD: load compressed reads information. You can set to 1 if you have run the algorithm for one round and just want to fine tune the following parameters.
PARAMETERS THAT ARE CRITICAL FOR THE PERFORMANCE:
If you have high coverage, set large values to these parameters.
KmerCovTh: k-mer matching threshold for each solid contig. (suggest 2-10)
MinOverlap: min matching k-mers for each two reads. (suggest 10-150)
AdaptiveTh: [Specific for third-gen sequencing] adaptive k-mer threshold for each solid contig. (suggest 0.001-0.02)
PathCovTh: [Specific for Illumina sequencing] occurence threshold for a compressed read. (suggest 1-3)
Author: Chengxi Ye cxy@umd.edu.
last update: Jun 11, 2015.
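
The two example commands in the help text above can be wrapped in a small helper that prints a complete command line for either read type. This is only a sketch: the function name dbg2olc_cmd and the labels "third" and "second" are hypothetical, and the parameter values are simply the ones from the help examples, so they should be tuned within the suggested ranges for real data.

```shell
#!/bin/bash
# Hypothetical helper: print a DBG2OLC command line built from the
# example values in the help text. Usage:
#   dbg2olc_cmd <third|second> <contig file> <reads file> [<reads file> ...]
dbg2olc_cmd() {
    local tech="$1" contigs="$2"
    shift 2
    local -a args
    case "$tech" in
        # Values from the "third-gen sequencing" example command.
        third)  args=(LD1 0 Contigs "$contigs" k 17 KmerCovTh 2 MinOverlap 20 AdaptiveTh 0.005) ;;
        # Values from the "sec-gen sequencing" example command.
        second) args=(LD1 0 Contigs "$contigs" k 31 KmerCovTh 0 MinOverlap 50 PathCovTh 1) ;;
        *) echo "unknown read type: $tech" >&2; return 1 ;;
    esac
    # Each reads file is passed with its own "f" keyword, as in the examples.
    local reads
    for reads in "$@"; do
        args+=(f "$reads")
    done
    echo "DBG2OLC ${args[*]}"
}
```

Calling dbg2olc_cmd third contig.fa reads_file1.fq reads_file2.fq prints the third-generation example command; the printed line can replace the DBG2OLC [options] placeholder in sub.sh.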



Installation

The source code was obtained from DBG2OLC.

System

64-bit Linux