DBG2OLC-Teaching: Difference between revisions

From Research Computing Center Wiki
Jump to navigation Jump to search
No edit summary
No edit summary
Line 44: Line 44:
cd $SLURM_SUBMIT_DIR<br>
cd $SLURM_SUBMIT_DIR<br>
ml DBG2OLC/20170208-foss-2016b<br>     
ml DBG2OLC/20170208-foss-2016b<br>     
NA <u>[options]</u><br>   
DBG2OLC <u>[options]</u><br>   
</div>
</div>
In the real submission script, at least all the above underlined values need to be reviewed or to be replaced by the proper values.   
In the real submission script, at least all the above underlined values need to be reviewed or to be replaced by the proper values.   
Line 60: Line 60:
<pre  class="gcommand">
<pre  class="gcommand">
ml DBG2OLC/20170208-foss-2016b  
ml DBG2OLC/20170208-foss-2016b  
NA
DBG2OLC --help
Example command:
For third-gen sequencing: DBG2OLC LD1 0 Contigs contig.fa k 17 KmerCovTh 2 MinOverlap 20 AdaptiveTh 0.005 f reads_file1.fq/fa f reads_file2.fq/fa
For sec-gen sequencing: DBG2OLC LD1 0 Contigs contig.fa k 31 KmerCovTh 0 MinOverlap 50 PathCovTh 1 f reads_file1.fq/fa f reads_file2.fq/fa
Parameters:
MinLen: min read length for a read to be used.
Contigs:  contig file to be used.
k: k-mer size.
LD: load compressed reads information. You can set to 1 if you have run the algorithm for one round and just want to fine tune the following parameters.
PARAMETERS THAT ARE CRITICAL FOR THE PERFORMANCE:
If you have high coverage, set large values to these parameters.
KmerCovTh: k-mer matching threshold for each solid contig. (suggest 2-10)
MinOverlap: min matching k-mers for each two reads. (suggest 10-150)
AdaptiveTh: [Specific for third-gen sequencing] adaptive k-mer threshold for each solid contig. (suggest 0.001-0.02)
PathCovTh: [Specific for Illumina sequencing] occurence threshold for a compressed read. (suggest 1-3)
Author: Chengxi Ye cxy@umd.edu.
last update: Jun 11, 2015.
 


</pre>
</pre>

Revision as of 15:12, 10 August 2018

Category

Bioinformatics

Program On

Teaching

Version

20170208

Author / Distributor

DBG2OLC

Description

"DBG2OLC:Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies" More details are at DBG2OLC

Running Program

The last version of this application is at /usr/local/apps/eb/DBG2OLC/20170208-foss-2016b

To use this version, please load the module with

ml DBG2OLC/20170208-foss-2016b 

Here is an example of a shell script, sub.sh, to run on the batch queue:

#!/bin/bash
#SBATCH --job-name=j_DBG2OLC
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=DBG2OLC.%j.out
#SBATCH --error=DBG2OLC.%j.err

cd $SLURM_SUBMIT_DIR
ml DBG2OLC/20170208-foss-2016b
DBG2OLC [options]

In the real submission script, at least all the above underlined values need to be reviewed or to be replaced by the proper values.

Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details of running jobs at Teaching cluster.


Here is an example of job submission command:

sbatch ./sub.sh 

Documentation

ml DBG2OLC/20170208-foss-2016b 
DBG2OLC --help
 Example command: 
For third-gen sequencing: DBG2OLC LD1 0 Contigs contig.fa k 17 KmerCovTh 2 MinOverlap 20 AdaptiveTh 0.005 f reads_file1.fq/fa f reads_file2.fq/fa
For sec-gen sequencing: DBG2OLC LD1 0 Contigs contig.fa k 31 KmerCovTh 0 MinOverlap 50 PathCovTh 1 f reads_file1.fq/fa f reads_file2.fq/fa
Parameters:
MinLen: min read length for a read to be used.
Contigs:  contig file to be used.
k: k-mer size.
LD: load compressed reads information. You can set to 1 if you have run the algorithm for one round and just want to fine tune the following parameters.
PARAMETERS THAT ARE CRITICAL FOR THE PERFORMANCE:
If you have high coverage, set large values to these parameters.
KmerCovTh: k-mer matching threshold for each solid contig. (suggest 2-10)
MinOverlap: min matching k-mers for each two reads. (suggest 10-150)
AdaptiveTh: [Specific for third-gen sequencing] adaptive k-mer threshold for each solid contig. (suggest 0.001-0.02)
PathCovTh: [Specific for Illumina sequencing] occurence threshold for a compressed read. (suggest 1-3)
Author: Chengxi Ye cxy@umd.edu.
last update: Jun 11, 2015.


Back to Top

Installation

Source code is obtained from DBG2OLC

System

64-bit Linux