GapFiller-Teaching

From Research Computing Center Wiki
Revision as of 15:16, 10 August 2018 by Yhuang (talk | contribs)

Category

Bioinformatics

Program On

Teaching

Version

2.1.1

Author / Distributor

GapFiller

Description

"GapFiller is a seed-and-extend local assembler to fill the gap within paired reads. It can be used for both DNA and RNA and it has been tested on Illumina data." More details are available at GapFiller.

Running Program

The latest version of this application is installed at /usr/local/apps/eb/GapFiller/2.1.1-foss-2016b

To use this version, load the module with:

ml GapFiller/2.1.1-foss-2016b 

Here is an example of a shell script, sub.sh, to run on the batch queue:

#!/bin/bash
#SBATCH --job-name=j_GapFiller
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=GapFiller.%j.out
#SBATCH --error=GapFiller.%j.err

cd "$SLURM_SUBMIT_DIR"
ml GapFiller/2.1.1-foss-2016b
GapFiller [options]

In a real submission script, at least the placeholder values above (for example, the job name, email address, resource requests, and the GapFiller options) need to be reviewed and replaced with proper values.
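As an illustration of what might replace the `GapFiller [options]` line, here is a minimal sketch that uses only options listed in the `--help` output of this version. The input file names, the insert size and variation, and the output prefix are all hypothetical placeholders and must be replaced with values for your own library. The sketch prints the assembled command for review and runs it only when the GapFiller executable is on the PATH (that is, after the module has been loaded).

```shell
# Hypothetical inputs: replace with your own paired-read files.
seed1="reads_1.fasta"
seed2="reads_2.fasta"

# Hypothetical library statistics: replace with your library's
# insert size and its variation.
insert_size=400
insert_var=50

# Hypothetical prefix for the output files.
prefix="sample1_gapfilled"

# Collect the options as positional parameters so that each
# value stays correctly quoted when the command is run.
set -- \
    --seed1 "$seed1" \
    --seed2 "$seed2" \
    --seed-ins "$insert_size" \
    --seed-var "$insert_var" \
    --output-prefix "$prefix"

# Print the command so it appears in the job's output file.
echo "GapFiller $*"

# Run GapFiller only if it is available (module loaded).
if command -v GapFiller >/dev/null 2>&1; then
    GapFiller "$@"
else
    echo "GapFiller not found; load the module first" >&2
fi
```

Keeping the inputs in variables at the top of the script makes it easier to review them before each submission.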

Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs, and Run interactive Jobs for more details on running jobs on the Teaching cluster.


Here is an example of a job submission command:

sbatch ./sub.sh 
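After submission, it can be handy to keep the job ID, since the `%j` in the `--output` and `--error` lines of sub.sh is replaced by that ID. The following is a small sketch, not part of the original instructions: the helper name `log_names` is hypothetical, and the snippet assumes it is run from the directory that contains sub.sh. It uses the standard Slurm `sbatch --parsable` option and `squeue -j`.

```shell
# Hypothetical helper: print the log file names that the example
# sub.sh would produce for a given job ID (from GapFiller.%j.out
# and GapFiller.%j.err).
log_names() {
    printf 'GapFiller.%s.out GapFiller.%s.err\n' "$1" "$1"
}

if command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print only the job ID
    # (optionally followed by ";cluster-name").
    job_id=$(sbatch --parsable ./sub.sh)
    job_id=${job_id%%;*}

    # Show the job's current state in the queue.
    squeue -j "$job_id"

    echo "Logs will be written to: $(log_names "$job_id")"
else
    echo "sbatch not found; run this on a cluster login node" >&2
fi
```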

Documentation

ml GapFiller/2.1.1-foss-2016b 
GapFiller --help
GapFiller version 2.1.1

Allowed options
  --help                produce help message
  --k arg               length of the word used to hash (default: 12)
  --block-length arg    length of perfect match (default: 15)
  --output-prefix arg   output files prefix (default: "GapFiller_output")
  --gz                  compress output with gzip
  --bz2                 compress output with bzip2
  --seed1 arg           seed1 fasta file (can be compressed with gzip or bzip2,
                        or a pipe)
  --seed2 arg           seed2 fasta file (can be compressed with gzip or bzip2,
                        or a pipe)
  --seed-sam arg        seed sam file sorted by ID, with header (sam or bam 
                        format; can be repeated multiple times)
  --query arg           query fasta file: use different reads for extension 
                        instead of seeds (can be compressed with gzip or bzip2,
                        or a pipe)
  --query-sam arg       query sam file: use different reads for extension 
                        instead of sam seeds (sam or bam format; can be 
                        repeated multiple times)
  --seed-ins arg        seed reads insert size
  --seed-var arg        seed reads insert variation
  --store-layout        store contigs layout (default: false)
                        
  --overlap arg         minimum suffix-prefix overlap (default: 30)
  --mismatch-rate arg   maximum number of mismatches every 100 bp (default: 5)
  --extThreshold arg    number of reads needed to extend a contig (default: 2)
  --limit arg           limits the number of extended reads (useful for tests)
  --no-read-cycle       allow reads to be used multiple times within the same 
                        contig (default: false)
  --mate-pairs          default: paired-ends
  --verbose             print a lot of information! Use with --limit option



Installation

The source code was obtained from GapFiller.

System

64-bit Linux