RepeatScout-Teaching

From Research Computing Center Wiki
Revision as of 12:49, 10 August 2018 by Yhuang (talk | contribs)

Category

Bioinformatics

Program On

Teaching

Version

1.05

Author / Distributor

RepeatScout

Description

"The purpose of the RepeatScout software is to identify repeat family sequences from genomes where hand-curated repeat databases (a la RepBase update) are not available. In fact, the output of this program can be used as input to RepeatMasker as a way of automatically masking newly-sequenced genomes." More details are available at the RepeatScout website.

Running Program

The latest version of this application is installed at /usr/local/apps/eb/RepeatScout/1.05-foss-2016b

To use this version, please load the module with

ml RepeatScout/1.05-foss-2016b 

Here is an example of a shell script, sub.sh, to run on the batch queue:

#!/bin/bash
#SBATCH --job-name=j_RepeatScout
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=RepeatScout.%j.out

cd $SLURM_SUBMIT_DIR
ml RepeatScout/1.05-foss-2016b
RepeatScout [options]

In a real submission script, review at least the placeholder values above, such as the job name, e-mail address, memory, time limit, and RepeatScout options, and replace them with values appropriate for your job.

Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.


Here is an example of a job submission command:

sbatch ./sub.sh 
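
After submission it is often useful to keep the job ID, since the example script writes its log to RepeatScout.<jobid>.out. The sketch below is one possible way to do that. The helper name job_id_from_sbatch is hypothetical, and it assumes sbatch prints its usual "Submitted batch job <id>" line. It only submits when sbatch and ./sub.sh are actually present.

```shell
#!/bin/bash
# Hypothetical helper: pull the numeric job ID out of sbatch's
# "Submitted batch job <id>" message (assumed output format).
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Only submit when Slurm and the submission script are available.
if command -v sbatch >/dev/null 2>&1 && [ -f ./sub.sh ]; then
    jobid=$(sbatch ./sub.sh | job_id_from_sbatch)
    echo "Submitted job $jobid; log will be written to RepeatScout.$jobid.out"
    squeue -j "$jobid"    # show the job's current state in the queue
fi
```

Once the job has started, the log can be followed with tail -f on the RepeatScout.<jobid>.out file in the submission directory.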

Documentation

ml RepeatScout/1.05-foss-2016b 
RepeatScout -h
RepeatScout Version 1.0.5

Usage: 
RepeatScout -sequence <seq> -output <out> -freq <freq> -l <l> [opts]
     -L # size of region to extend left or right (10000) 
     -match # reward for a match (+1)  
     -mismatch # penalty for a mismatch (-1) 
     -gap  # penalty for a gap (-5)
     -maxgap # maximum number of gaps allowed (5) 
     -maxoccurrences # cap on the number of sequences to align (10,000) 
     -maxrepeats # stop work after reporting this number of repeats (10000)
     -cappenalty # cap on penalty for exiting alignment of a sequence (-20)
     -tandemdist # of bases that must intervene between two l-mers for both to be counted (500)
     -minthresh # stop if fewer than this number of l-mers are found in the seeding phase (3)
     -minimprovement # amount that a the alignment needs to improve each step to be considered progress (3)
     -stopafter # stop the alignment after this number of no-progress columns (100)
     -goodlength # minimum required length for a sequence to be reported (50)
     -maxentropy # entropy (complexity) threshold for an l-mer to be considered (-.7)
     -v[v[v[v]]] How verbose do you want it to be?  -vvvv is super-verbose.
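
The usage line above expects an l-mer frequency table (-freq) in addition to the input sequence. A minimal sketch of a two-step run follows. It assumes that the module also puts the build_lmer_table program from the RepeatScout distribution on the PATH and that it accepts -sequence, -freq and -l. The file names and the l-mer length of 14 are placeholders for illustration, not recommended values. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/bash
# Sketch of a two-step RepeatScout run. All file names are placeholders.
genome="genome.fa"             # input FASTA file (placeholder)
freq="genome.freq"             # l-mer frequency table (placeholder)
repeats="genome.repeats.fa"    # output repeat library (placeholder)
lmer=14                        # l-mer length; assumed value, adjust for your genome

# Print each command unless DRY_RUN=0 is set, in which case run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: tabulate l-mer frequencies (assumes build_lmer_table is on the PATH).
run build_lmer_table -sequence "$genome" -freq "$freq" -l "$lmer"

# Step 2: identify repeat families using the frequency table from step 1.
run RepeatScout -sequence "$genome" -output "$repeats" -freq "$freq" -l "$lmer"
```

In a batch job, these two commands would replace the "RepeatScout [options]" line of the submission script, with the same -l value used in both steps.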


Installation

Source code was obtained from the RepeatScout website.

System

64-bit Linux