RepeatScout-Teaching
Category: Bioinformatics
Program On: Teaching
Version: 1.05
Author / Distributor:
Description
"The purpose of the RepeatScout software is to identify repeat family sequences from genomes where hand-curated repeat databases (a la RepBase Update) are not available. In fact, the output of this program can be used as input to RepeatMasker as a way of automatically masking newly-sequenced genomes." More details are at RepeatScout.
Running Program
The latest version of this application is installed at /usr/local/apps/eb/RepeatScout/1.05-foss-2016b.
To use this version, please load the module with:
ml RepeatScout/1.05-foss-2016b
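Loading the module should put the RepeatScout executable on your PATH. The function below is a minimal sketch that a job script can call right after the ml line to fail early if a program is missing; the name require_cmds is hypothetical and is not part of RepeatScout or the module system. Whether this module also provides build_lmer_table (the l-mer counting program from the RepeatScout distribution) is an assumption, so check it with command -v after loading the module.

```shell
#!/bin/bash
# Hypothetical helper (not part of RepeatScout): verify that every program
# named on the command line can be found on PATH, e.g. after
# "ml RepeatScout/1.05-foss-2016b". Returns 0 if all are found, 1 otherwise.
require_cmds() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "error: '$cmd' not found on PATH; is the module loaded?" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, adding `require_cmds RepeatScout || exit 1` after the ml line of a job script stops the job with a clear message instead of a "command not found" error partway through.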
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_RepeatScout
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=RepeatScout.%j.out
#SBATCH --error=RepeatScout.%j.err
cd "$SLURM_SUBMIT_DIR"
ml RepeatScout/1.05-foss-2016b
# Replace [options] with the RepeatScout arguments for your data (see the Documentation section)
RepeatScout [options]
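In a real submission script, at least the job-specific values above, such as the job name, e-mail address, resource requests (--ntasks, --mem, --time) and the RepeatScout options, need to be reviewed and replaced with the proper values.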
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.
Here is an example of a job submission command:
sbatch ./sub.sh
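The RepeatScout [options] line in sub.sh is a placeholder. The usage text in the Documentation section shows that -sequence, -output, -freq and -l are required, where -freq names an l-mer frequency table that has to exist before RepeatScout runs. The function below is a hedged sketch of that two-step run. The function name, its arguments and the output file names are hypothetical, and it assumes that the build_lmer_table program from the RepeatScout distribution is on PATH and accepts the -l, -sequence and -freq flags described in the RepeatScout README; confirm this on the cluster before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: build an l-mer frequency table, then run RepeatScout.
# ASSUMPTION: build_lmer_table (from the RepeatScout distribution) is on PATH
# and takes -l, -sequence and -freq as described in the RepeatScout README.
run_repeatscout() {
    if [[ $# -ne 3 ]]; then
        echo "usage: run_repeatscout <genome.fa> <l> <prefix>" >&2
        return 2
    fi
    local genome="$1" lmer="$2" prefix="$3"

    # Refuse to start on a missing or empty FASTA file.
    if [[ ! -s "$genome" ]]; then
        echo "error: '$genome' is missing or empty" >&2
        return 1
    fi
    # -l must be a positive integer.
    if [[ ! "$lmer" =~ ^[1-9][0-9]*$ ]]; then
        echo "error: l must be a positive integer, got '$lmer'" >&2
        return 1
    fi

    # Step 1: count l-mers; the table is what RepeatScout reads via -freq.
    build_lmer_table -l "$lmer" -sequence "$genome" -freq "${prefix}.freq" || return 1

    # Step 2: required flags as listed by "RepeatScout -h".
    RepeatScout -sequence "$genome" -output "${prefix}.repeats.fa" \
        -freq "${prefix}.freq" -l "$lmer"
}
```

In sub.sh, after the ml line, a call such as `run_repeatscout genome.fa 14 myspecies` could take the place of `RepeatScout [options]`; genome.fa, 14 and myspecies are placeholders for your own FASTA file, l-mer length and output prefix. The sketch passes the same -l value to both programs because the frequency table is built for a single l-mer length.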
Documentation
ml RepeatScout/1.05-foss-2016b
RepeatScout -h
RepeatScout Version 1.0.5
Usage:
RepeatScout -sequence <seq> -output <out> -freq <freq> -l <l> [opts]
-L # size of region to extend left or right (10000)
-match # reward for a match (+1)
-mismatch # penalty for a mismatch (-1)
-gap # penalty for a gap (-5)
-maxgap # maximum number of gaps allowed (5)
-maxoccurrences # cap on the number of sequences to align (10,000)
-maxrepeats # stop work after reporting this number of repeats (10000)
-cappenalty # cap on penalty for exiting alignment of a sequence (-20)
-tandemdist # of bases that must intervene between two l-mers for both to be counted (500)
-minthresh # stop if fewer than this number of l-mers are found in the seeding phase (3)
-minimprovement # amount that a the alignment needs to improve each step to be considered progress (3)
-stopafter # stop the alignment after this number of no-progress columns (100)
-goodlength # minimum required length for a sequence to be reported (50)
-maxentropy # entropy (complexity) threshold for an l-mer to be considered (-.7)
-v[v[v[v]]] How verbose do you want it to be? -vvvv is super-verbose.
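The -l option is required, but RepeatScout -h gives no guidance on choosing it. As we recall it, the RepeatScout README suggests an l-mer length of about ceil(log4(L)) + 1 for a genome of L bases. Treat that rule as an assumption and confirm it against the README in the source distribution. The hypothetical helper below computes this value from a FASTA file using only integer arithmetic, which avoids floating-point rounding problems when L is an exact power of 4.

```shell
#!/bin/bash
# Hypothetical helper: suggest a value for RepeatScout's -l option.
# ASSUMPTION: rule of thumb l = ceil(log4(L)) + 1, where L is the number of
# sequence characters in the FASTA file (header lines are not counted).
suggest_lmer_length() {
    local fasta="$1"
    local n
    # Count sequence characters, skipping ">" header lines and stray whitespace.
    # printf "%.0f" keeps large totals as plain integers for bash arithmetic.
    n=$(awk '!/^>/ { gsub(/[ \t\r]/, ""); total += length($0) }
             END { printf "%.0f\n", total }' "$fasta") || return 1
    if (( n < 1 )); then
        echo "error: no sequence found in '$fasta'" >&2
        return 1
    fi
    # Find the smallest k with 4^k >= n, i.e. k = ceil(log4(n)).
    local k=0 p=1
    while (( p < n )); do
        p=$(( p * 4 ))
        k=$(( k + 1 ))
    done
    echo $(( k + 1 ))
}
```

For example, `l=$(suggest_lmer_length genome.fa)` produces a value that could be passed as `-l "$l"`. For a genome of about 3.1 billion bases the function returns 17, because 4^15 is below that size and 4^16 is above it.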
Installation
The source code was obtained from RepeatScout.
System
64-bit Linux