RepeatScout-Teaching
Category
Bioinformatics
Program On
Teaching
Version
1.05
Author / Distributor
Description
"The purpose of the RepeatScout software is to identify repeat family sequences from genomes where hand-curated repeat databases (a la RepBase update) are not available. In fact, the output of this program can be used as input to RepeatMasker as a way of automatically masking newly-sequenced genomes." More details are at RepeatScout
Running Program
The latest version of this application is installed at /usr/local/apps/eb/RepeatScout/1.05-foss-2016b
To use this version, please load the module with
ml RepeatScout/1.05-foss-2016b
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_RepeatScout
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=RepeatScout.%j.out
cd $SLURM_SUBMIT_DIR
ml RepeatScout/1.05-foss-2016b
RepeatScout [options]
In a real submission script, at least all of the underlined values above need to be reviewed or replaced with the proper values.
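As a minimal sketch of what could replace the `RepeatScout [options]` line, the snippet below fills in the four arguments that the usage text in the Documentation section lists before `[opts]`: `-sequence`, `-output`, `-freq` and `-l`. The file names and the l-mer length of 14 are hypothetical placeholders, not values taken from this page. It also assumes that an l-mer frequency table already exists; the RepeatScout distribution includes a `build_lmer_table` utility for this, and its README should be checked for the exact invocation.

```shell
# Hypothetical file names and parameter values; replace with your own.
genome="genome.fa"         # input FASTA file (assumed name)
freq_table="genome.freq"   # l-mer frequency table built beforehand (assumed name)
repeats_out="repeats.fa"   # output file for the repeat families (assumed name)
lmer_len=14                # value passed to -l (illustrative only)

# Arguments as listed in the usage line: -sequence, -output, -freq, -l
cmd=(RepeatScout -sequence "$genome" -output "$repeats_out" \
     -freq "$freq_table" -l "$lmer_len")

# Print the assembled command so it can be reviewed in the job log.
cmd_line="${cmd[*]}"
echo "Command: $cmd_line"

# Run only if RepeatScout is on PATH and both input files exist.
if command -v RepeatScout >/dev/null 2>&1 && [ -f "$genome" ] && [ -f "$freq_table" ]; then
    "${cmd[@]}"
else
    echo "Skipping run: RepeatScout or input files not found" >&2
fi
```

Building the command as a bash array keeps file names with spaces intact, and the guard lets the script be tested outside the cluster without failing.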
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the Teaching cluster.
Here is an example of a job submission command:
sbatch ./sub.sh
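The following sketch shows one possible way to capture the job ID at submission time and work out which log file to watch. It relies on the `--output=RepeatScout.%j.out` setting from sub.sh, where Slurm replaces `%j` with the job ID, and on the `--parsable` option of `sbatch`, which prints only the job ID (optionally followed by `;cluster`). The helper name `log_name` is made up for this example.

```shell
# log_name is a hypothetical helper: it maps a job ID (as printed by
# "sbatch --parsable", possibly with a ";cluster" suffix) to the log
# file name produced by "#SBATCH --output=RepeatScout.%j.out".
log_name() {
    local jobid="${1%%;*}"   # drop any ";cluster" suffix
    echo "RepeatScout.${jobid}.out"
}

# Only attempt a real submission where Slurm and sub.sh are available.
if command -v sbatch >/dev/null 2>&1 && [ -f ./sub.sh ]; then
    if jobid=$(sbatch --parsable ./sub.sh); then
        jobid="${jobid%%;*}"
        echo "Submitted job $jobid; output will be written to $(log_name "$jobid")"
        squeue -j "$jobid"   # show the job's current state in the queue
    else
        echo "sbatch failed; check sub.sh and the partition settings" >&2
    fi
fi
```

Once the job has started, the log can be followed with `tail -f` on the file name printed above.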
Documentation
ml RepeatScout/1.05-foss-2016b
RepeatScout -h
RepeatScout Version 1.0.5
Usage:
RepeatScout -sequence <seq> -output <out> -freq <freq> -l <l> [opts]
     -L # size of region to extend left or right (10000)
     -match # reward for a match (+1)
     -mismatch # penalty for a mismatch (-1)
     -gap # penalty for a gap (-5)
     -maxgap # maximum number of gaps allowed (5)
     -maxoccurrences # cap on the number of sequences to align (10,000)
     -maxrepeats # stop work after reporting this number of repeats (10000)
     -cappenalty # cap on penalty for exiting alignment of a sequence (-20)
     -tandemdist # of bases that must intervene between two l-mers for both to be counted (500)
     -minthresh # stop if fewer than this number of l-mers are found in the seeding phase (3)
     -minimprovement # amount that a the alignment needs to improve each step to be considered progress (3)
     -stopafter # stop the alignment after this number of no-progress columns (100)
     -goodlength # minimum required length for a sequence to be reported (50)
     -maxentropy # entropy (complexity) threshold for an l-mer to be considered (-.7)
     -v[v[v[v]]] How verbose do you want it to be? -vvvv is super-verbose.
Installation
Source code is obtained from RepeatScout
System
64-bit Linux