GeneMarkES-Teaching
Category
Bioinformatics
Program On
Teaching
Version
4.33
Author / Distributor
Description
" Gene Prediction in Eukaryotes. Novel genomes can be analyzed by the program GeneMark-ES utilizing unsupervised training." More details are at GeneMarkES
Running Program
The latest version of this application is installed at /usr/local/apps/gb/genemarkes/4.33
To use this version, please load the module with
ml genemarkes/4.33
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_GeneMarkES
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=GeneMarkES.%j.out
#SBATCH --error=GeneMarkES.%j.err
cd "$SLURM_SUBMIT_DIR"
ml genemarkes/4.33
perl /usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl [options]
In a real submission script, review the values above and replace them as needed; at a minimum, username@uga.edu and [options] are placeholders.
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the Teaching cluster.
Here is an example of a job submission command:
sbatch ./sub.sh
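The [options] placeholder in sub.sh has to be replaced with real arguments. As a minimal sketch, the helper below prints one possible command line using only flags listed in the usage text under Documentation (--ES, --cores, --sequence). The function name build_gmes_cmd and the input file genome.fasta are hypothetical and are used here only for illustration.

```shell
#!/bin/bash
# Sketch only: prints a gmes_petap.pl command line built from flags that
# appear in the program's usage text. Nothing is executed.

GMES=/usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl

# build_gmes_cmd FASTA [CORES]  (hypothetical helper, not part of GeneMark-ES)
build_gmes_cmd() {
    local fasta="$1"        # input genome in FASTA format
    local cores="${2:-1}"   # value for --cores; 1 is the documented default
    printf 'perl %s --ES --cores %s --sequence %s\n' "$GMES" "$cores" "$fasta"
}

# Inside a job, SLURM_CPUS_PER_TASK is set when --cpus-per-task is requested;
# fall back to 1 core otherwise. genome.fasta is a placeholder file name.
build_gmes_cmd genome.fasta "${SLURM_CPUS_PER_TASK:-1}"
```

To use the result, the printed line would take the place of the perl line in sub.sh. If a value larger than 1 is passed to --cores, the script presumably also needs a matching #SBATCH --cpus-per-task request so that the job is actually given that many cores.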
Documentation
ml genemarkes/4.33
perl /usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl
# -------------------
Usage: /usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl [options] --sequence [filename]
GeneMark-ES Suite version 4.35
includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction
Input sequence/s should be in FASTA format
Algorithm options
--ES to run self-training
--fungus to run algorithm with branch point model (most useful for fungal genomes)
--ET [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
--EP [filename]; to run training with introns coordinates from protein splice alighnmnet (GFF format)
--et_score [number]; 10 (default) minimum score of intron in initiation of the ET algorithm
--ep_score [number]; 4 (default) minimum score of intron in initiation of the EP algorithm
--evidence [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
--training to run only training step
--prediction to run only prediction step
--predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)
Sequence pre-processing options
--max_contig [number]; 5000000 (default) will split input genomic sequence into contigs shorter then max_contig
--min_contig [number]; 50000 (default); will ignore contigs shorter then min_contig in training
--max_gap [number]; 5000 (default); will split sequence at gaps longer than max_gap
Letters 'n' and 'N' are interpreted as standing within gaps
--max_mask [number]; 5000 (default); will split sequence at repeats longer then max_mask
Letters 'x' and 'X' are interpreted as results of hard masking of repeats
--soft_mask [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length
Run options
--cores [number]; 1 (default) to run program with multiple threads
--pbs to run on cluster with PBS support
--v verbose
Customizing parameters:
--max_intron [number]; default 10000 (3000 fungi), maximum length of intron
--max_intergenic [number]; default 10000, maximum length of intergenic regions
--min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step
Developer options:
--usr_cfg [filename]; to customize configuration file
--ini_mod [filename]; use this file with parameters for algorithm initiation
--test_set [filename]; to evaluate prediction accuracy on the given test set
--key_bin
--debug
# -------------------
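According to the usage text above, contigs shorter than --min_contig (50000 by default) are ignored in training. Before submitting a job, it may help to get a rough idea of how many input sequences meet that threshold. The sketch below counts FASTA records at or above a given length with awk. The function name count_long_contigs is hypothetical, and the check looks at whole FASTA records only: it does not reproduce the splitting described under --max_contig, --max_gap, and --max_mask.

```shell
#!/bin/bash
# Sketch only: count FASTA records whose sequence length is >= a threshold.
# count_long_contigs FASTA [MIN_LEN]  (hypothetical helper, not part of GeneMark-ES)
count_long_contigs() {
    local fasta="$1"
    local min_len="${2:-50000}"   # 50000 is the documented --min_contig default
    awk -v min="$min_len" '
        # Header line: close out the previous record, then start a new one.
        /^>/ { if (seen && len >= min) n++; len = 0; seen = 1; next }
        # Sequence line: strip whitespace and add its length to the record.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        # End of file: close out the final record and print the count.
        END { if (seen && len >= min) n++; print n + 0 }
    ' "$fasta"
}

# Example call (genome.fasta is a placeholder file name):
# count_long_contigs genome.fasta 50000
```

If the count is very low for a fragmented assembly, lowering --min_contig is one setting to consider, since the usage text indicates that shorter contigs would otherwise be left out of training.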
Installation
Source code was obtained from GeneMarkES.
System
64-bit Linux