GeneMarkES-Sapelo2
Category: Bioinformatics
Program On: Sapelo2
Version: 4.57
Author / Distributor
Description
"Gene Prediction in Eukaryotes. Novel genomes can be analyzed by the program GeneMark-ES utilizing unsupervised training." More details are available at GeneMarkES.
Running Program
Also refer to Running Jobs on Sapelo2, Run X window Jobs, and Run interactive Jobs.
Version 4.57
Version 4.57 is at /usr/local/apps/gb/genemarkes/4.57
Here is an example of a shell script, sub.sh, to run on the batch queue:
#PBS -S /bin/bash
#PBS -N j_GeneMarkES
#PBS -q batch
#PBS -l nodes=1:ppn=1
#PBS -l walltime=48:00:00
#PBS -l mem=10gb

cd "$PBS_O_WORKDIR"

module load genemarkes/4.57-foss-2018a

# GeneMark-ES reads its license key from ~/.gm_key
cp /usr/local/apps/gb/genemarkes/4.57/gm_key ~/.gm_key

gmes_petap.pl [options]
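The `[options]` placeholder in sub.sh still has to be filled in from the options listed under Documentation. As a hedged sketch, the Bash function below assembles a command line and prints it so it can be checked before it goes into the job script. The helper name `gmes_cmd` and the file names `genome.fasta`, `introns.gff` and `prot.gff` are hypothetical. It assumes the batch system exports `PBS_NUM_PPN` (the `ppn` value) inside the job, as Torque does, and falls back to the default of one thread otherwise. Using `--ET` when both GFF files are given is an arbitrary choice made for illustration.

```bash
#!/bin/bash
# Hypothetical helper (not part of GeneMark-ES): print a gmes_petap.pl
# command line to replace "[options]" in sub.sh.
#   $1  genome FASTA (placeholder name)
#   $2  optional intron GFF from RNA-Seq alignments   -> --ET
#   $3  optional intron GFF from protein alignments   -> --EP
# With neither GFF, self-training (--ES) is used. The thread count for
# --cores is taken from PBS_NUM_PPN (assumed to be set by the scheduler),
# falling back to 1.
gmes_cmd() {
  local fasta="$1" rnaseq_gff="${2:-}" protein_gff="${3:-}"
  local cores="${PBS_NUM_PPN:-1}"
  case "$cores" in
    ''|*[!0-9]*) cores=1 ;;  # ignore empty or non-numeric values
  esac
  local args=()
  if [ -n "$rnaseq_gff" ]; then
    args+=(--ET "$rnaseq_gff")
  elif [ -n "$protein_gff" ]; then
    args+=(--EP "$protein_gff")
  else
    args+=(--ES)
  fi
  args+=(--cores "$cores" --sequence "$fasta")
  printf '%s\n' "gmes_petap.pl ${args[*]}"
}

# Example: self-training with 4 threads
PBS_NUM_PPN=4 gmes_cmd genome.fasta
# prints: gmes_petap.pl --ES --cores 4 --sequence genome.fasta
```

If more than one core is requested, keeping the `ppn` value in the `#PBS -l nodes=1:ppn=...` line and the value passed to `--cores` in agreement avoids starting more threads than the job was allocated.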
Documentation
module load genemarkes/4.33
perl /usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl
# -------------------
Usage: /usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl [options] --sequence [filename]
GeneMark-ES Suite version 4.35
includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction
Input sequence/s should be in FASTA format
Algorithm options
--ES to run self-training
--fungus to run algorithm with branch point model (most useful for fungal genomes)
--ET [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
--EP [filename]; to run training with introns coordinates from protein splice alignment (GFF format)
--et_score [number]; 10 (default) minimum score of intron in initiation of the ET algorithm
--ep_score [number]; 4 (default) minimum score of intron in initiation of the EP algorithm
--evidence [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
--training to run only training step
--prediction to run only prediction step
--predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)
Sequence pre-processing options
--max_contig [number]; 5000000 (default); will split input genomic sequence into contigs shorter than max_contig
--min_contig [number]; 50000 (default); will ignore contigs shorter than min_contig in training
--max_gap [number]; 5000 (default); will split sequence at gaps longer than max_gap
Letters 'n' and 'N' are interpreted as standing within gaps
--max_mask [number]; 5000 (default); will split sequence at repeats longer than max_mask
Letters 'x' and 'X' are interpreted as results of hard masking of repeats
--soft_mask [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length
Run options
--cores [number]; 1 (default) to run program with multiple threads
--pbs to run on cluster with PBS support
--v verbose
Customizing parameters:
--max_intron [number]; default 10000 (3000 fungi), maximum length of intron
--max_intergenic [number]; default 10000, maximum length of intergenic regions
--min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step
Developer options:
--usr_cfg [filename]; to customize configuration file
--ini_mod [filename]; use this file with parameters for algorithm initiation
--test_set [filename]; to evaluate prediction accuracy on the given test set
--key_bin
--debug
# -------------------
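The sequence pre-processing options decide which parts of an assembly take part in training; in particular, contigs shorter than `--min_contig` (50000 by default) are ignored. As a rough check before submitting a job, the hypothetical Bash function `count_long_contigs` below counts FASTA records at least that long. It is only an approximation: it measures raw record length, including `N` and `X` characters, and does not reproduce the splitting at gaps (`--max_gap`), masked repeats (`--max_mask`) or long contigs (`--max_contig`) that gmes_petap.pl performs itself.

```bash
#!/bin/bash
# Hypothetical helper (not part of GeneMark-ES): count FASTA records whose
# sequence length is at least a threshold, by default 50000 (the
# --min_contig default). Lengths include N/X characters; the program's own
# splitting at gaps, masked repeats and --max_contig is not modelled.
count_long_contigs() {
  local fasta="$1"
  local min="${2:-50000}"
  awk -v min="$min" '
    /^>/ { if (seen && len >= min) n++; seen = 1; len = 0; next }
         { len += length($0) }
    END  { if (seen && len >= min) n++; print n + 0 }
  ' "$fasta"
}

# Example on a toy file: contigs of 12 and 4 bases, threshold 10
toy=$(mktemp)
printf '>c1\nACGTACGT\nACGT\n>c2\nNNNN\n' > "$toy"
count_long_contigs "$toy" 10   # prints: 1
rm -f "$toy"
```

On a real assembly the call would be `count_long_contigs genome.fasta` (placeholder file name), which applies the default threshold of 50000.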
Installation
Source code is available from GeneMarkES.
System
64-bit Linux