GeneMarkES-Sapelo2
Category
Bioinformatics
Program On
Sapelo2
Version
4.57
Author / Distributor
Description
"Gene Prediction in Eukaryotes. Novel genomes can be analyzed by the program GeneMark-ES utilizing unsupervised training." More details are at GeneMarkES
Running Program
Also refer to Running Jobs on Sapelo2, Run X window Jobs, and Run interactive Jobs.
Version 4.57
Version 4.57 is at /usr/local/apps/gb/genemarkes/4.57
Here is an example of a shell script, sub.sh, to run in the batch queue:
#PBS -S /bin/bash
#PBS -N j_GeneMarkES
#PBS -q batch
#PBS -l nodes=1:ppn=1
#PBS -l walltime=48:00:00
#PBS -l mem=10gb

cd $PBS_O_WORKDIR

module load genemarkes/4.57-foss-2018a

cp /usr/local/apps/gb/genemarkes/4.57/gm_key ~/.gm_key

gmes_petap.pl [options]
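The `[options]` placeholder in the script above has to be replaced with real arguments before the job is submitted. The following is a minimal sketch of how the final command line might be assembled for a self-training run. The helper name `gmes_cmd` and the input file `genome.fasta` are hypothetical and are not part of GeneMark-ES; the flags `--ES`, `--sequence`, `--cores`, and `--fungus` are taken from the usage text in the Documentation section. The helper only prints the command (a dry run), so it can be checked before pasting the result into sub.sh.

```shell
#!/bin/bash
# Hypothetical helper (not part of GeneMark-ES): print the gmes_petap.pl
# command line for a self-training (--ES) run.
# Usage: gmes_cmd <fasta> [cores] [fungus]
gmes_cmd() {
    local fasta="$1"
    local cores="${2:-1}"   # --cores defaults to 1 in the usage text
    local mode="${3:-}"     # pass "fungus" to add the branch point model flag
    local cmd="gmes_petap.pl --ES --sequence ${fasta} --cores ${cores}"
    if [ "${mode}" = "fungus" ]; then
        cmd="${cmd} --fungus"
    fi
    printf '%s\n' "${cmd}"
}

# Dry run: show the command that would replace "gmes_petap.pl [options]".
# genome.fasta is a placeholder file name.
gmes_cmd genome.fasta 4
```

If more than one core is passed to `--cores`, the `ppn` value in the `#PBS -l nodes=1:ppn=1` line would presumably need to be raised to the same number so that the scheduler reserves those cores.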
Documentation
module load genemarkes/4.33
perl /usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl
# -------------------
Usage: /usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl [options] --sequence [filename]

GeneMark-ES Suite version 4.35
   includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction

Input sequence/s should be in FASTA format

Algorithm options
  --ES           to run self-training
  --fungus       to run algorithm with branch point model (most useful for fungal genomes)
  --ET           [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
  --EP           [filename]; to run training with introns coordinates from protein splice alighnmnet (GFF format)
  --et_score     [number]; 10 (default) minimum score of intron in initiation of the ET algorithm
  --ep_score     [number]; 4 (default) minimum score of intron in initiation of the EP algorithm
  --evidence     [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
  --training     to run only training step
  --prediction   to run only prediction step
  --predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)

Sequence pre-processing options
  --max_contig   [number]; 5000000 (default) will split input genomic sequence into contigs shorter then max_contig
  --min_contig   [number]; 50000 (default); will ignore contigs shorter then min_contig in training
  --max_gap      [number]; 5000 (default); will split sequence at gaps longer than max_gap
                 Letters 'n' and 'N' are interpreted as standing within gaps
  --max_mask     [number]; 5000 (default); will split sequence at repeats longer then max_mask
                 Letters 'x' and 'X' are interpreted as results of hard masking of repeats
  --soft_mask    [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length

Run options
  --cores        [number]; 1 (default) to run program with multiple threads
  --pbs          to run on cluster with PBS support
  --v            verbose

Customizing parameters:
  --max_intron          [number]; default 10000 (3000 fungi), maximum length of intron
  --max_intergenic      [number]; default 10000, maximum length of intergenic regions
  --min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step

Developer options:
  --usr_cfg      [filename]; to customize configuration file
  --ini_mod      [filename]; use this file with parameters for algorithm initiation
  --test_set     [filename]; to evaluate prediction accuracy on the given test set
  --key_bin
  --debug
# -------------------
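The usage text states that contigs shorter than `--min_contig` (50000 by default) are ignored in training, so a fragmented assembly may leave little sequence to train on. As a rough pre-check, the sketch below counts how many FASTA records reach a given length. The function name `count_long_contigs` is hypothetical, and the count is only an approximation: it measures raw record lengths and does not reproduce the splitting at gaps or masked repeats that GeneMark-ES performs internally.

```shell
#!/bin/bash
# Hypothetical helper (not part of GeneMark-ES): count FASTA records whose
# sequence length is at least min_len (default 50000, the documented
# --min_contig default).
# Usage: count_long_contigs <fasta> [min_len]
count_long_contigs() {
    local fasta="$1"
    local min_len="${2:-50000}"
    awk -v min="${min_len}" '
        # Header line: close the previous record, then start a new one.
        /^>/ { if (seen && len >= min) n++; len = 0; seen = 1; next }
        # Sequence line: strip whitespace and add to the current record length.
             { gsub(/[ \t\r]/, ""); len += length($0) }
        # End of file: close the last record and print the count.
        END  { if (seen && len >= min) n++; print n + 0 }
    ' "${fasta}"
}

# Example call (genome.fasta is a placeholder file name):
#   count_long_contigs genome.fasta
#   count_long_contigs genome.fasta 10000
```

If the count is very small, lowering `--min_contig` is one option to consider; whether that is appropriate depends on the genome and is not something this check can decide.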
Installation
Source code from GeneMarkES.
System
64-bit Linux