GeneMarkES-Sapelo2
Category
Bioinformatics
Program On
Sapelo2
Version
4.57
Author / Distributor
Description
"Gene Prediction in Eukaryotes. Novel genomes can be analyzed by the program GeneMark-ES utilizing unsupervised training." More details are at GeneMarkES
Running Program
Also refer to Running Jobs on Sapelo2, Run X window Jobs, and Run interactive Jobs.
Version 4.57
Version 4.57 is at /usr/local/apps/gb/genemarkes/4.57
Here is an example of a shell script, sub.sh, to run in the batch queue:
#!/bin/bash
#SBATCH --job-name=geneMarkJob
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=geneMarkJob.%j.out
#SBATCH --error=geneMarkJob.%j.err
cd $SLURM_SUBMIT_DIR
module load GeneMark-ET/4.57-GCCcore-8.3.0
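# Copy the GeneMark license key into your home directory
cp /usr/local/apps/gb/genemarkes/4.57/gm_key ~/.gm_key
# Run GeneMark-ES; replace [options] with the options for your analysis
gmes_petap.pl [options]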
In a real submission script, review the values above (such as the job name, e-mail address, memory, time limit, and program options) and replace them with values appropriate for your job.
Please refer to Running Jobs on Sapelo2.
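As an illustration, the placeholder line gmes_petap.pl [options] might be filled in as sketched below for a self-training run. This is only a sketch under stated assumptions: the input file name genome.fasta and the DRY_RUN variable are hypothetical, and using more than one core assumes the script header also requests CPUs with a line such as #SBATCH --cpus-per-task=4, which the example script above does not contain. The options --ES, --cores, and --sequence are taken from the program usage text.

```shell
# Sketch only: genome.fasta and DRY_RUN are hypothetical names.
GENOME="${GENOME:-genome.fasta}"

# Use the CPU count Slurm allocated to the task; fall back to 1 core
# when the variable is not set (for example, outside of a job).
CORES="${SLURM_CPUS_PER_TASK:-1}"

# run_gmes FASTA THREADS
# Prints the GeneMark-ES command line and, when DRY_RUN=0, executes it.
run_gmes() {
    set -- gmes_petap.pl --ES --cores "$2" --sequence "$1"
    echo "$@"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# By default this only prints the command so it can be reviewed first.
run_gmes "$GENOME" "$CORES"
```

Setting DRY_RUN=0 in the job script would make the function run the printed command instead of only showing it.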
Documentation
module load genemarkes/4.33
perl /usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl
# -------------------
Usage: /usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl [options] --sequence [filename]

GeneMark-ES Suite version 4.35
   includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction

Input sequence/s should be in FASTA format

Algorithm options
  --ES                       to run self-training
  --fungus                   to run algorithm with branch point model (most useful for fungal genomes)
  --ET [filename];           to run training with introns coordinates from RNA-Seq read alignments (GFF format)
  --EP [filename];           to run training with introns coordinates from protein splice alighnmnet (GFF format)
  --et_score [number];       10 (default) minimum score of intron in initiation of the ET algorithm
  --ep_score [number];       4 (default) minimum score of intron in initiation of the EP algorithm
  --evidence [filename];     to use in prediction external evidence (RNA or protein) mapped to genome
  --training                 to run only training step
  --prediction               to run only prediction step
  --predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)

Sequence pre-processing options
  --max_contig [number];     5000000 (default) will split input genomic sequence into contigs shorter then max_contig
  --min_contig [number];     50000 (default); will ignore contigs shorter then min_contig in training
  --max_gap [number];        5000 (default); will split sequence at gaps longer than max_gap
                             Letters 'n' and 'N' are interpreted as standing within gaps
  --max_mask [number];       5000 (default); will split sequence at repeats longer then max_mask
                             Letters 'x' and 'X' are interpreted as results of hard masking of repeats
  --soft_mask [number]       to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length

Run options
  --cores [number];          1 (default) to run program with multiple threads
  --pbs                      to run on cluster with PBS support
  --v                        verbose

Customizing parameters:
  --max_intron [number];          default 10000 (3000 fungi), maximum length of intron
  --max_intergenic [number];      default 10000, maximum length of intergenic regions
  --min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step

Developer options:
  --usr_cfg [filename];      to customize configuration file
  --ini_mod [filename];      use this file with parameters for algorithm initiation
  --test_set [filename];     to evaluate prediction accuracy on the given test set
  --key_bin
  --debug
# -------------------
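Because --min_contig causes contigs shorter than 50000 (the default) to be ignored in training, it can help to check how many sequences in an input FASTA file reach that length before submitting a job. The function below is a minimal sketch of such a check, not part of GeneMark-ES: the name count_long_contigs is hypothetical, and it counts the raw sequences in the file, so the result may differ from what the program sees after it splits sequences at gaps, masked repeats, or max_contig.

```shell
# count_long_contigs FASTA [MIN_LEN]
# Prints the number of FASTA records whose sequence length is at least
# MIN_LEN (default 50000, matching the documented --min_contig default).
# Hypothetical helper; it does not call GeneMark-ES.
count_long_contigs() {
    awk -v min="${2:-50000}" '
        # Header line: close the previous record, then start a new one.
        /^>/ { if (seen && len >= min) n++; seen = 1; len = 0; next }
        # Sequence line: drop whitespace and add its length to the record.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        # End of file: close the last record and print the count.
        END { if (seen && len >= min) n++; print n + 0 }
    ' "$1"
}
```

For example, count_long_contigs genome.fasta reports how many sequences are at least 50000 letters long, and count_long_contigs genome.fasta 10000 applies a lower threshold; genome.fasta is a placeholder file name.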
Installation
Installed from source code obtained from GeneMarkES.
System
64-bit Linux