GeneMarkES-Teaching
Latest revision as of 12:20, 15 August 2018
Category
Bioinformatics
Program On
Teaching
Version
4.33
Author / Distributor
Description
"Gene Prediction in Eukaryotes. Novel genomes can be analyzed by the program GeneMark-ES utilizing unsupervised training." More details are at GeneMarkES.
Running Program
The latest version of this application is installed at /usr/local/apps/gb/genemarkes/4.33
To use this version, please load the module with
ml genemarkes/4.33
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_GeneMarkES
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=GeneMarkES.%j.out
#SBATCH --error=GeneMarkES.%j.err
cd "$SLURM_SUBMIT_DIR"
ml genemarkes/4.33
perl /usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl [options]
In a real submission script, the values shown above (such as the job name, e-mail address, memory, time limit and the [options] placeholder) need to be reviewed and replaced with values appropriate for the job.
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.
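As an illustration of filling in the template, the sketch below prints a complete sub.sh for a self-training run. The input name genome.fasta, the core count of 4, the helper name make_sub_sh and the added --cpus-per-task line are assumptions for this example and are not part of the original page; --ES, --sequence and --cores are taken from the program's usage text. The mail options from the template can be added back as needed.

```shell
#!/bin/bash
# Sketch only: print a filled-in sub.sh for a GeneMark-ES self-training run.
# make_sub_sh is a hypothetical helper; genome.fasta and 4 cores are
# example values, not recommendations.
make_sub_sh() {
    local genome="$1" cores="$2"
    # Unquoted heredoc: ${genome} and ${cores} are expanded now, while
    # \$SLURM_SUBMIT_DIR is left for Slurm to set at run time.
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=j_GeneMarkES
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=${cores}
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=GeneMarkES.%j.out
#SBATCH --error=GeneMarkES.%j.err

cd "\$SLURM_SUBMIT_DIR"
ml genemarkes/4.33
perl /usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl --ES --sequence ${genome} --cores ${cores}
EOF
}

# Print the script for inspection; redirect to sub.sh once it looks right.
make_sub_sh genome.fasta 4
```

Requesting the same number of CPUs with --cpus-per-task as is passed to --cores keeps the program's thread count in line with what the scheduler allocates.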
Here is an example of a job submission command:
sbatch ./sub.sh
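To keep track of a job after submitting it, something like the following may help. This is a sketch that assumes the standard Slurm commands sbatch and squeue are available on the login node; the function names parse_job_id and submit_job are made up for this example.

```shell
#!/bin/bash
# Extract the job ID from `sbatch --parsable` output, which has the
# form "jobid" or "jobid;cluster".
parse_job_id() {
    printf '%s\n' "${1%%;*}"
}

# Submit a script and show its queue entry (hypothetical helper).
# Requires Slurm; returns non-zero if the submission fails.
submit_job() {
    local script="${1:-./sub.sh}" out jobid
    out=$(sbatch --parsable "$script") || return 1
    jobid=$(parse_job_id "$out")
    echo "Submitted job ${jobid}"
    squeue -j "$jobid"
}

# Usage on the cluster:
#   submit_job ./sub.sh
```

With the template above, the job's standard output and error would then appear in GeneMarkES.<jobid>.out and GeneMarkES.<jobid>.err in the submission directory.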
Documentation
ml genemarkes/4.33
perl /usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl
# -------------------
Usage: /usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl [options] --sequence [filename]
GeneMark-ES Suite version 4.35
includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction
Input sequence/s should be in FASTA format
Algorithm options
  --ES to run self-training
  --fungus to run algorithm with branch point model (most useful for fungal genomes)
  --ET [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
  --EP [filename]; to run training with introns coordinates from protein splice alighnmnet (GFF format)
  --et_score [number]; 10 (default) minimum score of intron in initiation of the ET algorithm
  --ep_score [number]; 4 (default) minimum score of intron in initiation of the EP algorithm
  --evidence [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
  --training to run only training step
  --prediction to run only prediction step
  --predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)
Sequence pre-processing options
  --max_contig [number]; 5000000 (default) will split input genomic sequence into contigs shorter then max_contig
  --min_contig [number]; 50000 (default); will ignore contigs shorter then min_contig in training
  --max_gap [number]; 5000 (default); will split sequence at gaps longer than max_gap
      Letters 'n' and 'N' are interpreted as standing within gaps
  --max_mask [number]; 5000 (default); will split sequence at repeats longer then max_mask
      Letters 'x' and 'X' are interpreted as results of hard masking of repeats
  --soft_mask [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length
Run options
  --cores [number]; 1 (default) to run program with multiple threads
  --pbs to run on cluster with PBS support
  --v verbose
Customizing parameters:
  --max_intron [number]; default 10000 (3000 fungi), maximum length of intron
  --max_intergenic [number]; default 10000, maximum length of intergenic regions
  --min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step
Developer options:
  --usr_cfg [filename]; to customize configuration file
  --ini_mod [filename]; use this file with parameters for algorithm initiation
  --test_set [filename]; to evaluate prediction accuracy on the given test set
  --key_bin
  --debug
# -------------------
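The algorithm options in the usage text can be combined in a few common ways. The sketch below only prints candidate command lines so they can be checked before being pasted into sub.sh. The helper gmes_cmd, the mode names es, fungus and et, and the file names genome.fasta and introns.gff are assumptions made for this example; the flags themselves come from the usage text, and the pairing of --fungus with --ES is an assumption about typical use rather than something this page states.

```shell
#!/bin/bash
GMES=/usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl

# Print (do not run) a gmes_petap.pl command line for a given mode.
# Arguments: mode (es | fungus | et), genome FASTA, cores (default 1),
# and, for et mode only, a GFF file of intron coordinates.
gmes_cmd() {
    local mode="$1" genome="$2" cores="${3:-1}" introns="${4:-}"
    local -a args=(--sequence "$genome" --cores "$cores")
    case "$mode" in
        es)     args=(--ES "${args[@]}") ;;
        fungus) args=(--ES --fungus "${args[@]}") ;;
        et)
            if [ -z "$introns" ]; then
                echo "et mode needs an intron GFF file" >&2
                return 1
            fi
            args=(--ET "$introns" "${args[@]}") ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1 ;;
    esac
    echo "perl $GMES ${args[*]}"
}

gmes_cmd es genome.fasta 4
gmes_cmd fungus genome.fasta 4
gmes_cmd et genome.fasta 4 introns.gff
```

Printing the command first makes it easy to confirm the flags before committing cluster time to a run.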
Installation
The source code was obtained from GeneMarkES.
System
64-bit Linux