GeneMarkES-Sapelo2: Difference between revisions
Revision as of 13:22, 8 December 2020
Category
Bioinformatics
Program On
Sapelo2
Version
4.57
Author / Distributor
Description
"Gene Prediction in Eukaryotes. Novel genomes can be analyzed by the program GeneMark-ES utilizing unsupervised training." More details are at GeneMarkES
Running Program
Also refer to Running Jobs on Sapelo2, Run X window Jobs, and Run interactive Jobs.
To use GeneMark you need to download a license key and put it in your home directory. Instructions for downloading the key can be found here:
https://github.com/ablab/quast/issues/97
From the above link:
1.) Go to http://exon.gatech.edu/GeneMark/license_download.cgi, fill in the requested fields, and read the license text. Note: you can select any tool and platform, e.g. GeneMark-ES / ET v.4.33 and LINUX64.
2.) After pressing the "I agree ..." button you will be redirected to a download page. You need to download the key only (either 32-bit or 64-bit, depending on your platform); the linked instructions note that the software itself is already included in the Quast package.
3.) The key is in gzip format. Unpack it (with gunzip) and move it to ~/.gm_key (the name must be exactly this, with a leading dot).
Here ~/ is your home directory, e.g. /home/ugamyid. Once the .gm_key file is in place, you should be able to run GeneMark.
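Step 3 above can be sketched as a small shell function. This is only an illustration: the function name install_gm_key and the archive name gm_key_64.gz are assumptions, so use whatever file name the download page actually gave you.

```shell
#!/bin/bash
# Sketch: unpack a downloaded GeneMark key and install it as ~/.gm_key.
# The function name and the example archive name are hypothetical.
install_gm_key() {
    local archive="$1"                 # path to the downloaded gzip-compressed key
    local dest="${2:-$HOME}/.gm_key"   # target directory defaults to the home directory
    if [ ! -f "$archive" ]; then
        echo "key archive not found: $archive" >&2
        return 1
    fi
    # Decompress to stdout so the downloaded archive itself is left untouched
    gunzip -c "$archive" > "$dest" || return 1
    echo "$dest"
}

# Example call (assumed file name):
# install_gm_key ~/gm_key_64.gz
```

The optional second argument exists only so the function can be tried out against a scratch directory before writing into the real home directory.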
Version 4.57
Version 4.57 is installed at /apps/eb/GeneMark-ET/4.57-GCCcore-8.3.0. It can be loaded with:
module load GeneMark-ET/4.57-GCCcore-8.3.0
Here is an example of a shell script, sub.sh, to run on the batch queue:
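#!/bin/bash
#SBATCH --job-name=geneMarkJob
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=geneMarkJob.%j.out
#SBATCH --error=geneMarkJob.%j.err
# Run from the directory the job was submitted from
cd $SLURM_SUBMIT_DIR
module load GeneMark-ET/4.57-GCCcore-8.3.0
# Replace [options] and [filename] with the algorithm options and the input FASTA file
gmes_petap.pl [options] --sequence [filename]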
In a real submission script, review at least the job-specific values above (such as the job name, e-mail address, memory, time limit, output file names, and program options) and replace them with values appropriate for your job.
Please refer to Running Jobs on Sapelo2.
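As a rough sketch of how such a script might look for a multi-threaded GeneMark-ES run, the function below writes a submission script in which the --cores value follows Slurm's --cpus-per-task. The function name write_gmes_job, the file name genome.fasta, and the thread count are assumptions chosen for illustration; the memory and time values are copied from the example above and may not suit your data.

```shell
#!/bin/bash
# Sketch: generate a Slurm submission script for a GeneMark-ES run.
# write_gmes_job, the genome file name and the defaults are hypothetical.
write_gmes_job() {
    local genome="$1"            # FASTA file with the input sequence
    local cores="${2:-4}"        # threads requested from Slurm and passed to --cores
    local script="${3:-sub.sh}"  # name of the script to write
    # Unquoted heredoc: ${cores} and ${genome} expand now, \$VARS expand at job run time
    cat > "$script" <<EOF
#!/bin/bash
#SBATCH --job-name=geneMarkJob
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=${cores}
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=geneMarkJob.%j.out
#SBATCH --error=geneMarkJob.%j.err

cd \$SLURM_SUBMIT_DIR
module load GeneMark-ET/4.57-GCCcore-8.3.0
gmes_petap.pl --ES --cores \$SLURM_CPUS_PER_TASK --sequence ${genome}
EOF
    echo "$script"
}

# Example: generate the script, then submit it with sbatch.
# write_gmes_job genome.fasta 4 sub.sh
# sbatch sub.sh
```

Tying --cores to SLURM_CPUS_PER_TASK keeps the thread count used by the program in step with what the scheduler actually allocated.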
Documentation
module load GeneMark-ET/4.57-GCCcore-8.3.0
gmes_petap.pl
# -------------------
# -------------------
Usage: /apps/eb/GeneMark-ET/4.57-GCCcore-8.3.0/gmes_petap.pl [options] --sequence [filename]

GeneMark-ES Suite version 4.57_lic
  Suite includes GeneMark.hmm, GeneMark-ES, GeneMark-ET and GeneMark-EP algorithms.

Input sequence/s should be in FASTA format.

Select one of the gene prediction algorithm

To run GeneMark-ES self-training algorithm
  --ES

To run GeneMark-ET with hints from transcriptome splice alignments
  --ET [filename]; file with intron coordinates from RNA-Seq read splice alignment in GFF format
  --et_score [number]; default 10; minimum score of intron in initiation of the ET algorithm

To run GeneMark-EP with hints from protein splice alignments
  --EP
  --dbep [filename]; file with protein database in FASTA format
  --ep_score [number,number]; default 4,0.25; minimum score of intron in initiation of the EP algorithm
  or
  --EP [filename]; file with intron coordinates from protein splice alignment in GFF format

To run GeneMark.hmm predictions using previously derived model
  --predict_with [filename]; file with species specific gene prediction parameters

To run ES, ET or EP with branch point model. This option is most useful for fungal genomes
  --fungus

To run hmm, ES, ET or EP in PLUS mode (prediction with hints)
  --evidence [filename]; file with hints in GFF format

Masking option
  --soft_mask [number] or [auto]; default auto; to indicate that lowercase letters stand for repeats;
      masks only lowercase repeats longer than specified length
      In 'auto' mode length is adjusted based on the size of the input genome

Run options
  --cores [number]; default 1; to run program with multiple threads
  --pbs to run on cluster with PBS support
  --v verbose

Optional sequence pre-processing parameters
  --max_contig [number]; default 5000000; will split input genomic sequence into contigs shorter then max_contig
  --min_contig [number]; default 50000; will ignore contigs shorter then min_contig in training
  --max_gap [number]; default 5000; will split sequence at gaps longer than max_gap
      Letters 'n' and 'N' are interpreted as standing within gaps
  --max_mask [number]; default 5000; will split sequence at repeats longer then max_mask
      Letters 'x' and 'X' are interpreted as results of hard masking of repeats

Optinal algorithm parameters
  --max_intron [number]; default 10000 (3000 fungi); maximum length of intron
  --max_intergenic [number]; default 50000; maximum length of intergenic regions
  --min_contig_in_predict [number]; default 500; minimum allowed length of contig in prediction step
  --min_gene_in_predict [number]; default 300 (120 fungi); minimum allowed gene length in prediction step
  --gc_donor [value]; default 0.001; transition probability to GC donor in the range 0..1; 'auto' mode detects probability from training; 'off' switches GC donor model OFF

Developer options
  --gc3 [number]; GC3 cutoff in training for grasses
  --training to run only training step of algorithms; applicable to ES, ET or EP
  --prediction to run only prediction step of algorithms using species parameters from previously executed training; applicable to ES, ET or EP
  --usr_cfg [filename]; use custom configuration from this file
  --ini_mod [filename]; use this file with parameters for algorithm initiation
  --test_set [filename]; to evaluate prediction accuracy on the given test set
  --key_bin
  --debug
# -------------------
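To illustrate how the algorithm options in the help text combine with --sequence, here is a minimal sketch of a helper that only prints a command line for a chosen mode and does not run anything. The helper name gmes_cmdline, the mode labels, and the file names in the examples are hypothetical; only the flags themselves come from the usage text above.

```shell
#!/bin/bash
# Sketch: print (not run) a gmes_petap.pl command line for a chosen mode.
# gmes_cmdline and its mode labels are hypothetical; flags are from the help text.
gmes_cmdline() {
    local mode="$1" genome="$2" hints="${3:-}"
    local args=()
    case "$mode" in
        es)        args+=(--ES) ;;            # self-training
        es-fungus) args+=(--ES --fungus) ;;   # self-training with branch point model
        et|ep)
            # --ET and --EP are used here with a GFF file of intron coordinates
            if [ -z "$hints" ]; then
                echo "mode '$mode' needs a GFF file with intron coordinates" >&2
                return 1
            fi
            if [ "$mode" = et ]; then args+=(--ET "$hints"); else args+=(--EP "$hints"); fi
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
    echo "gmes_petap.pl ${args[*]} --sequence $genome"
}

# Examples (assumed file names):
# gmes_cmdline es genome.fasta
# gmes_cmdline et genome.fasta introns.gff
```

Printing the command first makes it easy to check the option combination before pasting it into a submission script.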
Installation
source code from GeneMarkES
System
64-bit Linux