GeneMarkES-Sapelo2

From Research Computing Center Wiki

Revision as of 18:47, 3 December 2020

Category

Bioinformatics

Program On

Sapelo2

Version

4.57

Author / Distributor

GeneMarkES

Description

"Gene Prediction in Eukaryotes. Novel genomes can be analyzed by the program GeneMark-ES utilizing unsupervised training." More details are at GeneMarkES

Running Program

Also refer to Running Jobs on Sapelo2, Run X window Jobs, and Run interactive Jobs.


Version 4.57

Version 4.57 is installed at /usr/local/apps/gb/genemarkes/4.57

Here is an example of a shell script, sub.sh, to submit to the batch queue with qsub ./sub.sh:

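#PBS -S /bin/bash
#PBS -N j_GeneMarkES
#PBS -q batch
#PBS -l nodes=1:ppn=1
#PBS -l walltime=48:00:00
#PBS -l mem=10gb

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
module load genemarkes/4.57-foss-2018a
# Copy the license key to the location GeneMark-ES expects
cp /usr/local/apps/gb/genemarkes/4.57/gm_key ~/.gm_key
gmes_petap.pl [options] --sequence [filename]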
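
As one way to fill in the options, the sketch below writes a submission script for a self-training run (--ES) and requests the same number of cores from PBS as it passes to --cores. The helper name make_sub_script, the input file genome.fa, the output name sub_es.sh, and the choice of 4 cores are assumptions for illustration; the flags themselves are taken from the gmes_petap.pl usage message.

```shell
#!/bin/bash
# Hypothetical helper (not part of GeneMark-ES): write a PBS script that
# runs self-training on one FASTA file with a given number of cores.
make_sub_script() {
    local genome="$1"          # input FASTA file (assumed name)
    local cores="$2"           # used for both ppn and --cores
    local out="${3:-sub.sh}"   # script file to create

    # Unquoted heredoc: ${cores} and ${genome} are filled in now,
    # while \$PBS_O_WORKDIR is left for PBS to expand at run time.
    cat > "$out" <<EOF
#PBS -S /bin/bash
#PBS -N j_GeneMarkES
#PBS -q batch
#PBS -l nodes=1:ppn=${cores}
#PBS -l walltime=48:00:00
#PBS -l mem=10gb

cd "\$PBS_O_WORKDIR"
module load genemarkes/4.57-foss-2018a
cp /usr/local/apps/gb/genemarkes/4.57/gm_key ~/.gm_key
gmes_petap.pl --ES --cores ${cores} --sequence "${genome}"
EOF
}

# Example: 4 cores, self-training on genome.fa (both values are assumptions)
make_sub_script genome.fa 4 sub_es.sh
```

The generated file can then be submitted with qsub ./sub_es.sh. Walltime and memory are copied unchanged from the example above and may need adjusting for larger genomes.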


Documentation

The usage message below was captured with the genemarkes/4.33 module; options in version 4.57 may differ.

module load genemarkes/4.33
perl /usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl 
# -------------------
Usage:  /usr/local/apps/gb/genemarkes/4.33/gmes_petap.pl  [options]  --sequence [filename]

GeneMark-ES Suite version 4.35
   includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction

Input sequence/s should be in FASTA format

Algorithm options
  --ES           to run self-training
  --fungus       to run algorithm with branch point model (most useful for fungal genomes)
  --ET           [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
  --EP           [filename]; to run training with introns coordinates from protein splice alignment (GFF format)
  --et_score     [number]; 10 (default) minimum score of intron in initiation of the ET algorithm
  --ep_score     [number]; 4 (default) minimum score of intron in initiation of the EP algorithm
  --evidence     [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
  --training     to run only training step
  --prediction   to run only prediction step
  --predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)

Sequence pre-processing options
  --max_contig   [number]; 5000000 (default) will split input genomic sequence into contigs shorter than max_contig
  --min_contig   [number]; 50000 (default); will ignore contigs shorter than min_contig in training
  --max_gap      [number]; 5000 (default); will split sequence at gaps longer than max_gap
                 Letters 'n' and 'N' are interpreted as standing within gaps 
  --max_mask     [number]; 5000 (default); will split sequence at repeats longer than max_mask
                 Letters 'x' and 'X' are interpreted as results of hard masking of repeats
  --soft_mask    [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length

Run options
  --cores        [number]; 1 (default) to run program with multiple threads 
  --pbs          to run on cluster with PBS support
  --v            verbose

Customizing parameters:
  --max_intron          [number]; default 10000 (3000 fungi), maximum length of intron
  --max_intergenic      [number]; default 10000, maximum length of intergenic regions
  --min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step

Developer options:
  --usr_cfg      [filename]; to customize configuration file
  --ini_mod      [filename]; use this file with parameters for algorithm initiation
  --test_set     [filename]; to evaluate prediction accuracy on the given test set
  --key_bin
  --debug
# -------------------
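
Because --min_contig excludes short contigs from training, it can help to check how much of an assembly passes the cutoff before submitting a job. The sketch below counts FASTA records at or above a length threshold with awk. It is an independent check, not part of GeneMark-ES; the function name count_long_contigs and the tiny demo file are made up for illustration. GeneMark-ES also splits sequences at long gaps and masked repeats, so its own contig count can differ from this one.

```shell
#!/bin/bash
# Hypothetical helper: print "<kept> <total>", where <kept> is the number of
# FASTA records whose sequence length is >= the given minimum.
count_long_contigs() {
    local fasta="$1" min_len="$2"
    awk -v min="$min_len" '
        /^>/ {
            # A new header closes the previous record, if there was one
            if (seen) { total++; if (len >= min) kept++ }
            seen = 1; len = 0; next
        }
        {
            # Sequence line: drop whitespace and add its length
            gsub(/[ \t\r]/, ""); len += length($0)
        }
        END {
            # Close the last record and report the counts
            if (seen) { total++; if (len >= min) kept++ }
            print kept + 0, total + 0
        }
    ' "$fasta"
}

# Tiny demo file (made-up sequences of length 12, 3 and 10);
# a real check would use the --min_contig default of 50000.
printf '>a\nACGTACGT\nACGT\n>b\nACG\n>c\nNNNNNNNNNN\n' > demo.fa
count_long_contigs demo.fa 10
```

For the demo file this prints 2 3: two of the three records are at least 10 bases long. If only a small share of a real assembly passes the default cutoff, lowering --min_contig is one option to consider.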

Installation

Source code from GeneMarkES

System

64-bit Linux