GET HOMOLOGUES-Teaching

From Research Computing Center Wiki

Latest revision as of 11:04, 30 August 2019

Category

Bioinformatics

Program On

Teaching

Version

1.7.6

Author / Distributor

GET_HOMOLOGUES

Description

"a versatile software package for pan-genome analysis." More details are at GET_HOMOLOGUES (https://github.com/eead-csic-compbio/get_homologues).

Running Program

  • Version 1.7.6, installed at /usr/local/apps/gb/GETHOMOLOGUES/1.7.6

To use this version, please load the module with

ml GETHOMOLOGUES/1.7.6 

Here is an example of a shell script, sub.sh, to run on the batch queue:

#!/bin/bash
#SBATCH --job-name=j_GET_HOMOLOGUES
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=GET_HOMOLOGUES.%j.out
#SBATCH --error=GET_HOMOLOGUES.%j.err

cd $SLURM_SUBMIT_DIR
ml GETHOMOLOGUES/1.7.6

perl /usr/local/apps/gb/GETHOMOLOGUES/1.7.6/get_homologues.pl [options]

In a real submission script, review at least the placeholder values above (the e-mail address, --mem, --time, and [options]) and replace them with values appropriate for your job.
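
As an illustration of what the [options] placeholder might be replaced with, here is a minimal sketch that assembles one possible command line from flags listed in the program's help output (-d, -M, -t, -n). The input directory name my_genomes is hypothetical, and the thread count falls back to 1 because the example sub.sh does not request --cpus-per-task. The snippet only prints the command so it can be checked first; it does not run the program.

```shell
# Script location taken from the installation path on this page.
GH_SCRIPT=/usr/local/apps/gb/GETHOMOLOGUES/1.7.6/get_homologues.pl

# Hypothetical input directory with one .faa or .gbk file per genome.
INPUT_DIR=my_genomes

# Match the thread count to the CPUs Slurm allocated; SLURM_CPUS_PER_TASK
# is only set when --cpus-per-task is requested, so fall back to 1.
THREADS=${SLURM_CPUS_PER_TASK:-1}

# -d input directory, -M orthoMCL algorithm, -t 0 report all clusters,
# -n number of threads (all documented in the help output).
GH_OPTS="-d $INPUT_DIR -M -t 0 -n $THREADS"

# Print the command for review; remove "echo" in sub.sh to execute it.
echo "perl $GH_SCRIPT $GH_OPTS"
```

If more threads are wanted, one option is to add a line such as #SBATCH --cpus-per-task=4 to sub.sh, so that the -n value follows the allocation.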

Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.

Here is an example of a job submission command:

sbatch ./sub.sh 
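
To follow the job after submission, a sketch along the following lines may help. It assumes the standard Slurm client tools (sbatch with its --parsable option, and squeue) are available on the login node and that sub.sh is in the current directory; outside a cluster it only prints a notice.

```shell
# Submit sub.sh and keep the job ID so the job can be checked afterwards.
if command -v sbatch >/dev/null 2>&1 && [ -f ./sub.sh ]; then
    # --parsable prints the job ID (optionally followed by ";cluster").
    JOBID=$(sbatch --parsable ./sub.sh)
    JOBID=${JOBID%%;*}
    echo "Submitted job $JOBID"
    # Show the current state of this job in the queue.
    squeue -j "$JOBID"
else
    echo "sbatch or ./sub.sh not found; run this from the submission directory on the cluster"
fi
```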

Documentation

ml GETHOMOLOGUES/1.7.6 
perl /usr/local/apps/gb/GETHOMOLOGUES/1.7.6/get_homologues.pl -h

usage: /usr/local/apps/gb/GETHOMOLOGUES/1.7.6/get_homologues.pl [options]

-h this message
-v print version, credits and checks installation
-d directory with input FASTA files ( .faa / .fna ),           (overrides -i,
   GenBank files ( .gbk ), 1 per genome, or a subdirectory      use of pre-clustered sequences
   ( subdir.clusters / subdir_ ) with pre-clustered sequences   ignores -c, -g)
   ( .faa / .fna ); allows for new files to be added later;    
   creates output folder named 'directory_homologues'
-i input amino acid FASTA file with [taxon names] in headers,  (required unless -d is set)
   creates output folder named 'file_homologues'

Optional parameters:
-o only run BLAST/Pfam searches and exit                       (useful to pre-compute searches)
-c report genome composition analysis                          (follows order in -I file if enforced,
                                                                ignores -r,-t,-e)
-R set random seed for genome composition analysis             (optional, requires -c, example -R 1234,
                                                                required for mixing -c with -c -a runs)
-s save memory by using BerkeleyDB; default parsing stores
   sequence hits in RAM
-m runmode [local|cluster]                                     (default local)
-n nb of threads for BLAST/HMMER/MCL in 'local' runmode        (default=2)
-I file with .faa/.gbk files in -d to be included              (takes all by default, requires -d)

Algorithms instead of default bidirectional best-hits (BDBH):
-G use COGtriangle algorithm (COGS, PubMed=20439257)           (requires 3+ genomes|taxa)
-M use orthoMCL algorithm (OMCL, PubMed=12952885)

Options that control sequence similarity searches:
-X use diamond instead of blastp                               (optional, set threads with -n)
-C min %coverage in BLAST pairwise alignments                  (range [1-100],default=75)
-E max E-value                                                 (default=1e-05,max=0.01)
-D require equal Pfam domain composition                       (best with -m cluster or -n threads)
   when defining similarity-based orthology
-S min %sequence identity in BLAST query/subj pairs            (range [1-100],default=1 [BDBH|OMCL])
-N min BLAST neighborhood correlation PubMed=18475320          (range [0,1],default=0 [BDBH|OMCL])
-b compile core-genome with minimum BLAST searches             (ignores -c [BDBH])

Options that control clustering:
-t report sequence clusters including at least t taxa          (default t=numberOfTaxa,
                                                                t=0 reports all clusters [OMCL|COGS])
-a report clusters of sequence features in GenBank files       (requires -d and .gbk files,
   instead of default 'CDS' GenBank features                    example -a 'tRNA,rRNA',
                                                                NOTE: uses blastn instead of blastp,
                                                                ignores -g,-D)
-g report clusters of intergenic sequences flanked by ORFs     (requires -d and .gbk files)
   in addition to default 'CDS' clusters
-f filter by %length difference within clusters                (range [1-100], by default sequence
                                                                length is not checked)
-r reference proteome .faa/.gbk file                           (by default takes file with
                                                                least sequences; with BDBH sets
                                                                first taxa to start adding genes)
-e exclude clusters with inparalogues                          (by default inparalogues are
                                                                included)
-x allow sequences in multiple COG clusters                    (by default sequences are allocated
                                                                to single clusters [COGS])
-F orthoMCL inflation value                                    (range [1-5], default=1.5 [OMCL])
-A calculate average identity of clustered sequences,          (optional, creates tab-separated matrix,
 by default uses blastp results but can use blastn with -a      recommended with -t 0 [OMCL|COGS])
-P calculate percentage of conserved proteins (POCP),          (optional, creates tab-separated matrix,
 by default uses blastp results but can use blastn with -a      recommended with -t 0 [OMCL|COGS])
-z add soft-core to genome composition analysis                (optional, requires -c [OMCL|COGS])

 This program uses BLAST (and optionally HMMER/Pfam) to define clusters of 'orthologous'
 genomic sequences and pan/core-genome gene sets. Several algorithms are available
 and search parameters are customizable. It is designed to process (in a SGE computer
 cluster) files contained in a directory (-d), so that new .faa/.gbk files can be added
 while conserving previous BLAST results. In general the program will try to re-use
 previous results when run with the same input directory.
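
The help text above says that searches can be pre-computed with -o and that the program generally tries to re-use previous results for the same input directory. One way this might be used is sketched below: a search-only run, followed by clustering runs with each algorithm on the same directory. The directory name my_genomes is hypothetical, and the commands are only printed, not executed.

```shell
GH_SCRIPT=/usr/local/apps/gb/GETHOMOLOGUES/1.7.6/get_homologues.pl
INPUT_DIR=my_genomes   # hypothetical directory with .faa or .gbk files

# Step 1: only run the BLAST/Pfam searches and exit (-o).
echo "perl $GH_SCRIPT -d $INPUT_DIR -o"

# Step 2: cluster the same directory with each algorithm:
#   (no flag) default bidirectional best-hits (BDBH)
#   -G        COGtriangle (COGS), requires 3 or more genomes
#   -M        orthoMCL (OMCL)
N_RUNS=0
for ALGO in "" "-G" "-M"; do
    echo "perl $GH_SCRIPT -d $INPUT_DIR $ALGO"
    N_RUNS=$((N_RUNS + 1))
done
```

Because all runs point at the same -d directory, the later runs would be expected to re-use the searches from step 1 rather than repeat them, as the description above suggests.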

Installation

Source code was obtained from GET_HOMOLOGUES (https://github.com/eead-csic-compbio/get_homologues/releases).

System

64-bit Linux