cd $SLURM_SUBMIT_DIR<br>
ml GETHOMOLOGUES/1.7.6<br>
perl /usr/local/apps/gb/GETHOMOLOGUES/1.7.6/get_homologues.pl <u>[options]</u><br>
</div>
In the real submission script, at least all the above underlined values need to be reviewed or replaced by the proper values.
<pre class="gcommand">
ml GETHOMOLOGUES/1.7.6
perl /usr/local/apps/gb/GETHOMOLOGUES/1.7.6/get_homologues.pl -h
usage: get_homologues.pl [options]

-h this message
-v print version, credits and checks installation
-d directory with input FASTA files ( .faa / .fna ),           (overrides -i,
   GenBank files ( .gbk ), 1 per genome, or a subdirectory      use of pre-clustered sequences
   ( subdir.clusters / subdir_ ) with pre-clustered sequences   ignores -c, -g)
   ( .faa / .fna ); allows for new files to be added later;
   creates output folder named 'directory_homologues'
-i input amino acid FASTA file with [taxon names] in headers,  (required unless -d is set)
   creates output folder named 'file_homologues'

Optional parameters:
-o only run BLAST/Pfam searches and exit                       (useful to pre-compute searches)
-c report genome composition analysis                          (follows order in -I file if enforced,
                                                                ignores -r,-t,-e)
-R set random seed for genome composition analysis             (optional, requires -c, example -R 1234,
                                                                required for mixing -c with -c -a runs)
-s save memory by using BerkeleyDB; default parsing stores
   sequence hits in RAM
-m runmode [local|cluster]                                     (default local)
-n nb of threads for BLAST/HMMER/MCL in 'local' runmode        (default=2)
-I file with .faa/.gbk files in -d to be included              (takes all by default, requires -d)

Algorithms instead of default bidirectional best-hits (BDBH):
-G use COGtriangle algorithm (COGS, PubMed=20439257)           (requires 3+ genomes|taxa)
-M use orthoMCL algorithm (OMCL, PubMed=12952885)

Options that control sequence similarity searches:
-X use diamond instead of blastp                               (optional, set threads with -n)
-C min %coverage in BLAST pairwise alignments                  (range [1-100],default=75)
-E max E-value                                                 (default=1e-05,max=0.01)
-D require equal Pfam domain composition                       (best with -m cluster or -n threads)
   when defining similarity-based orthology
-S min %sequence identity in BLAST query/subj pairs            (range [1-100],default=1 [BDBH|OMCL])
-N min BLAST neighborhood correlation PubMed=18475320          (range [0,1],default=0 [BDBH|OMCL])
-b compile core-genome with minimum BLAST searches             (ignores -c [BDBH])

Options that control clustering:
-t report sequence clusters including at least t taxa          (default t=numberOfTaxa,
                                                                t=0 reports all clusters [OMCL|COGS])
-a report clusters of sequence features in GenBank files       (requires -d and .gbk files,
   instead of default 'CDS' GenBank features                    example -a 'tRNA,rRNA',
                                                                NOTE: uses blastn instead of blastp,
                                                                ignores -g,-D)
-g report clusters of intergenic sequences flanked by ORFs     (requires -d and .gbk files)
   in addition to default 'CDS' clusters
-f filter by %length difference within clusters                (range [1-100], by default sequence
                                                                length is not checked)
-r reference proteome .faa/.gbk file                           (by default takes file with
                                                                least sequences; with BDBH sets
                                                                first taxa to start adding genes)
-e exclude clusters with inparalogues                          (by default inparalogues are
                                                                included)
-x allow sequences in multiple COG clusters                    (by default sequences are allocated
                                                                to single clusters [COGS])
-F orthoMCL inflation value                                    (range [1-5], default=1.5 [OMCL])
-A calculate average identity of clustered sequences,          (optional, creates tab-separated matrix,
   by default uses blastp results but can use blastn with -a    recommended with -t 0 [OMCL|COGS])
-P calculate percentage of conserved proteins (POCP),          (optional, creates tab-separated matrix,
   by default uses blastp results but can use blastn with -a    recommended with -t 0 [OMCL|COGS])
-z add soft-core to genome composition analysis                (optional, requires -c [OMCL|COGS])

This program uses BLAST (and optionally HMMER/Pfam) to define clusters of 'orthologous'
genomic sequences and pan/core-genome gene sets. Several algorithms are available
and search parameters are customizable. It is designed to process (in a SGE computer
cluster) files contained in a directory (-d), so that new .faa/.gbk files can be added
while conserving previous BLAST results. In general the program will try to re-use
previous results when run with the same input directory.
</pre>
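The help text states that -d creates an output folder named 'directory_homologues'. A wrapper script may want to know that name in advance, for example to check whether earlier results exist. The following is a minimal sketch under the assumption that the folder name is the last component of the input directory plus the `_homologues` suffix; the helper name `gh_outdir_name` and the `genomes` directory are hypothetical, and the help text does not say where the folder is created, so only the name is computed.

```shell
#!/bin/bash
# Hypothetical helper: derive the output folder name that -d is
# documented to create ('directory_homologues').
gh_outdir_name() {
    local input_dir="$1"
    # basename drops leading path components and any trailing slash
    printf '%s_homologues\n' "$(basename "$input_dir")"
}

# Example with a hypothetical input directory
gh_outdir_name "genomes/"
```

Since the program is described as re-using previous results when run with the same input directory, checking for this folder before a run is one way to tell whether a job will start from scratch.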
Category
Bioinformatics
Program On
Teaching
Version
1.7.6
Author / Distributor
GET_HOMOLOGUES
Description
"a versatile software package for pan-genome analysis"
More details are at GET_HOMOLOGUES
Running Program
The latest version of this application is installed at /usr/local/apps/gb/GETHOMOLOGUES/1.7.6
To use this version, please load the module with
ml GETHOMOLOGUES/1.7.6
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_GET_HOMOLOGUES
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=GET_HOMOLOGUES.%j.out
#SBATCH --error=GET_HOMOLOGUES.%j.err
cd $SLURM_SUBMIT_DIR
ml GETHOMOLOGUES/1.7.6
perl /usr/local/apps/gb/GETHOMOLOGUES/1.7.6/get_homologues.pl [options]
In a real submission script, review the values above, including the [options] placeholder, and replace them with values appropriate for your job.
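As an illustration of replacing the [options] placeholder, here is a hedged sketch of a filled-in script. It assumes a hypothetical input directory named `genomes` and requests four CPUs with `--cpus-per-task`, passing that count to -n so that the thread count matches the allocation (falling back to the documented default of 2). The `RUN` switch and the `run_or_show` helper are illustrative additions, not part of GET_HOMOLOGUES: by default the script only prints the command it would run.

```shell
#!/bin/bash
#SBATCH --job-name=j_GET_HOMOLOGUES
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=GET_HOMOLOGUES.%j.out
#SBATCH --error=GET_HOMOLOGUES.%j.err

GH=/usr/local/apps/gb/GETHOMOLOGUES/1.7.6/get_homologues.pl
INPUT_DIR="${INPUT_DIR:-genomes}"        # hypothetical input directory
THREADS="${SLURM_CPUS_PER_TASK:-2}"      # set by Slurm when --cpus-per-task is given

# Illustrative helper: execute the command when RUN=1, otherwise print it.
run_or_show() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

if [ "${RUN:-0}" = "1" ]; then
    cd "$SLURM_SUBMIT_DIR" || exit 1
    ml GETHOMOLOGUES/1.7.6
fi

# -M selects the OMCL algorithm, -t 0 reports all clusters,
# -n sets the number of threads in 'local' runmode.
run_or_show perl "$GH" -d "$INPUT_DIR" -M -t 0 -n "$THREADS"
```

Running the script directly prints the assembled command for inspection; setting `RUN=1` (for example with `sbatch --export=ALL,RUN=1 ./sub.sh`) executes it inside the job.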
For more details on running jobs on the teaching cluster, please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs, and Run interactive Jobs.
Here is an example of a job submission command:
sbatch ./sub.sh
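After submission, sbatch normally prints a confirmation line of the form "Submitted batch job 12345". The sketch below shows one way to pull the job ID out of that line so it can be used to check the queue or find the log files; the function name `job_id_from_sbatch` is hypothetical, and the parsing assumes the ID is the last field of the line, which may not hold on clusters that append extra text.

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from sbatch's confirmation line.
job_id_from_sbatch() {
    # The confirmation looks like "Submitted batch job 12345";
    # keep the last whitespace-separated field.
    printf '%s\n' "$1" | awk '{print $NF}'
}

# Example with a sample confirmation line
job_id_from_sbatch "Submitted batch job 12345"
```

In practice this could be used as `jobid=$(job_id_from_sbatch "$(sbatch ./sub.sh)")`, followed by `squeue -j "$jobid"` to check the job's state. With the `%j` pattern in the example script, the output and error files would then be named GET_HOMOLOGUES.<jobid>.out and GET_HOMOLOGUES.<jobid>.err.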
Documentation
ml GETHOMOLOGUES/1.7.6
perl /usr/local/apps/gb/GETHOMOLOGUES/1.7.6/get_homologues.pl -h
[https://github.com/eead-csic-compbio/get_homologues GET_HOMOLOGUES]
Back to Top
Installation
The source code was obtained from GET_HOMOLOGUES
System
64-bit Linux