GET HOMOLOGUES-Teaching
Latest revision as of 11:04, 30 August 2019
Category
Bioinformatics
Program On
Teaching
Version
1.7.6
Author / Distributor
GET_HOMOLOGUES (https://github.com/eead-csic-compbio/get_homologues)
Description
"a versatile software package for pan-genome analysis." More details are at GET_HOMOLOGUES (https://github.com/eead-csic-compbio/get_homologues).
Running Program
- Version 1.7.6, installed at /usr/local/apps/gb/GETHOMOLOGUES/1.7.6
To use this version, load the module with:
ml GETHOMOLOGUES/1.7.6
Here is an example of a shell script, sub.sh, to run on the batch queue:
#!/bin/bash
#SBATCH --job-name=j_GETHOMOLOGUES
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=GETHOMOLOGUES.%j.out
#SBATCH --error=GETHOMOLOGUES.%j.err
cd $SLURM_SUBMIT_DIR
ml GETHOMOLOGUES/1.7.6
perl /usr/local/apps/gb/GETHOMOLOGUES/1.7.6/get_homologues.pl [options]
In a real submission script, review at least the job-specific values above (for example --mail-user, --mem and --time) and replace them with values appropriate for your job. The [options] placeholder must also be replaced with actual get_homologues.pl options.
Please refer to Running_Jobs_on_the_teaching_cluster, Run X window Jobs and Run interactive Jobs for more details on running jobs on the teaching cluster.
Here is an example of a job submission command:
sbatch ./sub.sh
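The script above requests a single task and leaves [options] open. As a minimal sketch of how the -n thread option could be tied to the SLURM allocation, the variant below adds --cpus-per-task and passes that value to get_homologues.pl. The input directory name genomes, the INPUT_DIR variable and the choice of 4 CPUs are assumptions made for illustration, not site recommendations.

```shell
#!/bin/bash
#SBATCH --job-name=j_GETHOMOLOGUES
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=10gb
#SBATCH --time=08:00:00
#SBATCH --output=GETHOMOLOGUES.%j.out
#SBATCH --error=GETHOMOLOGUES.%j.err

# Fall back to the current directory when the script is run outside SLURM.
cd "${SLURM_SUBMIT_DIR:-.}" || exit 1

GH=/usr/local/apps/gb/GETHOMOLOGUES/1.7.6/get_homologues.pl

# Hypothetical input directory holding one .faa or .gbk file per genome.
INPUT_DIR=${INPUT_DIR:-genomes}

# Use the CPUs granted by SLURM; 2 is the default for -n listed in the help text.
THREADS=${SLURM_CPUS_PER_TASK:-2}

# -d and -n are taken from the get_homologues.pl help output.
GH_ARGS=(-d "$INPUT_DIR" -n "$THREADS")
echo "get_homologues.pl arguments: ${GH_ARGS[*]}"

# Run the program only where the module command exists (i.e. on the cluster).
if type ml >/dev/null 2>&1; then
    ml GETHOMOLOGUES/1.7.6
    perl "$GH" "${GH_ARGS[@]}"
fi
```

Keeping the arguments in a bash array avoids word-splitting problems if the input path contains spaces, and makes it easy to append further options from the help text.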
Documentation
ml GETHOMOLOGUES/1.7.6
perl /usr/local/apps/gb/GETHOMOLOGUES/1.7.6/get_homologues.pl -h

usage: /usr/local/apps/gb/GETHOMOLOGUES/1.7.6/get_homologues.pl [options]

-h this message
-v print version, credits and checks installation
-d directory with input FASTA files ( .faa / .fna ),
   GenBank files ( .gbk ), 1 per genome, or a subdirectory
   ( subdir.clusters / subdir_ ) with pre-clustered sequences
   ( .faa / .fna ); allows for new files to be added later;
   creates output folder named 'directory_homologues'
   (overrides -i, use of pre-clustered sequences ignores -c, -g)
-i input amino acid FASTA file with [taxon names] in headers,
   creates output folder named 'file_homologues'
   (required unless -d is set)

Optional parameters:
-o only run BLAST/Pfam searches and exit (useful to pre-compute searches)
-c report genome composition analysis (follows order in -I file if enforced, ignores -r,-t,-e)
-R set random seed for genome composition analysis
   (optional, requires -c, example -R 1234, required for mixing -c with -c -a runs)
-s save memory by using BerkeleyDB; default parsing stores sequence hits in RAM
-m runmode [local|cluster] (default local)
-n nb of threads for BLAST/HMMER/MCL in 'local' runmode (default=2)
-I file with .faa/.gbk files in -d to be included (takes all by default, requires -d)

Algorithms instead of default bidirectional best-hits (BDBH):
-G use COGtriangle algorithm (COGS, PubMed=20439257) (requires 3+ genomes|taxa)
-M use orthoMCL algorithm (OMCL, PubMed=12952885)

Options that control sequence similarity searches:
-X use diamond instead of blastp (optional, set threads with -n)
-C min %coverage in BLAST pairwise alignments (range [1-100],default=75)
-E max E-value (default=1e-05,max=0.01)
-D require equal Pfam domain composition when defining similarity-based orthology
   (best with -m cluster or -n threads)
-S min %sequence identity in BLAST query/subj pairs (range [1-100],default=1 [BDBH|OMCL])
-N min BLAST neighborhood correlation PubMed=18475320 (range [0,1],default=0 [BDBH|OMCL])
-b compile core-genome with minimum BLAST searches (ignores -c [BDBH])

Options that control clustering:
-t report sequence clusters including at least t taxa
   (default t=numberOfTaxa, t=0 reports all clusters [OMCL|COGS])
-a report clusters of sequence features in GenBank files
   instead of default 'CDS' GenBank features
   (requires -d and .gbk files, example -a 'tRNA,rRNA',
   NOTE: uses blastn instead of blastp, ignores -g,-D)
-g report clusters of intergenic sequences flanked by ORFs
   in addition to default 'CDS' clusters (requires -d and .gbk files)
-f filter by %length difference within clusters
   (range [1-100], by default sequence length is not checked)
-r reference proteome .faa/.gbk file
   (by default takes file with least sequences; with BDBH sets first taxa to start adding genes)
-e exclude clusters with inparalogues (by default inparalogues are included)
-x allow sequences in multiple COG clusters
   (by default sequences are allocated to single clusters [COGS])
-F orthoMCL inflation value (range [1-5], default=1.5 [OMCL])
-A calculate average identity of clustered sequences,
   by default uses blastp results but can use blastn with -a
   (optional, creates tab-separated matrix, recommended with -t 0 [OMCL|COGS])
-P calculate percentage of conserved proteins (POCP),
   by default uses blastp results but can use blastn with -a
   (optional, creates tab-separated matrix, recommended with -t 0 [OMCL|COGS])
-z add soft-core to genome composition analysis (optional, requires -c [OMCL|COGS])

This program uses BLAST (and optionally HMMER/Pfam) to define clusters of 'orthologous'
genomic sequences and pan/core-genome gene sets. Several algorithms are available
and search parameters are customizable. It is designed to process (in a SGE computer
cluster) files contained in a directory (-d), so that new .faa/.gbk files can be added
while conserving previous BLAST results. In general the program will try to re-use
previous results when run with the same input directory.
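The help text notes that -o pre-computes the searches and that results are re-used when the same input directory is given again. A possible two-stage workflow built on those two statements is sketched below: a search-only run followed by an OMCL clustering run on the same directory. The directory name genomes and the helper run_or_print are hypothetical, and whether the second stage actually skips the searches depends on the program's own re-use logic.

```shell
#!/bin/bash
GH=/usr/local/apps/gb/GETHOMOLOGUES/1.7.6/get_homologues.pl

# Hypothetical input directory; replace with your own.
INPUT_DIR=${INPUT_DIR:-genomes}
THREADS=${SLURM_CPUS_PER_TASK:-2}

# Stage 1: -o runs only the BLAST/Pfam searches and exits.
PRECOMPUTE=(perl "$GH" -d "$INPUT_DIR" -n "$THREADS" -o)

# Stage 2: same -d directory, so earlier results can be re-used;
# -M selects the orthoMCL algorithm and -t 0 reports all clusters.
CLUSTER=(perl "$GH" -d "$INPUT_DIR" -n "$THREADS" -M -t 0)

# Hypothetical helper: always print the command, and execute it only
# when the installed script is readable on this host.
run_or_print() {
    echo "+ $*"
    if [ -r "$GH" ]; then
        "$@"
    fi
}

# Load the module only where the module command exists (i.e. on the cluster).
if type ml >/dev/null 2>&1; then
    ml GETHOMOLOGUES/1.7.6
fi

# Start clustering only if the search stage succeeded.
run_or_print "${PRECOMPUTE[@]}" && run_or_print "${CLUSTER[@]}"
```

On a machine without the installation the script only prints the two commands, which is a convenient way to check the options before submitting the job.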
Installation
Source code is obtained from the GET_HOMOLOGUES releases page (https://github.com/eead-csic-compbio/get_homologues/releases).
System
64-bit Linux