OrthoFinder-Sapelo2

From Research Computing Center Wiki

Category

Bioinformatics

Program On

Sapelo2

Version

2.5.4, 2.5.5

Author / Distributor

OrthoFinder

Description

"OrthoFinder is a fast, accurate and comprehensive analysis tool for comparative genomics. It finds orthologues and orthogroups, infers rooted gene trees for all orthogroups and infers a rooted species tree for the species being analysed. OrthoFinder also provides comprehensive statistics for comparative genomic analyses. OrthoFinder is simple to use and all you need to run it is a set of protein sequence files (one per species) in FASTA format." More details are at OrthoFinder

Running Program

Please refer to Running Jobs on Sapelo2


Version 2.5.4

Version 2.5.4 is installed at

  • /apps/eb/OrthoFinder/2.5.4-foss-2022a/

To use version 2.5.4, please first load the module with

ml OrthoFinder/2.5.4-foss-2022a


Version 2.5.5

Version 2.5.5 is installed at

  • /apps/eb/OrthoFinder/2.5.5-foss-2022a/

To use version 2.5.5, please first load the module with

ml OrthoFinder/2.5.5-foss-2022a


Note that the option -M msa, shown in the documentation below, enables both multiple sequence alignment and tree inference from those alignments. The default programs for these steps (MAFFT and FastTree, respectively) are loaded with OrthoFinder/2.5.5-foss-2022a. To use one of the alternative programs described in the documentation, you will need to load its module yourself. If you are unsure which installations of that software are compatible with our installation of OrthoFinder, please refer to Available Toolchains and Toolchain Compatibility.
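
As a rough illustration of how the -M, -A and -T options from the help text fit together, the sketch below prints the command line for an MSA-based run. The helper name msa_cmd and the input directory proteomes are hypothetical; the program names are taken from the option lists in the documentation below.

```shell
#!/bin/bash
# Sketch only: msa_cmd and the "proteomes" directory are illustrative names.
# Arguments: input directory, MSA program (default mafft),
#            tree inference method (default fasttree).
msa_cmd() {
    local indir="$1" msa="${2:-mafft}" tree="${3:-fasttree}"
    echo "orthofinder -f ${indir} -M msa -A ${msa} -T ${tree}"
}

# Defaults: MAFFT and FastTree, which are loaded with the OrthoFinder module.
msa_cmd proteomes

# Alternatives: the modules for these programs must be loaded separately.
msa_cmd proteomes muscle iqtree
```

The function only prints the command so that it can be checked before being pasted into a job script.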

Here is an example of a shell script, sub.sh, to run OrthoFinder v2.5.5 on the batch partition:

#!/bin/bash
#SBATCH --job-name=j_OrthoFinder
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32gb
#SBATCH --time=120:00:00
#SBATCH --output=log.%j.out
#SBATCH --error=log.%j.err
#SBATCH --mail-user=username@uga.edu
#SBATCH --mail-type=ALL

cd "$SLURM_SUBMIT_DIR"

ml OrthoFinder/2.5.5-foss-2022a

# -t (sequence search threads) and -a (analysis threads) match --cpus-per-task above
orthofinder -t 8 -a 8 [options]
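
The thread counts passed to -t and -a in the script are typed by hand to match --cpus-per-task. One possible way to keep them in sync is to read SLURM_CPUS_PER_TASK, which Slurm sets inside a job when --cpus-per-task is requested. This is a sketch: the helper names and the input directory proteomes are hypothetical, and using the same value for -t and -a simply mirrors the example script.

```shell
#!/bin/bash
# Sketch only: helper names and the "proteomes" directory are illustrative.

# Print the number of CPUs allocated to the task, or 1 outside a Slurm job.
orthofinder_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Print the orthofinder command line that uses the allocated CPU count.
orthofinder_cmd() {
    local n
    n="$(orthofinder_threads)"
    echo "orthofinder -t ${n} -a ${n} -f ${1:-proteomes}"
}

orthofinder_cmd proteomes
```

In a job script the same idea can be written inline, for example as -t "$SLURM_CPUS_PER_TASK" -a "$SLURM_CPUS_PER_TASK".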


Here is an example of job submission:

sbatch ./sub.sh 
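
Because OrthoFinder expects one protein FASTA file per species, it may be worth checking the input directory before submitting. The sketch below counts candidate files. It assumes that inputs use one of the extensions .fa, .faa, .fasta, .fas or .pep, which is an assumption not stated on this page; the function name count_proteomes and the directory proteomes are hypothetical.

```shell
#!/bin/bash
# Sketch only: the extension list is an assumption; adjust it to your files.

# Print the number of candidate proteome files directly inside a directory.
count_proteomes() {
    local dir="$1"
    find "$dir" -maxdepth 1 -type f \
        \( -name '*.fa' -o -name '*.faa' -o -name '*.fasta' \
           -o -name '*.fas' -o -name '*.pep' \) | wc -l | tr -d ' '
}

# Report the count only if the (hypothetical) input directory exists.
if [ -d proteomes ]; then
    echo "Species files found: $(count_proteomes proteomes)"
fi
```

The count should equal the number of species in the analysis.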

Documentation

ml OrthoFinder/2.5.5-foss-2022a 
orthofinder -h

OrthoFinder version 2.5.5 Copyright (C) 2014 David Emms

SIMPLE USAGE:
Run full OrthoFinder analysis on FASTA format proteomes in <dir>
  orthofinder [options] -f <dir>

Add new species in <dir1> to previous run in <dir2> and run new analysis
  orthofinder [options] -f <dir1> -b <dir2>

OPTIONS:
 -t <int>        Number of parallel sequence search threads [Default = 16]
 -a <int>        Number of parallel analysis threads
 -d              Input is DNA sequences
 -M <txt>        Method for gene tree inference. Options 'dendroblast' & 'msa'
                 [Default = dendroblast]
 -S <txt>        Sequence search program [Default = diamond]
                 Options: blast, diamond, diamond_ultra_sens, blast_gz, mmseqs, blast_nucl
 -A <txt>        MSA program, requires '-M msa' [Default = mafft]
                 Options: mafft, muscle
 -T <txt>        Tree inference method, requires '-M msa' [Default = fasttree]
                 Options: fasttree, raxml, raxml-ng, iqtree
 -s <file>       User-specified rooted species tree
 -I <int>        MCL inflation parameter [Default = 1.5]
 --fewer-files   Only create one orthologs file per species
 -x <file>       Info for outputting results in OrthoXML format
 -p <dir>        Write the temporary pickle files to <dir>
 -1              Only perform one-way sequence search
 -X              Don't add species names to sequence IDs
 -y              Split paralogous clades below root of a HOG into separate HOGs
 -z              Don't trim MSAs (columns>=90% gap, min. alignment length 500)
 -n <txt>        Name to append to the results directory
 -o <txt>        Non-default results directory
 -h              Print this help text

WORKFLOW STOPPING OPTIONS:
 -op             Stop after preparing input files for BLAST
 -og             Stop after inferring orthogroups
 -os             Stop after writing sequence files for orthogroups
                 (requires '-M msa')
 -oa             Stop after inferring alignments for orthogroups
                 (requires '-M msa')
 -ot             Stop after inferring gene trees for orthogroups 

WORKFLOW RESTART COMMANDS:
 -b  <dir>         Start OrthoFinder from pre-computed BLAST results in <dir>
 -fg <dir>         Start OrthoFinder from pre-computed orthogroups in <dir>
 -ft <dir>         Start OrthoFinder from pre-computed gene trees in <dir>

LICENSE:
 Distributed under the GNU General Public License (GPLv3). See License.md

CITATION:
 When publishing work that uses OrthoFinder please cite:
 Emms D.M. & Kelly S. (2019), Genome Biology 20:238

 If you use the species tree in your work then please also cite:
 Emms D.M. & Kelly S. (2017), MBE 34(12): 3267-3278
 Emms D.M. & Kelly S. (2018), bioRxiv https://doi.org/10.1101/267914
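
The workflow stopping and restart options in the help text above can be combined to split a long analysis across two jobs. The sketch below prints the two command lines for stopping after orthogroup inference (-og) and later resuming from those orthogroups (-fg). The helper name stage_cmd and both directory names are hypothetical; the directory given to -fg stands for the location of the earlier run's results.

```shell
#!/bin/bash
# Sketch only: stage_cmd and the directory names are illustrative.

# Print the orthofinder command for one stage of a two-job run.
#   stage_cmd orthogroups <input dir>      stop after inferring orthogroups
#   stage_cmd resume <previous results>    restart from those orthogroups
stage_cmd() {
    case "$1" in
        orthogroups) echo "orthofinder -f $2 -og" ;;
        resume)      echo "orthofinder -fg $2" ;;
        *)           echo "usage: stage_cmd orthogroups|resume <dir>" >&2
                     return 1 ;;
    esac
}

stage_cmd orthogroups proteomes
stage_cmd resume previous_results_dir
```

Each printed command would go into its own job script, with the second job submitted after the first has finished.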


Installation

Source code obtained from OrthoFinder

System

64-bit Linux