RAxML-NG-Sapelo2

From Research Computing Center Wiki
Latest revision as of 10:34, 3 July 2024

Category

Bioinformatics

Program On

Sapelo2

Version

1.2.2

Author / Distributor

Alexey M. Kozlov and Alexandros Stamatakis

Description

RAxML-NG is a phylogenetic tree inference tool that uses the maximum-likelihood (ML) optimality criterion. Its search heuristic iteratively performs a series of Subtree Pruning and Regrafting (SPR) moves, which allows it to navigate quickly to the best-known ML tree.

Running Program

Also refer to Running Jobs on Sapelo2

For more information on Environment Modules on Sapelo2 please see the Lmod page.


RAxML-NG

To use RAxML-NG, please first load the module with

module load RAxML-NG/1.2.2-GCC-12.2.0

Keep in mind that because this version doesn't use MPI, a job cannot run on multiple nodes, though it can use multiple threads on a single node.


Example of a shell script to run the multithreaded (non-MPI) version in the batch queue, using 8 threads on a single node:

#!/bin/bash
#SBATCH --job-name=j_RAxML-NG
#SBATCH --partition=batch
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@uga.edu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8 #this needs to be the same as the '--threads' option below!
#SBATCH --mem=64GB
#SBATCH --time=08:00:00
#SBATCH --output=RAxML-NG.%j.out
#SBATCH --error=RAxML-NG.%j.err


cd $SLURM_SUBMIT_DIR
ml RAxML-NG/1.2.2-GCC-12.2.0
raxml-ng --all --msa testAA.fa --model LG+G8+F --tree pars{10} --bs-trees 200 --threads 8


In a real submission script, at least the underlined values above need to be reviewed or replaced with the proper values.

Please refer to Running Jobs on Sapelo2.
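
The comment on the --cpus-per-task line in the example script asks that the two thread counts be kept equal by hand. One way to avoid a mismatch, sketched here, is to read the count from SLURM_CPUS_PER_TASK, which Slurm sets inside a job when --cpus-per-task is requested. The fallback to 1 thread and the "command -v" guards are assumptions added for illustration and are not part of the original script; these lines would stand in for the last three lines of the script body.

```shell
#!/bin/bash
# Sketch only: job-script body that keeps --threads in sync with the Slurm request.

# Work in the submission directory (falls back to the current directory outside Slurm).
cd "${SLURM_SUBMIT_DIR:-.}" || exit 1

# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given;
# the default of 1 is an assumption for runs outside a job.
threads="${SLURM_CPUS_PER_TASK:-1}"

# Load the module only where the module command exists (assumed guard).
if command -v module >/dev/null 2>&1; then
    module load RAxML-NG/1.2.2-GCC-12.2.0
fi

# Run the same all-in-one analysis as in the example script above.
if command -v raxml-ng >/dev/null 2>&1; then
    raxml-ng --all --msa testAA.fa --model LG+G8+F --tree 'pars{10}' \
        --bs-trees 200 --threads "$threads"
else
    echo "raxml-ng not found; would have run with --threads $threads"
fi
```

With this form, changing --cpus-per-task in the header is enough; the --threads value follows it automatically.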

Documentation


ml RAxML-NG/1.2.2-GCC-12.2.0
raxml-ng  --help

RAxML-NG v. 1.2.2-master released on 30.04.2024 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth, Julia Haag, Anastasis Togkousidis.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

Usage: raxml-ng [OPTIONS]

Commands (mutually exclusive):
  --help                                     display help information
  --version                                  display version information
  --evaluate                                 evaluate the likelihood of a tree (with model+brlen optimization)
  --search                                   ML tree search (default: 10 parsimony + 10 random starting trees)
  --bootstrap                                bootstrapping (default: use bootstopping to auto-detect #replicates)
  --all                                      all-in-one (ML search + bootstrapping)
  --support                                  compute bipartition support for a given reference tree (e.g., best ML tree)
                                             and a set of replicate trees (e.g., from a bootstrap analysis)
  --bsconverge                               test for bootstrapping convergence using autoMRE criterion
  --bsmsa                                    generate bootstrap replicate MSAs
  --terrace                                  check whether a tree lies on a phylogenetic terrace 
  --check                                    check alignment correctness and remove empty columns/rows
  --parse                                    parse alignment, compress patterns and create binary MSA file
  --start                                    generate parsimony/random starting trees and exit
  --rfdist                                   compute pair-wise Robinson-Foulds (RF) distances between trees
  --consense [ STRICT | MR | MR<n> | MRE ]   build strict, majority-rule (MR) or extended MR (MRE) consensus tree (default: MR)
                                             eg: --consense MR75 --tree bsrep.nw
  --ancestral                                ancestral state reconstruction at all inner nodes
  --sitelh                                   print per-site log-likelihood values

Command shortcuts (mutually exclusive):
  --search1                                  Alias for: --search --tree rand{1}
  --loglh                                    Alias for: --evaluate --opt-model off --opt-branches off --nofiles --log result
  --rf                                       Alias for: --rfdist --nofiles --log result

Input and output options:
  --tree            rand{N} | pars{N} | FILE starting tree: rand(om), pars(imony) or user-specified (newick file)
                                             N = number of trees (default: rand{10},pars{10})
  --msa             FILE                     alignment file
  --msa-format      VALUE                    alignment file format: FASTA, PHYLIP, CATG or AUTO-detect (default)
  --data-type       VALUE                    data type: DNA, AA, BIN(ary) or AUTO-detect (default)
  --tree-constraint FILE                     constraint tree
  --prefix          STRING                   prefix for output files (default: MSA file name)
  --log             VALUE                    log verbosity: ERROR,WARNING,RESULT,INFO,PROGRESS,DEBUG (default: PROGRESS)
  --redo                                     overwrite existing result files and ignore checkpoints (default: OFF)
  --nofiles                                  do not create any output files, print results to the terminal only
  --precision       VALUE                    number of decimal places to print (default: 6)
  --outgroup        o1,o2,..,oN              comma-separated list of outgroup taxon names (it's just a drawing option!)
  --site-weights    FILE                     file with MSA column weights (positive integers only!)  

General options:
  --seed         VALUE                       seed for pseudo-random number generator (default: current time)
  --pat-comp     on | off                    alignment pattern compression (default: ON)
  --tip-inner    on | off                    tip-inner case optimization (default: OFF)
  --site-repeats on | off                    use site repeats optimization, 10%-60% faster than tip-inner (default: ON)
  --threads      VALUE                       number of parallel threads to use (default: 1)
  --workers      VALUE                       number of tree searches to run in parallel (default: 1)
  --simd         none | sse3 | avx | avx2    vector instruction set to use (default: auto-detect).
  --rate-scalers on | off                    use individual CLV scalers for each rate category (default: ON for >2000 taxa)
  --force        [ <CHECKS> ]                disable safety checks (please think twice!)

Model options:
  --model        <name>+G[n]+<Freqs> | FILE  model specification OR partition file
  --brlen        linked | scaled | unlinked  branch length linkage between partitions (default: scaled)
  --blmin        VALUE                       minimum branch length (default: 1e-6)
  --blmax        VALUE                       maximum branch length (default: 100)
  --blopt        nr_fast    | nr_safe        branch length optimization method (default: nr_fast)
                 nr_oldfast | nr_oldsafe     
  --opt-model    on | off                    ML optimization of all model parameters (default: ON)
  --opt-branches on | off                    ML optimization of all branch lengths (default: ON)
  --prob-msa     on | off                    use probabilistic alignment (works with CATG and VCF)
  --lh-epsilon   VALUE                       log-likelihood epsilon for optimization/tree search (default: 0.1)

Topology search options:
  --spr-radius           VALUE               SPR re-insertion radius for fast iterations (default: AUTO)
  --spr-cutoff           VALUE | off         relative LH cutoff for descending into subtrees (default: 1.0)
  --lh-epsilon-triplet   VALUE               log-likelihood epsilon for branch length triplet optimization (default: 1000)

Bootstrapping options:
  --bs-trees     VALUE                       number of bootstraps replicates
  --bs-trees     autoMRE{N}                  use MRE-based bootstrap convergence criterion, up to N replicates (default: 1000)
  --bs-trees     FILE                        Newick file containing set of bootstrap replicate trees (with --support)
  --bs-cutoff    VALUE                       cutoff threshold for the MRE-based bootstopping criteria (default: 0.03)
  --bs-metric    fbp | tbe                   branch support metric: fbp = Felsenstein bootstrap (default), tbe = transfer distance
  --bs-write-msa on | off                    write all bootstrap alignments (default: OFF)

EXAMPLES:
  1. Perform tree inference on DNA alignment 
     (10 random + 10 parsimony starting trees, general time-reversible model, ML estimate of substitution rates and
      nucleotide frequencies, discrete GAMMA model of rate heterogeneity with 4 categories):

     ./raxml-ng --msa testDNA.fa --model GTR+G


  2. Perform an all-in-one analysis (ML tree search + non-parametric bootstrap) 
     (10 randomized parsimony starting trees, fixed empirical substitution matrix (LG),
      empirical aminoacid frequencies from alignment, 8 discrete GAMMA categories,
      200 bootstrap replicates):

     ./raxml-ng --all --msa testAA.fa --model LG+G8+F --tree pars{10} --bs-trees 200


  3. Optimize branch lengths and free model parameters on a fixed topology
     (using multiple partitions with proportional branch lengths)

     ./raxml-ng --evaluate --msa testAA.fa --model partitions.txt --tree test.tree --brlen scaled


Installation

Source code for RAxML 8.2.12 was downloaded from GitHub and compiled with the intel-2019b compilers and Intel MPI.

System

64-bit Linux