BUSCO-Sapelo2

From Research Computing Center Wiki
Revision as of 13:52, 22 October 2020 by Moses

Category

Bioinformatics

Program On

Sap2test

Version

4.0.5, 4.0.6

Author / Distributor

BUSCO

Description

"BUSCO - Benchmarking sets of Universal Single-Copy Orthologs." More details are available on the BUSCO website.

Running Program

Version 3.0.2

  • Version 3.0.2 is installed at /usr/local/apps/gb/busco/3.0.2

Blast+ v2.2.31 is loaded with this application. This version of Blast+ allows BUSCO to run its BLAST step on multiple cores.

To use this version, please load the module with

ml busco/3.0.2 

Before running the program, copy the AUGUSTUS config directory and the BUSCO config file into your working directory, then edit config.ini to set the input file and any other required values:

cp -r /usr/local/apps/eb/AUGUSTUS/3.2.3-foss-2016b-Python-2.7.14/config config_augustus
export AUGUSTUS_CONFIG_PATH=config_augustus
cp /usr/local/apps/gb/busco/3.0.2/config/config.ini config.ini
vi config.ini
export BUSCO_CONFIG_FILE=config.ini
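
The two variables above hold relative paths, so they only resolve when the program starts in the directory that contains the copied files. The following is a minimal sketch, assuming config_augustus and config.ini were copied as shown above, that converts them to absolute paths and checks that both exist. The function name busco_env is hypothetical and not part of BUSCO or AUGUSTUS.

```shell
# Hypothetical helper: export absolute paths for the copied config files.
# $1 is the directory holding config_augustus/ and config.ini
# (defaults to the current directory).
busco_env() {
    workdir=$(cd "${1:-.}" && pwd) || return 1
    if [ ! -d "$workdir/config_augustus" ]; then
        echo "missing directory: $workdir/config_augustus" >&2
        return 1
    fi
    if [ ! -f "$workdir/config.ini" ]; then
        echo "missing file: $workdir/config.ini" >&2
        return 1
    fi
    export AUGUSTUS_CONFIG_PATH="$workdir/config_augustus"
    export BUSCO_CONFIG_FILE="$workdir/config.ini"
}
```

Inside a job script this could be called as busco_env "$PBS_O_WORKDIR" before the run_BUSCO.py line.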

Here is an example of a shell script, sub.sh, to run on the batch queue:

#PBS -S /bin/bash
#PBS -N j_BUSCO
#PBS -q batch
#PBS -l nodes=1:ppn=1:AMD
#PBS -l walltime=8:00:00
#PBS -l mem=10gb
#PBS -M username@uga.edu
#PBS -m abe

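cd $PBS_O_WORKDIR
ml busco/3.0.2
# Set these inside the job; they may not be inherited from the login shell
export AUGUSTUS_CONFIG_PATH=config_augustus
export BUSCO_CONFIG_FILE=config.ini
python /usr/local/apps/gb/busco/3.0.2/scripts/run_BUSCO.py [options]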

In a real submission script, review the values shown above (job name, queue, node and core counts, walltime, memory, email address, and [options]) and replace them with values appropriate for your job.
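
As an illustration of replacing [options], the sketch below assembles a genome-mode command line from the flags listed in the Documentation section (-i, -l, -o, -m, -c). The input file, lineage directory, and run name are hypothetical placeholders. Reading the core count from PBS_NUM_PPN assumes the scheduler sets that variable inside the job. The command is printed rather than executed so it can be reviewed first.

```shell
# Hypothetical inputs: replace with your own file, lineage directory, and run name.
INPUT=genome.fasta
LINEAGE=./lineage_dataset
RUN_NAME=busco_run1

# Use the cores granted to the job (ppn); fall back to 1 outside a job.
THREADS=${PBS_NUM_PPN:-1}

BUSCO_CMD="python /usr/local/apps/gb/busco/3.0.2/scripts/run_BUSCO.py -i $INPUT -l $LINEAGE -o $RUN_NAME -m geno -c $THREADS"

# Print the assembled command for review before placing it in sub.sh.
echo "$BUSCO_CMD"
```

If -c is set above 1, the ppn value in the #PBS -l nodes line should request the same number of cores.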


Using AUGUSTUS from Maker

If AUGUSTUS does not run properly, you can use the AUGUSTUS program that is installed with Maker. To use this version of AUGUSTUS, first run

cp -r /usr/local/apps/gb/Maker/3.01.02-beta/exe/augustus/config config_augustus
export AUGUSTUS_CONFIG_PATH=config_augustus
cp /usr/local/apps/gb/busco/3.0.2/config/config.ini.maker config.ini
vi config.ini
export BUSCO_CONFIG_FILE=config.ini

Here is a sample job submission script to use the version of AUGUSTUS that is installed in Maker:

#PBS -S /bin/bash
#PBS -N j_BUSCO
#PBS -q batch
#PBS -l nodes=1:ppn=1:AMD
#PBS -l walltime=8:00:00
#PBS -l mem=10gb
#PBS -M username@uga.edu
#PBS -m abe

cd $PBS_O_WORKDIR

ml busco/3.0.2
export AUGUSTUS_CONFIG_PATH=config_augustus
export BUSCO_CONFIG_FILE=config.ini
export PATH=/usr/local/apps/gb/Maker/3.01.02-beta/exe/augustus/bin:$PATH

python /usr/local/apps/gb/busco/3.0.2/scripts/run_BUSCO.py [options]


Please refer to Running Jobs on Sapelo2, Run X window Jobs, and Run interactive Jobs for more details about running jobs on Sapelo2.

Version 4.0.5

  • Version 4.0.5 is installed as a Singularity image at /usr/local/singularity-images/busco-4.0.5.simg

To run BUSCO v4.0.5 included in this singularity image:

singularity exec /usr/local/singularity-images/busco-4.0.5.simg run_busco [options]

To get busco help info:

singularity exec /usr/local/singularity-images/busco-4.0.5.simg run_busco -h

To check busco version info:

singularity exec /usr/local/singularity-images/busco-4.0.5.simg run_busco -v

To check other programs included in this singularity image:

singularity exec /usr/local/singularity-images/busco-4.0.5.simg ls /usr/local/bin
singularity exec /usr/local/singularity-images/busco-4.0.5.simg ls /augustus
singularity exec /usr/local/singularity-images/busco-4.0.5.simg ls /ncbi-blast-2.2.31+/bin
singularity exec /usr/local/singularity-images/busco-4.0.5.simg ls /hmmer-3.2.1


Version 4.0.6

  • Version 4.0.6 is installed in /usr/local/apps/eb/BUSCO/4.0.6-foss-2019b-Python-3.7.4

To use this version of busco, please first load the module with

module load BUSCO/4.0.6-foss-2019b-Python-3.7.4

This module will load other modules that this version of busco depends on.
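
This page does not show a run command for the module-based 4.0.6 install. The sketch below writes a job script for it, assuming the module puts the BUSCO v4 entry point busco on PATH and that it accepts the -i, -l, -o, -m, and -c flags; confirm with busco -h after loading the module. The script name sub_busco4.sh, the job name, the 4-core request, and the input, lineage, and run names are all hypothetical.

```shell
# Write a job script for the module-based BUSCO 4.0.6 install.
# Assumptions: the entry point is `busco`; genome.fasta, lineage_dataset,
# and busco_run1 are placeholder names to be replaced.
cat > sub_busco4.sh <<'EOF'
#PBS -S /bin/bash
#PBS -q batch
#PBS -N j_BUSCO4
#PBS -l nodes=1:ppn=4
#PBS -l walltime=24:00:00
#PBS -l mem=10gb

cd $PBS_O_WORKDIR

module load BUSCO/4.0.6-foss-2019b-Python-3.7.4
busco -i genome.fasta -l lineage_dataset -o busco_run1 -m genome -c 4
EOF
```

The quoted EOF keeps $PBS_O_WORKDIR unexpanded in the generated file, so it is resolved by the scheduler when the job runs.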


Sample job submission script (sub.sh) to run BUSCO version 4.0.5 from the Singularity image:

#PBS -S /bin/bash
#PBS -q batch
#PBS -N jobname
#PBS -l nodes=1:ppn=1
#PBS -l walltime=24:00:00
#PBS -l mem=10gb

cd $PBS_O_WORKDIR

singularity exec /usr/local/singularity-images/busco-4.0.5.simg run_busco [options]

Here [options] must be replaced by the options (command and arguments) you want to use. Other job parameters, such as the maximum wall clock time, maximum memory, number of cores per node, and job name, must be adjusted as well.


Here is an example job submission command:

qsub ./sub.sh
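
Because the templates on this page contain a literal [options] placeholder, it can help to check the script before submitting it. This is a minimal sketch; the function name check_job_script is hypothetical.

```shell
# Hypothetical helper: refuse to submit a script that is missing or that
# still contains the literal [options] placeholder from the templates.
check_job_script() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "no such file: $script" >&2
        return 1
    fi
    if grep -qF '[options]' "$script"; then
        echo "$script still contains the [options] placeholder" >&2
        return 1
    fi
    return 0
}
```

It could be combined with the submission as check_job_script sub.sh && qsub ./sub.sh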

Documentation

ml busco/3.0.2   
python /usr/local/apps/gb/busco/3.0.2/scripts/run_BUSCO.py  -h
usage: python BUSCO.py -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]

Welcome to BUSCO 3.0.2: the Benchmarking Universal Single-Copy Ortholog assessment tool.
For more detailed usage information, please review the README file provided with this distribution and the BUSCO user guide.

optional arguments:
  -i FASTA FILE, --in FASTA FILE
                        Input sequence file in FASTA format. Can be an assembled genome or transcriptome (DNA), or protein sequences from an annotated gene set.
  -c N, --cpu N         Specify the number (N=integer) of threads/cores to use.
  -o OUTPUT, --out OUTPUT
                        Give your analysis run a recognisable short name. Output folders and files will be labelled with this name. WARNING: do not provide a path
  -e N, --evalue N      E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03 (Default: 1e-03)
  -m MODE, --mode MODE  Specify which BUSCO analysis mode to run.
                        There are three valid modes:
                        - geno or genome, for genome assemblies (DNA)
                        - tran or transcriptome, for transcriptome assemblies (DNA)
                        - prot or proteins, for annotated gene sets (protein)
  -l LINEAGE, --lineage_path LINEAGE
                        Specify location of the BUSCO lineage data to be used.
                        Visit http://busco.ezlab.org for available lineages.
  -f, --force           Force rewriting of existing files. Must be used when output files with the provided name already exist.
  -r, --restart         Restart an uncompleted run. Not available for the protein mode
  -sp SPECIES, --species SPECIES
                        Name of existing Augustus species gene finding parameters. See Augustus documentation for available options.
  --augustus_parameters AUGUSTUS_PARAMETERS
                        Additional parameters for the fine-tuning of Augustus run. For the species, do not use this option.
                        Use single quotes as follow: '--param1=1 --param2=2', see Augustus documentation for available options.
  -t PATH, --tmp_path PATH
                        Where to store temporary files (Default: ./tmp/)
  --limit REGION_LIMIT  How many candidate regions (contig or transcript) to consider per BUSCO (default: 3)
  --long                Optimization mode Augustus self-training (Default: Off) adds considerably to the run time, but can improve results for some non-model organisms
  -q, --quiet           Disable the info logs, displays only errors
  -z, --tarzip          Tarzip the output folders likely to contain thousands of files
  --blast_single_core   Force tblastn to run on a single core and ignore the --cpu argument for this step only. Useful if inconsistencies when using multiple threads are noticed
  -v, --version         Show this version and exit
  -h, --help            Show this help message and exit
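
Two constraints in the help text above are easy to miss: -o takes a short name rather than a path, and -m accepts only the listed mode names. The sketch below checks both before a job is submitted. The function name validate_busco_args is hypothetical and the check covers only these two options of the 3.0.2 help text.

```shell
# Hypothetical pre-flight check based on the 3.0.2 help text:
# $1 is the value for -o (must be a plain name, no path),
# $2 is the value for -m (must be one of the listed modes).
validate_busco_args() {
    out_name=$1
    mode=$2
    case $out_name in
        ''|*/*)
            echo "-o must be a short name, not a path: '$out_name'" >&2
            return 1
            ;;
    esac
    case $mode in
        geno|genome|tran|transcriptome|prot|proteins)
            ;;
        *)
            echo "unknown mode for -m: '$mode'" >&2
            return 1
            ;;
    esac
}
```

For example, validate_busco_args busco_run1 geno succeeds, while validate_busco_args results/busco_run1 geno fails because the output name contains a path separator.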


Installation

Source code is obtained from BUSCO

System

64-bit Linux