BUSCO-Sapelo2: Difference between revisions

From Research Computing Center Wiki

Revision as of 14:35, 22 October 2020

Category

Bioinformatics

Program On

Sap2test

Version

4.0.5, 4.0.6

Author / Distributor

BUSCO

Description

"BUSCO - Benchmarking sets of Universal Single-Copy Orthologs." More details are at BUSCO

Running Program

Version 4.0.5

  • Version 4.0.5 is installed at /apps/eb/BUSCO/4.0.5-foss-2019b-Python-3.7.4

BLAST+ v2.9.0 is loaded with this application; this version of BLAST+ allows BUSCO to use multiple cores. AUGUSTUS v3.3.3 is also loaded, with AUGUSTUS_CONFIG_PATH set correctly.

To use this version of BUSCO, please load the module with

ml BUSCO/4.0.5-foss-2019b-Python-3.7.4

Before running the program, please copy the config file config.ini to your current working directory and edit the input file value and any other values as needed:

cp /apps/eb/BUSCO/4.0.5-foss-2019b-Python-3.7.4/config/config.ini config.ini
vim config.ini
export BUSCO_CONFIG_FILE=config.ini

Version 4.0.6

  • Version 4.0.6 is installed at /apps/eb/BUSCO/4.0.6-foss-2019b-Python-3.7.4

To use this version of BUSCO, please first load the module with

ml BUSCO/4.0.6-foss-2019b-Python-3.7.4 

BLAST+ v2.9.0 is loaded with this application; this version of BLAST+ allows BUSCO to use multiple cores. AUGUSTUS v3.3.3 is also loaded, with AUGUSTUS_CONFIG_PATH set correctly.

Before running the program, please copy the config file config.ini to your current working directory and edit the input file value and any other values as needed:

cp /apps/eb/BUSCO/4.0.5-foss-2019b-Python-3.7.4/config/config.ini config.ini
vim config.ini
export BUSCO_CONFIG_FILE=config.ini
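
The edit step can also be scripted instead of done by hand in vim. The sketch below is a minimal example under stated assumptions: it assumes the copied config.ini holds `key = value` lines, possibly commented out with a leading `;`, in a `[busco_run]` section. The stand-in file, the `set_busco_option` helper, and the file name `genome.fa` are hypothetical, so check the key names against the real config.ini before relying on it.

```shell
#!/bin/bash
# Stand-in for the copied config.ini (assumed layout; the real file may differ).
cat > config.ini <<'EOF'
[busco_run]
;in = /path/to/input_file.fna
;out = BUSCO_run
;mode = genome
EOF

# Hypothetical helper: set KEY to VALUE, uncommenting the line if needed.
set_busco_option() {
    local key=$1 value=$2 file=${3:-config.ini}
    # Match "key = ..." with an optional leading ';' and rewrite the whole line.
    sed -i -E "s|^;?${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
}

set_busco_option in "$PWD/genome.fa"
set_busco_option out my_run

# An absolute path keeps the variable valid if the job changes directory.
export BUSCO_CONFIG_FILE=$PWD/config.ini
grep -E '^(in|out) =' "$BUSCO_CONFIG_FILE"
```

Lines that the helper does not touch, such as the commented-out `mode` entry here, are left exactly as they were.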


Here is an example of a shell script, sub.sh, to run BUSCO/4.0.5 on the batch queue:

#!/bin/bash
#SBATCH --job-name=busco              # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --ntasks=1                    # Run a single task	
#SBATCH --cpus-per-task=4             # Number of CPU cores per task
#SBATCH --mem=10gb                    # Job memory request
#SBATCH --time=48:00:00               # Time limit hrs:min:sec
#SBATCH --output=log.%j.out           # Standard output log
#SBATCH --error=log.%j.err            # Standard error log

#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail	

cd $SLURM_SUBMIT_DIR

ml BUSCO/4.0.5-foss-2019b-Python-3.7.4  # load BUSCO/4.0.5 module

time busco --config ./config.ini --cpu 4 [options]

where [options] needs to be replaced by the options (command and arguments) you want to use. Other job parameters, such as the time limit, maximum memory, number of cores, and job name, need to be modified appropriately as well.
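
As an illustration of what [options] might look like, the sketch below assembles a genome-mode command from flags listed in the busco help output (-i, -l, -o, -m, --cpu). The input file `genome.fa`, the run name `my_genome_busco`, and the lineage `bacteria_odb10` are placeholder values chosen for this example. The command is only printed, so it can be checked before it goes into sub.sh.

```shell
#!/bin/bash
# Placeholder values (assumptions for illustration; replace with your own).
input=genome.fa
lineage=bacteria_odb10
run_name=my_genome_busco

# Use the core count SLURM granted, falling back to 4 outside a job.
cpus=${SLURM_CPUS_PER_TASK:-4}

# Build the command as an array so each argument stays a single word.
cmd=(busco --config ./config.ini -i "$input" -l "$lineage" -o "$run_name" -m genome --cpu "$cpus")

# Dry run: print the command instead of executing it.
echo "${cmd[*]}"
```

Reading the core count from SLURM_CPUS_PER_TASK keeps --cpu in step with the --cpus-per-task line of the job script. To run the command for real inside sub.sh, the echo line could be replaced with `time "${cmd[@]}"`.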


Version 4.0.5 Singularity Container

  • Version 4.0.5 is installed as a singularity image at /usr/local/singularity-images/busco-4.0.5.simg

To run BUSCO v4.0.5 included in this singularity image:

singularity exec /usr/local/singularity-images/busco-4.0.5.simg run_busco [options]

To get busco help info:

singularity exec /usr/local/singularity-images/busco-4.0.5.simg run_busco -h

To check busco version info:

singularity exec /usr/local/singularity-images/busco-4.0.5.simg run_busco -v

To check other programs included in this singularity image:

singularity exec /usr/local/singularity-images/busco-4.0.5.simg ls /usr/local/bin
singularity exec /usr/local/singularity-images/busco-4.0.5.simg ls /augustus
singularity exec /usr/local/singularity-images/busco-4.0.5.simg ls /ncbi-blast-2.2.31+/bin
singularity exec /usr/local/singularity-images/busco-4.0.5.simg ls /hmmer-3.2.1


Sample job submission script (sub.sh) to run BUSCO/4.0.5 singularity container:

#!/bin/bash
#SBATCH --job-name=busco              # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --ntasks=1                    # Run a single task	
#SBATCH --cpus-per-task=4             # Number of CPU cores per task
#SBATCH --mem=10gb                    # Job memory request
#SBATCH --time=48:00:00               # Time limit hrs:min:sec
#SBATCH --output=log.%j.out           # Standard output log
#SBATCH --error=log.%j.err            # Standard error log

#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail	

cd $SLURM_SUBMIT_DIR

singularity exec /apps/singularity-images/busco-4.0.5.simg run_busco [options]

where [options] needs to be replaced by the options (command and arguments) you want to use. Other job parameters, such as the maximum wall clock time, maximum memory, number of cores per node, and job name, need to be modified appropriately as well.


Here is an example of a job submission command:

sbatch sub.sh 

Documentation

ml BUSCO/4.0.6-foss-2019b-Python-3.7.4
busco -h

usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]

Welcome to BUSCO 4.0.6: the Benchmarking Universal Single-Copy Ortholog assessment tool.
For more detailed usage information, please review the README file provided with this distribution and the BUSCO user guide.

optional arguments:
  -i FASTA FILE, --in FASTA FILE
                        Input sequence file in FASTA format. Can be an assembled genome or transcriptome (DNA), or protein sequences from an annotated gene set.
  -c N, --cpu N         Specify the number (N=integer) of threads/cores to use.
  -o OUTPUT, --out OUTPUT
                        Give your analysis run a recognisable short name. Output folders and files will be labelled with this name. WARNING: do not provide a path
  --out_path OUTPUT_PATH
                        Optional location for results folder, excluding results folder name. Default is current working directory.
  -e N, --evalue N      E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03 (Default: 1e-03)
  -m MODE, --mode MODE  Specify which BUSCO analysis mode to run.
                        There are three valid modes:
                        - geno or genome, for genome assemblies (DNA)
                        - tran or transcriptome, for transcriptome assemblies (DNA)
                        - prot or proteins, for annotated gene sets (protein)
  -l LINEAGE, --lineage_dataset LINEAGE
                        Specify the name of the BUSCO lineage to be used.
  -f, --force           Force rewriting of existing files. Must be used when output files with the provided name already exist.
  --limit REGION_LIMIT  How many candidate regions (contig or transcript) to consider per BUSCO (default: 3)
  --long                Optimization mode Augustus self-training (Default: Off) adds considerably to the run time, but can improve results for some non-model organisms
  -q, --quiet           Disable the info logs, displays only errors
  --augustus_parameters AUGUSTUS_PARAMETERS
                        Pass additional arguments to Augustus. All arguments should be contained within a single pair of quotation marks, separated by commas. E.g. '--param1=1,--param2=2'
  --augustus_species AUGUSTUS_SPECIES
                        Specify a species for Augustus training.
  --auto-lineage        Run auto-lineage to find optimum lineage path
  --auto-lineage-prok   Run auto-lineage just on non-eukaryote trees to find optimum lineage path
  --auto-lineage-euk    Run auto-placement just on eukaryote tree to find optimum lineage path
  --update-data         Download and replace with last versions all lineages datasets and files necessary to their automated selection
  --offline             To indicate that BUSCO cannot attempt to download files
  --config CONFIG_FILE  Provide a config file
  -v, --version         Show this version and exit
  -h, --help            Show this help message and exit
  --list-datasets       Print the list of available BUSCO datasets
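
Two constraints in the help text are easy to trip over in batch jobs: -o takes a short name rather than a path, and -f is required when output with that name already exists. The sketch below shows one way to check both before launching. `busco_output_args` is a hypothetical helper, not part of BUSCO, and it assumes results are written to a folder named after the run inside the directory given by --out_path (the current directory by default).

```shell
#!/bin/bash
# Hypothetical helper (not part of BUSCO): print the -o/-f arguments for a run.
# Assumes results go to a folder named after the run inside OUT_PATH.
busco_output_args() {
    local name=$1 out_path=${2:-.}
    # -o must be a short name, not a path.
    case $name in
        */*) echo "error: run name '$name' must not contain a path" >&2; return 1 ;;
    esac
    if [ -e "$out_path/$name" ]; then
        # Existing output: BUSCO needs -f to overwrite it.
        echo "-o $name --out_path $out_path -f"
    else
        echo "-o $name --out_path $out_path"
    fi
}

busco_output_args my_run
```

Adding -f only when the folder already exists avoids overwriting earlier results by default while still letting a rerun proceed.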


Installation

Source code is obtained from BUSCO

System

64-bit Linux