BUSCO-Sapelo2
Latest revision as of 10:50, 8 July 2025
Category
Bioinformatics
Program On
Sapelo2
Version
5.8.3
Author / Distributor
Description
"BUSCO - Benchmarking sets of Universal Single-Copy Orthologs." More details are at BUSCO
Running Program
Version 5.8.3
- Version 5.8.3 is installed at /apps/eb/BUSCO/5.8.3-foss-2023a
To use this version of BUSCO, first load the module with:
ml BUSCO/5.8.3-foss-2023a
Loading this module also loads BLAST+ v2.14.1, a version that lets BUSCO run on multiple cores, and AUGUSTUS v3.5.0, with AUGUSTUS_CONFIG_PATH already set correctly.
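To confirm that the module put these programs on your PATH, you can use a small check like the Bash sketch below. The function name missing_tools is hypothetical (it is not part of BUSCO), and the executable names busco, augustus and tblastn are assumptions about how the three packages are installed.

```bash
# missing_tools: hypothetical helper, not part of BUSCO.
# Prints each named command that is NOT found on PATH, one per line.
# Returns 1 if any command was missing, 0 if all were found.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For example, after ml BUSCO/5.8.3-foss-2023a, running missing_tools busco augustus tblastn should print nothing and return 0; any name it prints was not found.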
Before running the program, copy the BUSCO configuration file config.ini to your current working directory and edit the input file value and any other settings as needed. Also copy the AUGUSTUS config folder into your working directory, so that you have a writable copy, and point AUGUSTUS_CONFIG_PATH to it:
cp -r /apps/eb/AUGUSTUS/3.5.0-foss-2023a/config config_augustus
export AUGUSTUS_CONFIG_PATH=${PWD}/config_augustus
cp /apps/eb/BUSCO/5.8.3-foss-2023a/config/config.ini config.ini
vim config.ini
export BUSCO_CONFIG_FILE=${PWD}/config.ini
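If you would rather not edit config.ini interactively in vim, a value can also be set non-interactively. The sketch below defines a hypothetical helper, set_ini_value, built on GNU sed. It assumes that each setting sits on its own line as key = value, possibly commented out with a leading ; or #, that the key occurs only once in the file, and that the input file setting is named in. Please check your copy of config.ini before relying on it.

```bash
# set_ini_value FILE KEY VALUE  (hypothetical helper, not part of BUSCO)
# Rewrites the line "KEY = ..." in FILE as "KEY = VALUE", uncommenting it
# if it starts with ';' or '#'. Assumes GNU sed (-i, -E) and that KEY
# appears only once in FILE. VALUE must not contain '|' or '&'.
set_ini_value() {
  local file=$1 key=$2 value=$3
  if ! grep -Eq "^[;#]?[[:space:]]*${key}[[:space:]]*=" "$file"; then
    echo "set_ini_value: key '$key' not found in $file" >&2
    return 1
  fi
  sed -i -E "s|^[;#]?[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
}
```

For example, set_ini_value config.ini in /path/to/my_genome.fasta, where the path is a placeholder for your own input file.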
Example shell script sub.sh to run BUSCO/5.8.3 on the batch partition:
#!/bin/bash
#SBATCH --job-name=busco               # Job name
#SBATCH --partition=batch              # Partition (queue) name
#SBATCH --ntasks=1                     # Run a single task
#SBATCH --cpus-per-task=4              # Number of CPU cores per task
#SBATCH --mem=10gb                     # Job memory request
#SBATCH --time=48:00:00                # Time limit hrs:min:sec
#SBATCH --output=log.%j.out            # Standard output log
#SBATCH --error=log.%j.err             # Standard error log
#SBATCH --export=NONE                  # Don't export user's explicit env variables to compute node
#SBATCH --mail-type=END,FAIL           # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu   # Where to send mail

cd "$SLURM_SUBMIT_DIR"
ml BUSCO/5.8.3-foss-2023a              # load BUSCO v5.8.3 module
export AUGUSTUS_CONFIG_PATH=${PWD}/config_augustus
export BUSCO_CONFIG_FILE=${PWD}/config.ini
# --cpu follows --cpus-per-task, so the core count only needs changing in one place
time busco --config ./config.ini --cpu ${SLURM_CPUS_PER_TASK} [options]
where [options] should be replaced with the BUSCO command-line arguments you want to use. Other job parameters, such as the time limit, maximum memory, number of cores, and job name, should be adjusted as needed.
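Before submitting, it can help to check that the preparation steps above were done in the directory you submit from. The Bash sketch below is a hypothetical pre-submission check, busco_preflight, not a BUSCO feature. It assumes the file and folder names used on this page (config.ini, config_augustus) and that the input file is set through an uncommented in = line; drop that test if you pass -i on the command line instead.

```bash
# busco_preflight DIR  (hypothetical helper, not part of BUSCO)
# Checks that DIR contains what this page asks you to prepare:
#   - config.ini with an uncommented, non-empty "in = ..." line
#   - a writable config_augustus/ directory
# Prints one message per problem; returns 0 only if nothing is wrong.
busco_preflight() {
  local dir=${1:-.} problems=0
  if [ ! -f "$dir/config.ini" ]; then
    echo "missing $dir/config.ini"
    problems=1
  elif ! grep -Eq '^in[[:space:]]*=[[:space:]]*[^[:space:]]' "$dir/config.ini"; then
    echo "no input file set (in = ...) in $dir/config.ini"
    problems=1
  fi
  if [ ! -d "$dir/config_augustus" ] || [ ! -w "$dir/config_augustus" ]; then
    echo "missing or read-only $dir/config_augustus/"
    problems=1
  fi
  return "$problems"
}
```

For example, run busco_preflight . and, if it prints nothing, submit the job with sbatch sub.sh. As an illustration of [options], a genome run could use -i my_genome.fasta -m genome -l LINEAGE -o my_busco_run, where the file and run names are placeholders and LINEAGE is one of the names printed by busco --list-datasets.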
Documentation
ml BUSCO/5.8.3-foss-2023a
busco -h
usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]

Welcome to BUSCO 5.8.3: the Benchmarking Universal Single-Copy Ortholog assessment tool.
For more detailed usage information, please review the README file provided with this distribution and the BUSCO user guide. Visit this page https://gitlab.com/ezlab/busco#how-to-cite-busco to see how to cite BUSCO

optional arguments:
  -i SEQUENCE_FILE, --in SEQUENCE_FILE
                        Input sequence file in FASTA format. Can be an assembled genome or transcriptome (DNA), or protein sequences from an annotated gene set. Also possible to use a path to a directory containing multiple input files.
  -o OUTPUT, --out OUTPUT
                        Give your analysis run a recognisable short name. Output folders and files will be labelled with this name. The path to the output folder is set with --out_path.
  -m MODE, --mode MODE  Specify which BUSCO analysis mode to run.
                        There are three valid modes:
                        - geno or genome, for genome assemblies (DNA)
                        - tran or transcriptome, for transcriptome assemblies (DNA)
                        - prot or proteins, for annotated gene sets (protein)
  -l LINEAGE, --lineage_dataset LINEAGE
                        Specify the name of the BUSCO lineage to be used.
  --augustus            Use augustus gene predictor for eukaryote runs
  --augustus_parameters "--PARAM1=VALUE1,--PARAM2=VALUE2"
                        Pass additional arguments to Augustus. All arguments should be contained within a single string with no white space, with each argument separated by a comma.
  --augustus_species AUGUSTUS_SPECIES
                        Specify a species for Augustus training.
  --auto-lineage        Run auto-lineage to find optimum lineage path
  --auto-lineage-euk    Run auto-placement just on eukaryote tree to find optimum lineage path
  --auto-lineage-prok   Run auto-lineage just on non-eukaryote trees to find optimum lineage path
  -c N, --cpu N         Specify the number (N=integer) of threads/cores to use.
  --config CONFIG_FILE  Provide a config file
  --contig_break n      Number of contiguous Ns to signify a break between contigs. Default is n=10.
  --datasets_version DATASETS_VERSION
                        Specify the version of BUSCO datasets, e.g. odb10, odb12 (default odb12)
  --download [dataset ...]
                        Download dataset. Possible values are a specific dataset name, "all", "prokaryota", "eukaryota", or "virus". If used together with other command line arguments, make sure to place this last.
  --download_base_url DOWNLOAD_BASE_URL
                        Set the url to the remote BUSCO dataset location
  --download_path DOWNLOAD_PATH
                        Specify local filepath for storing BUSCO dataset downloads
  -e N, --evalue N      E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03 (Default: 1e-03)
  -f, --force           Force rewriting of existing files. Must be used when output files with the provided name already exist.
  -h, --help            Show this help message and exit
  --limit N             How many candidate regions (contig or transcript) to consider per BUSCO (default: 3)
  --list-datasets       Print the list of available BUSCO datasets
  --long                Optimization Augustus self-training mode (Default: Off); adds considerably to the run time, but can improve results for some non-model organisms
  --metaeuk             Use Metaeuk gene predictor
  --metaeuk_parameters "--PARAM1=VALUE1,--PARAM2=VALUE2"
                        Pass additional arguments to Metaeuk for the first run. All arguments should be contained within a single string with no white space, with each argument separated by a comma.
  --metaeuk_rerun_parameters "--PARAM1=VALUE1,--PARAM2=VALUE2"
                        Pass additional arguments to Metaeuk for the second run. All arguments should be contained within a single string with no white space, with each argument separated by a comma.
  --miniprot            Use Miniprot gene predictor
  --skip_bbtools        Skip BBTools for assembly statistics
  --offline             To indicate that BUSCO cannot attempt to download files
  --opt-out-run-stats   Opt out of data collection. Information on the data collected is available in the user guide.
  --out_path OUTPUT_PATH
                        Optional location for results folder, excluding results folder name. Default is current working directory.
  -q, --quiet           Disable the info logs, displays only errors
  -r, --restart         Continue a run that had already partially completed.
  --scaffold_composition
                        Writes ACGTN content per scaffold to a file scaffold_composition.txt
  --tar                 Compress some subdirectories with many files to save space
  -v, --version         Show this version and exit
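The --augustus_parameters and --metaeuk_parameters options expect all extra arguments in one string, separated by commas and containing no whitespace. The Bash sketch below shows one way to build that string from separate arguments. join_params is a hypothetical helper; it only enforces the format described in the help text above, not which parameters AUGUSTUS or MetaEuk actually accept.

```bash
# join_params ARG...  (hypothetical helper, not part of BUSCO)
# Joins arguments into the single comma-separated, whitespace-free string
# expected by --augustus_parameters / --metaeuk_parameters.
# Fails if any argument contains whitespace or a comma.
join_params() {
  local arg out=""
  for arg in "$@"; do
    case $arg in
      *[[:space:]]*|*,*)
        echo "join_params: invalid argument '$arg'" >&2
        return 1 ;;
    esac
    out=${out:+$out,}$arg   # add a comma only between arguments
  done
  printf '%s\n' "$out"
}
```

For example, --augustus_parameters "$(join_params --PARAM1=VALUE1 --PARAM2=VALUE2)" passes the string "--PARAM1=VALUE1,--PARAM2=VALUE2", with PARAM1 and PARAM2 standing in for real AUGUSTUS options.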
Installation
Source code is obtained from BUSCO
System
64-bit Linux