BUSCO-Sapelo2: Difference between revisions
Revision as of 09:19, 9 May 2024
Category
Bioinformatics
Program On
Sapelo2
Version
4.0.5, 5.4.7, 5.5.0
Author / Distributor
Description
"BUSCO - Benchmarking sets of Universal Single-Copy Orthologs." More details are at BUSCO
Running Program
Version 5.4.7
- Version 5.4.7 is installed at /apps/eb/BUSCO/5.4.7-foss-2022a
BLAST+ v2.13.0 is loaded with this application. This version of BLAST+ allows BUSCO to use multiple cores. AUGUSTUS v3.5.0 is also loaded, with AUGUSTUS_CONFIG_PATH set correctly.
To use this version of BUSCO, please load the module with
ml BUSCO/5.4.7-foss-2022a
Before running the program, copy the BUSCO config file config.ini to your current working directory and edit the input file value and any other values as needed. Also copy the AUGUSTUS config folder to the same directory:
cp -r /apps/eb/AUGUSTUS/3.5.0-foss-2022a/config config_augustus
export AUGUSTUS_CONFIG_PATH=config_augustus
cp /apps/eb/BUSCO/5.4.7-foss-2022a/config/config.ini config.ini
vim config.ini
export BUSCO_CONFIG_FILE=config.ini
Version 5.5.0
- Version 5.5.0 is installed at /apps/eb/BUSCO/5.5.0-foss-2022a
To use this version of BUSCO, please first load the module with
ml BUSCO/5.5.0-foss-2022a
BLAST+ v2.13.0 is loaded with this application. This version of BLAST+ allows BUSCO to use multiple cores. AUGUSTUS v3.5.0 is also loaded, with AUGUSTUS_CONFIG_PATH set correctly.
Before running the program, copy the BUSCO config file config.ini to your current working directory and edit the input file value and any other values as needed. Also copy the AUGUSTUS config folder to the same directory:
cp -r /apps/eb/AUGUSTUS/3.5.0-foss-2022a/config config_augustus
export AUGUSTUS_CONFIG_PATH=config_augustus
cp /apps/eb/BUSCO/5.5.0-foss-2022a/config/config.ini config.ini
vim config.ini
export BUSCO_CONFIG_FILE=config.ini
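The setup steps above leave two items in the working directory: the config_augustus folder and config.ini. A minimal sketch of a check that both are present before a job is submitted follows; check_busco_setup is a hypothetical helper name used only for illustration and is not part of BUSCO or Sapelo2.

```shell
#!/bin/bash
# Pre-flight check: confirm that the copied config files exist.
# check_busco_setup is a hypothetical helper, not part of BUSCO.
check_busco_setup() {
    local workdir="${1:-$PWD}"   # directory to inspect; defaults to the current one
    local status=0
    if [ ! -d "${workdir}/config_augustus" ]; then
        echo "missing: ${workdir}/config_augustus" >&2
        status=1
    fi
    if [ ! -f "${workdir}/config.ini" ]; then
        echo "missing: ${workdir}/config.ini" >&2
        status=1
    fi
    return "$status"
}
```

One possible use is check_busco_setup && sbatch sub.sh, so that the job is only submitted when both items are in place.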
Example shell script sub.sh to run BUSCO/5.5.0 on the batch partition:
#!/bin/bash
#SBATCH --job-name=busco                # Job name
#SBATCH --partition=batch               # Partition (queue) name
#SBATCH --ntasks=1                      # Run a single task
#SBATCH --cpus-per-task=4               # Number of CPU cores per task
#SBATCH --mem=10gb                      # Job memory request
#SBATCH --time=48:00:00                 # Time limit hrs:min:sec
#SBATCH --output=log.%j.out             # Standard output log
#SBATCH --error=log.%j.err              # Standard error log
#SBATCH --export=NONE                   # Don't export user's explicit env variables to compute node
#SBATCH --mail-type=END,FAIL            # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu    # Where to send mail

cd $SLURM_SUBMIT_DIR

ml BUSCO/5.5.0-foss-2022a               # load BUSCO v5.5.0 module
export AUGUSTUS_CONFIG_PATH=${PWD}/config_augustus
export BUSCO_CONFIG_FILE=${PWD}/config.ini

time busco --config ./config.ini --cpu 4 [options]
where [options] needs to be replaced by the options (command and arguments) you want to use. Other job parameters, such as the time limit, maximum memory, number of cores, and job name, should be adjusted as well. The value passed to --cpu should match the number of cores requested with --cpus-per-task.
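As an illustration of what [options] might look like, the sketch below assembles a genome-mode command from the flags listed in the busco -h output (-i, -l, -o, -m, --cpu). The input file genome.fasta, the lineage bacteria_odb10, and the output name busco_genome are placeholder assumptions; substitute your own. It also assumes Slurm sets SLURM_CPUS_PER_TASK inside the job, which it does when --cpus-per-task is requested, and falls back to 4 otherwise.

```shell
#!/bin/bash
# Hypothetical input, lineage, and output names; replace with your own.
INPUT="genome.fasta"
LINEAGE="bacteria_odb10"
OUT_NAME="busco_genome"

# Use the core count granted by Slurm, falling back to 4 outside a job.
CPUS="${SLURM_CPUS_PER_TASK:-4}"

# Keep the command in an array so each argument stays a separate word.
BUSCO_CMD=(busco --config ./config.ini -i "$INPUT" -l "$LINEAGE" -o "$OUT_NAME" -m genome --cpu "$CPUS")

# Print the command for review; in sub.sh, run it with: time "${BUSCO_CMD[@]}"
echo "${BUSCO_CMD[*]}"
```

Reading the core count from the environment keeps --cpu in step with --cpus-per-task if the request is changed later.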
Version 4.0.5, Singularity Image
- Version 4.0.5 is installed as a Singularity image at /apps/singularity-images/busco-4.0.5.simg
To run this singularity image:
singularity exec /apps/singularity-images/busco-4.0.5.simg run_busco [options]
To get busco help info:
singularity exec /apps/singularity-images/busco-4.0.5.simg run_busco -h
To check busco version info:
singularity exec /apps/singularity-images/busco-4.0.5.simg run_busco -v
To check the other programs included in this BUSCO Singularity image:
singularity exec /apps/singularity-images/busco-4.0.5.simg ls /usr/local/bin
singularity exec /apps/singularity-images/busco-4.0.5.simg ls /augustus
singularity exec /apps/singularity-images/busco-4.0.5.simg ls /ncbi-blast-2.2.31+/bin
singularity exec /apps/singularity-images/busco-4.0.5.simg ls /hmmer-3.2.1
singularity exec /apps/singularity-images/busco-4.0.5.simg ls /prodigal
Before running the BUSCO Singularity container, copy the AUGUSTUS config folder to your current working directory:
cp -r /apps/eb/AUGUSTUS/3.3.3-foss-2019b/config config_augustus
export AUGUSTUS_CONFIG_PATH=config_augustus
Also copy the BUSCO config file config.ini from the Singularity image to your current working directory and edit the input file value and any other values as needed:
singularity exec /apps/singularity-images/busco-4.0.5.simg cp /busco/config/config.ini .
vim config.ini
export BUSCO_CONFIG_FILE=config.ini
Example shell script sub.sh to run BUSCO/4.0.5 singularity container:
#!/bin/bash
#SBATCH --job-name=busco                # Job name
#SBATCH --partition=batch               # Partition (queue) name
#SBATCH --ntasks=1                      # Run a single task
#SBATCH --cpus-per-task=4               # Number of CPU cores per task
#SBATCH --mem=10gb                      # Job memory request
#SBATCH --time=24:00:00                 # Time limit hrs:min:sec
#SBATCH --output=log.%j.out             # Standard output log
#SBATCH --error=log.%j.err              # Standard error log
#SBATCH --export=NONE                   # Don't export user's explicit env variables to compute node
#SBATCH --mail-type=END,FAIL            # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu    # Where to send mail

cd $SLURM_SUBMIT_DIR

export AUGUSTUS_CONFIG_PATH=${PWD}/config_augustus
export BUSCO_CONFIG_FILE=${PWD}/config.ini

time singularity exec --bind ./config_augustus/:/augustus/config /apps/singularity-images/busco-4.0.5.simg run_busco --config ./config.ini --cpu 4 [options]
where [options] needs to be replaced by the options (command and arguments) you want to use. Other job parameters, such as the maximum wall clock time, maximum memory, number of cores per node, and job name, should be adjusted as well.
To submit the job, run:
sbatch sub.sh
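On submission, sbatch prints a confirmation line of the form "Submitted batch job <id>", and the same id replaces %j in the log file names set by --output and --error. The sketch below shows one way to capture that id; extract_job_id is a hypothetical helper name, and the id 123456 is made up for the demonstration.

```shell
#!/bin/bash
# extract_job_id is a hypothetical helper: it takes sbatch's confirmation
# line ("Submitted batch job <id>") and prints its last field, the job id.
extract_job_id() {
    awk '{ print $NF }' <<< "$1"
}

# Typical use on the cluster (not run here):
#   jobid=$(extract_job_id "$(sbatch sub.sh)")
#   squeue -j "$jobid"            # check the job's state in the queue
#   less "log.${jobid}.out"       # read the standard output log

# Demonstration with a made-up confirmation line.
extract_job_id "Submitted batch job 123456"
```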
Documentation
[cft07037@b1-24 bin]$ ml BUSCO/5.5.0-foss-2022a
[cft07037@b1-24 bin]$ busco -h
usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]

Welcome to BUSCO 5.5.0: the Benchmarking Universal Single-Copy Ortholog assessment tool.
For more detailed usage information, please review the README file provided with this distribution and the BUSCO user guide. Visit this page https://gitlab.com/ezlab/busco#how-to-cite-busco to see how to cite BUSCO

optional arguments:
  -i SEQUENCE_FILE, --in SEQUENCE_FILE
                        Input sequence file in FASTA format. Can be an assembled genome or transcriptome (DNA), or protein sequences from an annotated gene set. Also possible to use a path to a directory containing multiple input files.
  -o OUTPUT, --out OUTPUT
                        Give your analysis run a recognisable short name. Output folders and files will be labelled with this name. The path to the output folder is set with --out_path.
  -m MODE, --mode MODE  Specify which BUSCO analysis mode to run.
                        There are three valid modes:
                        - geno or genome, for genome assemblies (DNA)
                        - tran or transcriptome, for transcriptome assemblies (DNA)
                        - prot or proteins, for annotated gene sets (protein)
  -l LINEAGE, --lineage_dataset LINEAGE
                        Specify the name of the BUSCO lineage to be used.
  --augustus            Use augustus gene predictor for eukaryote runs
  --augustus_parameters --PARAM1=VALUE1,--PARAM2=VALUE2
                        Pass additional arguments to Augustus. All arguments should be contained within a single string with no white space, with each argument separated by a comma.
  --augustus_species AUGUSTUS_SPECIES
                        Specify a species for Augustus training.
  --auto-lineage        Run auto-lineage to find optimum lineage path
  --auto-lineage-euk    Run auto-placement just on eukaryote tree to find optimum lineage path
  --auto-lineage-prok   Run auto-lineage just on non-eukaryote trees to find optimum lineage path
  -c N, --cpu N         Specify the number (N=integer) of threads/cores to use.
  --config CONFIG_FILE  Provide a config file
  --contig_break n      Number of contiguous Ns to signify a break between contigs. Default is n=10.
  --datasets_version DATASETS_VERSION
                        Specify the version of BUSCO datasets, e.g. odb10
  --download [dataset ...]
                        Download dataset. Possible values are a specific dataset name, "all", "prokaryota", "eukaryota", or "virus". If used together with other command line arguments, make sure to place this last.
  --download_base_url DOWNLOAD_BASE_URL
                        Set the url to the remote BUSCO dataset location
  --download_path DOWNLOAD_PATH
                        Specify local filepath for storing BUSCO dataset downloads
  -e N, --evalue N      E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03 (Default: 1e-03)
  -f, --force           Force rewriting of existing files. Must be used when output files with the provided name already exist.
  -h, --help            Show this help message and exit
  --limit N             How many candidate regions (contig or transcript) to consider per BUSCO (default: 3)
  --list-datasets       Print the list of available BUSCO datasets
  --long                Optimization Augustus self-training mode (Default: Off); adds considerably to the run time, but can improve results for some non-model organisms
  --metaeuk_parameters "--PARAM1=VALUE1,--PARAM2=VALUE2"
                        Pass additional arguments to Metaeuk for the first run. All arguments should be contained within a single string with no white space, with each argument separated by a comma.
  --metaeuk_rerun_parameters "--PARAM1=VALUE1,--PARAM2=VALUE2"
                        Pass additional arguments to Metaeuk for the second run. All arguments should be contained within a single string with no white space, with each argument separated by a comma.
  --miniprot            Use miniprot gene predictor for eukaryote runs
  --offline             To indicate that BUSCO cannot attempt to download files
  --out_path OUTPUT_PATH
                        Optional location for results folder, excluding results folder name. Default is current working directory.
  -q, --quiet           Disable the info logs, displays only errors
  -r, --restart         Continue a run that had already partially completed.
  --scaffold_composition
                        Writes ACGTN content per scaffold to a file scaffold_composition.txt
  --tar                 Compress some subdirectories with many files to save space
  --update-data         Download and replace with last versions all lineages datasets and files necessary to their automated selection
  -v, --version         Show this version and exit
Installation
Source code is obtained from BUSCO
System
64-bit Linux