Meraculous-Sapelo2
Category
Bioinformatics
Program On
Sapelo2
Version
2.2.6
Author / Distributor
Please see https://jgi.doe.gov/data-and-tools/software-tools/meraculous/: "Meraculous is a whole genome assembler for Next Generation Sequencing data geared for large genomes."
Description
From https://sourceforge.net/projects/meraculous20/, "Meraculous-2D is a whole genome assembler for NGS reads (Illumina) that is capable of assembling large, diploid genomes with modest computational requirements.
Features include:
- Efficient k-mer counting and deBruijn graph traversal
- Two modes of handling of diploid allelic variation
- Improved scaffolding that produces more complete assemblies without compromising scaffolding accuracy."
Running Program
Also refer to Running Jobs on Sapelo2
For more information on Environment Modules on Sapelo2 please see the Lmod page.
- Version 2.2.6, installed as Conda virtual environment in /apps/eb/Meraculous/2.2.6
To use this version of Meraculous, please first load the module with
module load Meraculous/2.2.6
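To confirm that the module has been loaded and that the pipeline driver script is on your PATH, you can run, for example:

module list
which run_meraculous.sh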
Please note:
- To run Meraculous, you need to prepare, in your current job working folder, a configuration file that contains the parameters guiding the entire assembly process. This configuration file must be passed to the program with the -c <configuration file> argument (a minimal invocation sketch is shown after these notes).
- The assembly is driven by a perl pipeline which performs data fragmentation and load balancing, as well as submission and monitoring of multiple task arrays on a SLURM-type cluster (Sapelo2) or a standalone multi-core server.
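For illustration only, a minimal invocation sketch using the options that appear in the examples below (here my.config and output are placeholder names for the configuration file and the run directory):

run_meraculous.sh -c my.config -dir output -cleanup_level 1

Please refer to the user manual (see the Documentation section below) for the full list of run_meraculous.sh options.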
Example of how to run Meraculous in standalone mode (single multi-core server) on batch
1. Create a configuration file in your current working folder. In the example below this file is called meraculous.standalone.config and its content is
#Describe the libraries ( one line per library )
lib_seq /scratch/zhuofei/meraculous/OT1_CKDN220054653-1A_HF33VDSX5_L1_R1_paired.fq,/scratch/zhuofei/meraculous/OT1_CKDN220054653-1A_HF33VDSX5_L1_R2_paired.fq GERMAC1 200 20 150 0 0 1 1 1 0 0

genome_size 2.15
mer_size 31
diploid_mode 2
num_prefix_blocks 4
min_depth_cutoff 3
use_cluster 0
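The global parameters in this example are summarized below. This is an informal gloss based on the Meraculous user manual, so please treat the descriptions as approximate and consult the manual (see the Documentation section below) for the authoritative definitions:

genome_size 2.15        # approximate genome size, in Gb
mer_size 31             # k-mer size used to build the deBruijn graph (an odd value)
diploid_mode 2          # selects one of the two diploid-handling modes mentioned in the Description above; see the manual for the difference between modes 1 and 2
num_prefix_blocks 4     # number of blocks the k-mer space is split into, which affects per-step memory use
min_depth_cutoff 3      # minimum k-mer depth retained during k-mer counting
use_cluster 0           # 0 = run on a single standalone node; 1 = submit pipeline steps as cluster jobs (see the cluster-mode example below)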
2. Create a job submission script, called sub.sh in the example here, with the sample content:
#!/bin/bash
#SBATCH --job-name=meraculous_standalone
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=128gb
#SBATCH --time=7-00:00:00
#SBATCH --output=log.%j.out
#SBATCH --error=log.%j.err

cd $SLURM_SUBMIT_DIR
ml Meraculous/2.2.6

run_meraculous.sh -c meraculous.standalone.config -dir output -cleanup_level 1
The parameters of the job, such as the maximum wall clock time (--time), maximum memory (--mem), number of CPU cores (--cpus-per-task), and the job name (--job-name), need to be modified appropriately. In this example, the standard output and error of the run_meraculous.sh command will be saved into two files called log.JobID.out and log.JobID.err, respectively, where JobID is the job ID number.
3. Submit the job to the queue with
sbatch sub.sh
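Once the job is submitted, you can monitor it with standard SLURM commands, for example (MyID is a placeholder for your cluster username and JobID is the job ID number reported by sbatch):

squeue -u MyID     # list your pending and running jobs
sacct -j JobID     # show the state, elapsed time, and exit code of the job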
Example of how to run Meraculous in cluster mode (multiple task arrays) on batch
1. Create a configuration file in your current working folder. In the example below this file is called meraculous.cluster.config and its content is
#Describe the libraries ( one line per library )
lib_seq /scratch/zhuofei/meraculous/OT1_CKDN220054653-1A_HF33VDSX5_L1_R1_paired.fq,/scratch/zhuofei/meraculous/OT1_CKDN220054653-1A_HF33VDSX5_L1_R2_paired.fq GERMAC1 200 20 150 0 0 1 1 1 0 0

genome_size 2.15
mer_size 31
diploid_mode 2
num_prefix_blocks 4
min_depth_cutoff 3
use_cluster 1
cluster_num_nodes 10
cluster_slots_per_task 8
cluster_ram_request 128
cluster_walltime 168:00:00
cluster_queue batch
where use_cluster 1 specifies that a cluster should be used for job submissions.
cluster_num_nodes specifies the number of available cluster compute nodes. This number can be approximate (as mentioned in the Meraculous user manual); in our tests, some pipeline steps (e.g., gapCloser) used more nodes (e.g., 15) than the number specified here (e.g., 10).
cluster_slots_per_task specifies the maximum number of CPU cores to be allocated for multi-threaded elements in task arrays.
cluster_ram_request specifies the maximum amount of memory (in GB) to be allocated for each element in task arrays. If elements of the task array are multi-threaded, Meraculous automatically divides this number by the number of allocated CPU cores; in the example above, at some pipeline steps each CPU core gets cluster_ram_request (128) / cluster_slots_per_task (8) = 16 GB of memory.
cluster_walltime specifies the wall-clock time limit for cluster tasks. It must be given as hh:mm:ss (168 hours is the upper limit for the Sapelo2 batch partition).
cluster_queue specifies the name of the partition to which cluster jobs will be dispatched. The limits of the target partition can be checked as sketched below.
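Before choosing cluster_num_nodes, cluster_walltime, and cluster_queue, it can help to check the limits of the target partition with standard SLURM commands, for example:

scontrol show partition batch     # shows MaxTime, the node list, and other limits of the partition
sinfo -p batch                    # shows current node availability in the batch partition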
2. Create a job submission script, called sub.sh in the example here, with the sample content:
#!/bin/bash
#SBATCH --job-name=meraculous_cluster
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=128gb
#SBATCH --time=168:00:00
#SBATCH --output=log.%j.out
#SBATCH --error=log.%j.err

cd $SLURM_SUBMIT_DIR
ml Meraculous/2.2.6
export SLURM_ROOT=/opt/apps/slurm/21.08.8

run_meraculous.sh -c meraculous.cluster.config -dir output -cleanup_level 1
The parameters of the job, such as the maximum wall clock time (--time), maximum memory (--mem), number of CPU cores (--cpus-per-task), and the job name (--job-name), need to be modified appropriately. In this example, the standard output and error of the run_meraculous.sh command will be saved into two files called log.JobID.out and log.JobID.err, respectively, where JobID is the job ID number.
3. Submit the job to the queue with
sbatch sub.sh
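Note that in cluster mode the job submitted with sbatch acts as a driver: as described in the notes above, the pipeline itself submits and monitors additional task-array jobs, so you will see several jobs under your username while the assembly runs. For example (MyID is a placeholder for your cluster username):

squeue -u MyID        # lists the driver job plus any task arrays submitted by the pipeline
squeue -u MyID -r     # lists each task-array element on its own line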
Documentation
The user guide (Manual.pdf) is available at /apps/eb/Meraculous/2.2.6/share/doc/meraculous/Manual.pdf. Please feel free to download it to your local computer to browse.
Installation
https://anaconda.org/bioconda/meraculous
System
64-bit Linux