Meraculous-Sapelo2

From Research Computing Center Wiki

Revision as of 13:39, 14 December 2022

Category

Bioinformatics

Program On

Sapelo2

Version

2.2.6

Author / Distributor

Please see https://jgi.doe.gov/data-and-tools/software-tools/meraculous/: "Meraculous is a whole genome assembler for Next Generation Sequencing data geared for large genomes."

Description

From https://sourceforge.net/projects/meraculous20/, "Meraculous-2D is a whole genome assembler for NGS reads (Illumina) that is capable of assembling large, diploid genomes with modest computational requirements.

Features include:

- Efficient k-mer counting and deBruijn graph traversal

- Two modes of handling of diploid allelic variation

- Improved scaffolding that produces more complete assemblies without compromising scaffolding accuracy."

Running Program

Also refer to Running Jobs on Sapelo2

For more information on Environment Modules on Sapelo2 please see the Lmod page.

  • Version 2.2.6, installed as a Conda virtual environment in /apps/eb/Meraculous/2.2.6

To use this version of Meraculous, please first load the module with

module load Meraculous/2.2.6
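Before writing a job script, it can help to confirm that loading the module puts the pipeline driver run_meraculous.sh (the command used in the job script on this page) on your PATH. The sketch below assumes bash and the Lmod module command; the function name check_meraculous is hypothetical and is not part of Meraculous.

```shell
#!/bin/bash
# Sketch: check that run_meraculous.sh is on PATH after loading the module.
# check_meraculous is a hypothetical helper, not a Meraculous command.

# Return 0 if run_meraculous.sh can be found on PATH, 1 otherwise.
check_meraculous() {
    local path
    if path=$(command -v run_meraculous.sh); then
        echo "run_meraculous.sh found: $path"
        return 0
    fi
    echo "run_meraculous.sh not found on PATH" >&2
    return 1
}

# The module command is provided by Lmod on Sapelo2; skip loading if it is absent.
if type module >/dev/null 2>&1; then
    module load Meraculous/2.2.6
fi

if ! check_meraculous; then
    echo "Try: module load Meraculous/2.2.6" >&2
fi
```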

Please note:

  • To run Meraculous, you need to prepare a configuration file in your job's working folder that contains the parameters guiding the entire assembly process. This configuration file must be passed to the program with the -c <configuration file> argument.
  • The assembly is driven by a Perl pipeline that performs data fragmentation and load balancing, as well as submission and monitoring of multiple task arrays on either a SLURM-type cluster (Sapelo2) or a standalone multi-core server.


Example of how to run Meraculous on a standalone multi-core server in the batch partition

1. Create a configuration file in your current working folder. In the example below, this file is called meraculous.standalone.config and its content is:

#Describe the libraries ( one line per library )
lib_seq /scratch/zhuofei/meraculous/OT1_CKDN220054653-1A_HF33VDSX5_L1_R1_paired.fq,/scratch/zhuofei/meraculous/OT1_CKDN220054653-1A_HF33VDSX5_L1_R2_paired.fq GERMAC1 200 20 150 0 0 1 1 1 0 0

genome_size 2.15
mer_size 31
diploid_mode 2
num_prefix_blocks 4
min_depth_cutoff 3

use_cluster 0
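The same configuration file can also be generated from the shell with a here-document, which makes it easy to regenerate or adjust from a script. The sketch below writes the step 1 content verbatim and then prints the 12 values that follow the lib_seq keyword next to field names. The field names and their order are our reading of the Meraculous manual and are an assumption here; please check them against the manual for version 2.2.6 before changing any values. The function label_lib_seq is hypothetical, and the FASTQ paths are the example paths from step 1, which you should replace with your own.

```shell
#!/bin/bash
# Sketch: write meraculous.standalone.config (same content as step 1) and
# label the values on each lib_seq line.
# ASSUMPTION: field names/order follow our reading of the Meraculous manual.
set -euo pipefail

config=meraculous.standalone.config

# Quoted 'EOF' keeps the text literal (no variable expansion).
cat > "$config" <<'EOF'
#Describe the libraries ( one line per library )
lib_seq /scratch/zhuofei/meraculous/OT1_CKDN220054653-1A_HF33VDSX5_L1_R1_paired.fq,/scratch/zhuofei/meraculous/OT1_CKDN220054653-1A_HF33VDSX5_L1_R2_paired.fq GERMAC1 200 20 150 0 0 1 1 1 0 0

genome_size 2.15
mer_size 31
diploid_mode 2
num_prefix_blocks 4
min_depth_cutoff 3

use_cluster 0
EOF

# Assumed order of the 12 values after the lib_seq keyword.
names=(wildcard prefix insAvg insSdev avgReadLen hasInnieArtifact isRevComped
       useForContiging onoSetId useForGapClosing 5pWiggleRoom 3pWiggleRoom)

# Print name=value for every lib_seq line (one line per library) in a config file.
label_lib_seq() {
    local cfg=$1 fields values fastqs fq i
    while read -r -a fields; do
        # Skip comments, blank lines and non-library parameters.
        [[ ${fields[0]:-} == lib_seq ]] || continue
        values=("${fields[@]:1}")
        if (( ${#values[@]} != ${#names[@]} )); then
            echo "lib_seq line has ${#values[@]} values, expected ${#names[@]}" >&2
            continue
        fi
        for i in "${!names[@]}"; do
            printf '%s=%s\n' "${names[i]}" "${values[i]}"
        done
        # The first value is a comma-separated list of FASTQ files; warn if any is missing.
        IFS=, read -r -a fastqs <<< "${values[0]}"
        for fq in "${fastqs[@]}"; do
            [[ -f $fq ]] || echo "warning: input file not found: $fq" >&2
        done
    done < "$cfg"
}

label_lib_seq "$config"
```

Running it in an empty folder writes meraculous.standalone.config and prints lines such as prefix=GERMAC1 and insAvg=200; a warning on standard error means an input FASTQ path does not exist on the system where the script ran.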

2. Create a job submission script, called sub.sh in the example here, with the sample content

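#!/bin/bash
#SBATCH --job-name=meraculous_standalone
#SBATCH --partition=batch
# One node, one task with 16 CPU cores and 128 GB of memory, for up to 7 days
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=128gb
#SBATCH --time=7-00:00:00
# Standard output and error go to log.<JobID>.out and log.<JobID>.err
#SBATCH --output=log.%j.out
#SBATCH --error=log.%j.err

# Run in the folder the job was submitted from (where the config file is)
cd "$SLURM_SUBMIT_DIR"

# Load Meraculous 2.2.6 (ml is the Lmod shorthand for module load)
ml Meraculous/2.2.6

# Run the assembly with the standalone configuration; the run directory is ./output
run_meraculous.sh -c meraculous.standalone.config -dir output -cleanup_level 1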

The job parameters, such as the maximum wall-clock time (--time), maximum memory (--mem), number of CPU cores (--cpus-per-task), and job name (--job-name), should be adjusted to suit your data. In this example, the standard output and standard error of the run_meraculous.sh command are saved in two files called log.JobID.out and log.JobID.err, respectively, where JobID is the Slurm job ID.


3. Submit the job to the queue with

sbatch sub.sh
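Submitting with sbatch's --parsable option returns just the job ID, which makes it easy to find the log files named by the --output and --error patterns in sub.sh. The sketch below assumes it is run in the same folder as sub.sh; log_files_for_job is a hypothetical helper, and the --time=2-00:00:00 value in the comment is only an illustration of overriding a #SBATCH setting on the command line.

```shell
#!/bin/bash
# Sketch: submit sub.sh, then report the job ID and its log file names.
# log_files_for_job is a hypothetical helper, not a Slurm command.
set -euo pipefail

# Print the log names produced by --output=log.%j.out / --error=log.%j.err
# (Slurm replaces %j with the job ID).
log_files_for_job() {
    local jobid=$1
    printf 'log.%s.out\nlog.%s.err\n' "$jobid" "$jobid"
}

if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints "jobid" (or "jobid;cluster") instead of
    # "Submitted batch job <jobid>". Options given here override the matching
    # #SBATCH lines, e.g. sbatch --parsable --time=2-00:00:00 sub.sh
    jobid=$(sbatch --parsable sub.sh)
    jobid=${jobid%%;*}   # drop the cluster name, if present
    echo "Submitted job $jobid; logs:"
    log_files_for_job "$jobid"
    # Show the job's current state in the queue.
    squeue -j "$jobid"
else
    echo "sbatch not found; run this on Sapelo2, where Slurm is available" >&2
fi
```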

Documentation

Documentation for Meraculous is available from the project pages at https://jgi.doe.gov/data-and-tools/software-tools/meraculous/ and https://sourceforge.net/projects/meraculous20/

Installation

System

64-bit Linux