Medusa-Sapelo2

From Research Computing Center Wiki

Category

Bioinformatics

Program On

Sapelo2

Version

1.6

Author / Distributor

Please see https://github.com/combogenomics/medusa

Description

Medusa is a draft genome scaffolder that uses multiple reference genomes in a graph-based approach. For more information, please see https://github.com/combogenomics/medusa

Running Program

Also refer to Running Jobs on Sapelo2

For more information on Environment Modules on Sapelo2 please see the Lmod page.

  • Version 1.6, installed in /apps/eb/Medusa/1.6-foss-2019b-Python-3.7.4-Java-1.8.0_144

To use this version of Medusa, please first load the module with

ml Medusa/1.6-foss-2019b-Python-3.7.4-Java-1.8.0_144

This module automatically loads Python/3.7.4-GCCcore-8.3.0, Biopython/1.75-foss-2019b-Python-3.7.4, networkx/2.4-foss-2019b-Python-3.7.4, MUMmer/4.0.0beta2-foss-2019b, Java/1.8.0_144, and other dependencies.
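The commands on this page find medusa.jar and the medusa_scripts/ folder through ${EBROOTMEDUSA}. A job script can therefore stop early, with a clear message, if the module did not load. The function below is a minimal sketch. Its name, require_medusa, is hypothetical and is not provided by the module. It assumes only that loading the module sets EBROOTMEDUSA to the install directory shown above.

```shell
#!/bin/bash
# Hypothetical guard: check that the Medusa module environment looks usable.
# Assumes the Medusa module sets EBROOTMEDUSA to its install directory.
require_medusa() {
    if [ -z "${EBROOTMEDUSA:-}" ]; then
        echo "EBROOTMEDUSA is not set; load the Medusa module first" >&2
        return 1
    fi
    if [ ! -f "${EBROOTMEDUSA}/medusa.jar" ]; then
        echo "medusa.jar not found in ${EBROOTMEDUSA}" >&2
        return 1
    fi
    if [ ! -d "${EBROOTMEDUSA}/medusa_scripts" ]; then
        echo "medusa_scripts/ not found in ${EBROOTMEDUSA}" >&2
        return 1
    fi
    return 0
}
```

In a job script, call it right after the ml line, for example: require_medusa || exit 1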

Sample job submission script (sub.sh) to run medusa.jar:

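#!/bin/bash
#SBATCH --job-name=medusajob
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=6
#SBATCH --mem=10gb
#SBATCH --time=36:00:00
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err

# Run from the directory the job was submitted from
cd "$SLURM_SUBMIT_DIR"

# Load Medusa and its dependencies (MUMmer, Python, Java, ...)
ml Medusa/1.6-foss-2019b-Python-3.7.4-Java-1.8.0_144

# -threads sets the number of MUMmer threads; keep it equal to --cpus-per-task
java -jar "${EBROOTMEDUSA}/medusa.jar" -threads 6 [options]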

Replace [options] with the Medusa options and their arguments (see the Documentation section below). Medusa's helper scripts are installed in the medusa_scripts/ folder of the installation directory. To use them, give their location with the -scriptPath option:

-scriptPath ${EBROOTMEDUSA}/medusa_scripts
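The array below is a sketch of a complete option set. It combines the -scriptPath setting with other options listed in the medusa.jar -h output in the Documentation section. The file and folder names (target.fasta, drafts, scaffolds.fasta) are hypothetical placeholders. The fallback value for EBROOTMEDUSA is the install path listed above; it applies only when the module is not loaded.

```shell
#!/bin/bash
# Use the module's install directory. Fall back to the path listed on this
# page only so that the sketch can be inspected outside a job.
EBROOTMEDUSA="${EBROOTMEDUSA:-/apps/eb/Medusa/1.6-foss-2019b-Python-3.7.4-Java-1.8.0_144}"

# Hypothetical inputs:
#   -i          target draft genome (required)
#   -f          folder with the comparison drafts / reference genomes
#   -o          name of the output FASTA file
#   -scriptPath installed Medusa helper scripts
#   -v          print MUMmer messages to the job log
medusa_args=(
    -i target.fasta
    -f drafts
    -o scaffolds.fasta
    -scriptPath "${EBROOTMEDUSA}/medusa_scripts"
    -v
)

# Show the resulting command; in sub.sh, drop "echo" to actually run it.
echo java -jar "${EBROOTMEDUSA}/medusa.jar" "${medusa_args[@]}"
```

Because the options are kept in an array, a path that contains spaces still reaches Java as a single argument.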

Adjust the other job parameters to suit your data as well. These include the maximum wall-clock time (--time), the maximum memory (--mem), the number of cores (--cpus-per-task), and the job name (--job-name). In this example, medusa.jar runs with 6 threads and the job requests 6 cores (--cpus-per-task=6). If you change one of these numbers, change the other to match.
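One way to keep the thread count in step with the core request is to read the value that Slurm exports to the job. Slurm sets SLURM_CPUS_PER_TASK only when --cpus-per-task is requested. The sketch below therefore falls back to 1 when the variable is absent or is not a positive integer. The function name medusa_threads is hypothetical.

```shell
#!/bin/bash
# Print the thread count to pass to medusa.jar: the Slurm --cpus-per-task
# value inside a job, otherwise 1.
medusa_threads() {
    local n="${SLURM_CPUS_PER_TASK:-1}"
    # Fall back to 1 for empty, non-numeric, or zero values.
    case "$n" in
        ''|*[!0-9]*|0) n=1 ;;
    esac
    echo "$n"
}
```

In sub.sh, the java line would then pass -threads "$(medusa_threads)" in place of the fixed value 6.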


Submit the job to the queue with

sbatch sub.sh
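After submission, Slurm names the job's logs using the --output=%x.%j.out and --error=%x.%j.err patterns in sub.sh. Here %x expands to the job name and %j to the job ID. The helper below shows the resulting file names; log_names is a hypothetical name. The commented example uses sbatch --parsable, which prints only the job ID. On multi-cluster systems that ID can be followed by ";cluster", so the example strips that suffix.

```shell
#!/bin/bash
# Print the .out and .err log names Slurm will use for a given job name and
# job ID, following the %x.%j.out / %x.%j.err patterns in sub.sh.
log_names() {
    local job_name=$1 job_id=$2
    echo "${job_name}.${job_id}.out ${job_name}.${job_id}.err"
}

# On the cluster (not run here):
#   jobid=$(sbatch --parsable sub.sh)
#   jobid=${jobid%%;*}            # drop an optional ";cluster" suffix
#   log_names medusajob "$jobid"
```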

Documentation

ml Medusa/1.6-foss-2019b-Python-3.7.4-Java-1.8.0_144

java -jar ${EBROOTMEDUSA}/medusa.jar -h
Medusa version 1.6
usage: java -jar medusa.jar -i inputfile -v
available options:
 -d                                    OPTIONAL PARAMETER;The option *-d*
                                       allows for the estimation of the
                                       distance between pairs of contigs
                                       based on the reference genome(s):
                                       in this case the scaffolded contigs
                                       will be separated by a number of N
                                       characters equal to this estimate.
                                       The estimated distances are also
                                       saved in the
                                       <targetGenome>_distanceTable file.
                                       By default the scaffolded contigs
                                       are separated by 100 Ns
 -f <<draftsFolder>>                   OPTIONAL PARAMETER; The option *-f*
                                       is optional and indicates the path
                                       to the comparison drafts folder
 -gexf                                 OPTIONAL PARAMETER;Conting network
                                       and path cover are given in gexf
                                       format.
 -h                                    Print this help and exist.
 -i <<targetGenome>>                   REQUIRED PARAMETER;The option *-i*
                                       indicates the name of the target
                                       genome file.
 -n50 <<fastaFile>>                    OPTIONAL PARAMETER; The option
                                       *-n50* allows the calculation of
                                       the N50 statistic on a FASTA file.
                                       In this case the usage is the
                                       following: java -jar medusa.jar
                                       -n50 <name_of_the_fasta>. All the
                                       other options will be ignored.
 -o <<outputName>>                     OPTIONAL PARAMETER; The option *-o*
                                       indicates the name of output fasta
                                       file.
 -random <<numberOfRounds>>            OPTIONAL PARAMETER;The option
                                       *-random* is available (not
                                       required). This option allows the
                                       user to run a given number of
                                       cleaning rounds and keep the best
                                       solution. Since the variability is
                                       small 5 rounds are usually
                                       sufficient to find the best score.
 -scriptPath <<medusaScriptsFolder>>   OPTIONAL PARAMETER; The folder
                                       containing the medusa scripts.
                                       Default value: medusa_scripts
 -threads <<numberOfThreads>>          OPTIONAL PARAMETER; The option
                                       *-threads* indicates the number of
                                       threads to be used with mummer
                                       (requires version >= 4.0)
 -v                                    RECOMMENDED PARAMETER; The option
                                       *-v* (recommended) print on console
                                       the information given by the
                                       package MUMmer. This option is
                                       strongly suggested to understand if
                                       MUMmer is not running properly.
 -w2                                   OPTIONAL PARAMETER;The option *-w2*
                                       is optional and allows for a
                                       sequence similarity based weighting
                                       scheme. Using a different weighting
                                       scheme may lead to better results.
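The -n50 option reports the N50 statistic of a FASTA file, for example the scaffolds that Medusa produced. N50 is the length L such that sequences of length L or longer together cover at least half of the total sequence length. The awk-based sketch below can cross-check that value without Java; n50 is a hypothetical function name. It is an independent implementation, and when the running total lands exactly on half, it may break the tie differently from Medusa.

```shell
#!/bin/bash
# Print the N50 of a FASTA file.
# Step 1 (first awk): print the length of each sequence; header lines start with ">".
# Step 2 (sort): order the lengths from longest to shortest.
# Step 3 (second awk): add lengths until the running total reaches half of
# the overall length, then print the length at which that happened.
n50() {
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    sort -nr |
    awk '{ lens[NR] = $1; total += $1 }
         END {
             half = total / 2
             run = 0
             for (i = 1; i <= NR; i++) {
                 run += lens[i]
                 if (run >= half) { print lens[i]; exit }
             }
         }'
}
```

For example, compare n50 scaffolds.fasta with java -jar ${EBROOTMEDUSA}/medusa.jar -n50 scaffolds.fasta, where scaffolds.fasta is a placeholder name.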


Installation

System

64-bit Linux