Medusa-Sapelo2
Category
Bioinformatics
Program On
Sapelo2
Version
1.6
Author / Distributor
Please see https://github.com/combogenomics/medusa
Description
Medusa is a draft genome scaffolder that uses multiple reference genomes in a graph-based approach. For more information, please see https://github.com/combogenomics/medusa
Running Program
Also refer to Running Jobs on Sapelo2
For more information on Environment Modules on Sapelo2 please see the Lmod page.
- Version 1.6, installed in /apps/eb/Medusa/1.6-foss-2021b-Python-3.9.6-Java-1.8.0_241
To use this version of Medusa, please first load the module with
ml Medusa/1.6-foss-2021b-Python-3.9.6-Java-1.8.0_241
This module automatically loads Python/3.9.6-GCCcore-11.2.0, Biopython/1.79-foss-2021b, networkx/2.6.3-foss-2021b, MUMmer/4.0.0beta2-GCCcore-11.2.0, Java/1.8.0_241, and other dependencies.
Sample job submission script (sub.sh) to run medusa.jar:
#!/bin/bash
#SBATCH --job-name=medusajob
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=6
#SBATCH --mem=10gb
#SBATCH --time=36:00:00
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
cd $SLURM_SUBMIT_DIR
ml Medusa/1.6-foss-2021b-Python-3.9.6-Java-1.8.0_241
java -jar ${EBROOTMEDUSA}/medusa.jar -threads 6 [options]
where [options] needs to be replaced by the options and arguments you want to use. Note that the Medusa helper scripts are installed in the medusa_scripts/ folder under ${EBROOTMEDUSA}. Because the default value of -scriptPath is the relative path medusa_scripts, you need to give the installed location explicitly with the -scriptPath option, i.e.,
-scriptPath ${EBROOTMEDUSA}/medusa_scripts
Other job parameters, such as the maximum wall-clock time, the maximum memory, the number of cores, and the job name, should also be adjusted to suit your run. In this example, medusa.jar is given 6 threads, matching the 6 cores requested for the job (--cpus-per-task=6).
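As a rough sketch of how the [options] placeholder might be filled in, the lines below assemble a Medusa command from the options listed in the program's help text and print it for inspection. The input names target_draft.fasta and reference_genomes are hypothetical, and the fallback path is only there so the sketch prints something outside a job. The thread count is read from SLURM_CPUS_PER_TASK, which Slurm sets when --cpus-per-task is given, so it stays in step with the number of cores requested.

```shell
#!/bin/bash
# Hypothetical input names; replace them with your own files.
TARGET="target_draft.fasta"      # assumption: the draft genome to scaffold (-i)
REFS_DIR="reference_genomes"     # assumption: folder of comparison genomes (-f)

# Use the core count granted by Slurm; fall back to 1 outside a job.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

# EBROOTMEDUSA is set by the Medusa module; the fallback is a placeholder
# that only keeps this dry run printable when the module is not loaded.
MEDUSA_ROOT="${EBROOTMEDUSA:-/path/to/medusa}"

# Build the command as an array so paths containing spaces stay intact.
cmd=(java -jar "${MEDUSA_ROOT}/medusa.jar"
     -i "$TARGET" -f "$REFS_DIR"
     -scriptPath "${MEDUSA_ROOT}/medusa_scripts"
     -threads "$THREADS" -v)

# Print the command; replace this line with "${cmd[@]}" to actually run it.
echo "${cmd[@]}"
```

Printing the command first makes it easy to check the paths and thread count before committing a long job to the queue.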
Submit the job to the queue with
sbatch sub.sh
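The sample script writes its logs to files named by the patterns %x.%j.out and %x.%j.err, where Slurm replaces %x with the job name and %j with the job ID. The sketch below shows one possible way to capture the job ID at submission time and work out those file names. It assumes sub.sh is in the current directory, and the ID 12345 is a placeholder used only when sbatch is not available.

```shell
#!/bin/bash
JOB_NAME="medusajob"   # matches --job-name in sub.sh

if command -v sbatch >/dev/null 2>&1 && [ -f sub.sh ]; then
    # --parsable prints only the job ID, optionally followed by ";cluster"
    JOB_ID="$(sbatch --parsable sub.sh)"
    JOB_ID="${JOB_ID%%;*}"
else
    JOB_ID="12345"     # placeholder ID so the sketch still runs off-cluster
fi

# %x expands to the job name and %j to the job ID
OUT_FILE="${JOB_NAME}.${JOB_ID}.out"
ERR_FILE="${JOB_NAME}.${JOB_ID}.err"
echo "stdout log: ${OUT_FILE}"
echo "stderr log: ${ERR_FILE}"
```

Because the sample script passes -v only if you add it to [options], the MUMmer messages will appear in the .out file only when that option is included.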
Documentation
ml Medusa/1.6-foss-2021b-Python-3.9.6-Java-1.8.0_241
[cft07037@d2-13 ~]$ java -jar ${EBROOTMEDUSA}/medusa.jar -h
Medusa version 1.6
usage: java -jar medusa.jar -i inputfile -v
available options:
-d OPTIONAL PARAMETER;The option *-d*
allows for the estimation of the
distance between pairs of contigs
based on the reference genome(s):
in this case the scaffolded contigs
will be separated by a number of N
characters equal to this estimate.
The estimated distances are also
saved in the
<targetGenome>_distanceTable file.
By default the scaffolded contigs
are separated by 100 Ns
-f <<draftsFolder>> OPTIONAL PARAMETER; The option *-f*
is optional and indicates the path
to the comparison drafts folder
-gexf OPTIONAL PARAMETER;Conting network
and path cover are given in gexf
format.
-h Print this help and exist.
-i <<targetGenome>> REQUIRED PARAMETER;The option *-i*
indicates the name of the target
genome file.
-n50 <<fastaFile>> OPTIONAL PARAMETER; The option
*-n50* allows the calculation of
the N50 statistic on a FASTA file.
In this case the usage is the
following: java -jar medusa.jar
-n50 <name_of_the_fasta>. All the
other options will be ignored.
-o <<outputName>> OPTIONAL PARAMETER; The option *-o*
indicates the name of output fasta
file.
-random <<numberOfRounds>> OPTIONAL PARAMETER;The option
*-random* is available (not
required). This option allows the
user to run a given number of
cleaning rounds and keep the best
solution. Since the variability is
small 5 rounds are usually
sufficient to find the best score.
-scriptPath <<medusaScriptsFolder>> OPTIONAL PARAMETER; The folder
containing the medusa scripts.
Default value: medusa_scripts
-threads <<numberOfThreads>> OPTIONAL PARAMETER; The option
*-threads* indicates the number of
threads to be used with mummer
(requires version >= 4.0)
-v RECOMMENDED PARAMETER; The option
*-v* (recommended) print on console
the information given by the
package MUMmer. This option is
strongly suggested to understand if
MUMmer is not running properly.
-w2 OPTIONAL PARAMETER;The option *-w2*
is optional and allows for a
sequence similarity based weighting
scheme. Using a different weighting
scheme may lead to better results.
Installation
- Version 1.6, source code downloaded from https://github.com/combogenomics/medusa and compiled with ant
System
64-bit Linux