Cromwell-Sapelo2
Latest revision as of 10:55, 27 February 2024
Category: Tools
Program On: Sapelo2
Version: 56
Author / Distributor:
Description: "Cromwell is a Workflow Management System geared towards scientific workflows. Cromwell is open sourced under the BSD 3-Clause license." (cromwell.readthedocs.io)
Running Program
Please also refer to Running Jobs on Sapelo2.
- Cromwell 56 is installed for use with Java 11. To use it, load the module:
module load cromwell/56-Java-11
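As a quick check after loading the module, the following bash sketch confirms that the environment variable the job script relies on is in place. It assumes, as the job script later on this page does, that the module sets EBROOTCROMWELL and that cromwell.jar sits directly in that directory; the function name check_cromwell_module is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: after "module load cromwell/56-Java-11", EBROOTCROMWELL should point
# at the directory holding cromwell.jar, which the job script runs with
# "java -jar $EBROOTCROMWELL/cromwell.jar".
check_cromwell_module() {
  if [[ -z "${EBROOTCROMWELL:-}" ]]; then
    echo "EBROOTCROMWELL is not set; load the module first"
    return 1
  fi
  if [[ ! -f "$EBROOTCROMWELL/cromwell.jar" ]]; then
    echo "cromwell.jar not found in $EBROOTCROMWELL"
    return 1
  fi
  echo "found $EBROOTCROMWELL/cromwell.jar"
}

# Report the result without aborting the calling shell.
check_cromwell_module || echo "Cromwell module not ready"
```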
Requirements
To execute Cromwell as a job on Sapelo2, the following files are used:
- Cromwell Configuration File (required)
  - Defines how each step of the workflow is launched; in the example below, each step is submitted as its own Slurm job.
- WDL File (required)
  - Defines the workflow itself.
- Inputs File (optional but recommended)
  - Defines the inputs to the workflow.
- Options File (optional)
  - Defines any additional options.
- Job Submission Script (required)
  - Submits Cromwell itself to Slurm as a batch job.
Example Requirements
Example Configuration File
Cromwell requires a configuration file that includes instructions for how to execute workflows.
The maintainers of Cromwell provide short and intuitive documentation and tutorials to help understand and write a Cromwell configuration file:
- https://cromwell.readthedocs.io/en/stable/tutorials/ConfigurationFiles/
- https://cromwell.readthedocs.io/en/stable/backends/SLURM/
- https://cromwell.readthedocs.io/en/stable/tutorials/HPCIntro/
Reviewing the content at the links above will help in understanding the following Cromwell configuration file, which was adapted for Sapelo2 from the SLURM example in the Cromwell documentation. This file can also be found at /usr/local/training/Cromwell/cromwell-gacrc.conf:
cromwell-gacrc.conf
backend {
  default = slurm
  providers {
    slurm {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        runtime-attributes = """
        String partition = "batch"
        Int ntasks = 1
        Int cpus_per_task = 8
        Int memory = 8000
        Int time = 10
        """
        submit = """
        sbatch \
          --job-name=${job_name} \
          --partition=${partition} \
          --ntasks=${ntasks} \
          --cpus-per-task=${cpus_per_task} \
          --mem=${memory} \
          --time=${time} \
          --output=${out} \
          --error=${err} \
          --chdir=${cwd} \
          --wrap "/usr/bin/env bash ${script}"
        """
        kill = "scancel ${job_id}"
        check-alive = "squeue -j ${job_id}"
        job-id-regex = "Submitted batch job (\\d+).*"
      }
    }
  }
}
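At run time, Cromwell fills the ${...} placeholders in submit from each task's runtime attributes (partition, ntasks, cpus_per_task, memory, time) plus values it supplies itself (job_name, out, err, cwd, script). It then extracts the Slurm job ID from the output of the submit command using job-id-regex, and substitutes that ID into kill and check-alive. The bash sketch below imitates only those last two steps. The sample sbatch output line and the function name extract_job_id are illustrative assumptions, and sed's [0-9]+ stands in for the \d+ in the configured regex.

```shell
#!/usr/bin/env bash
# Sketch of the job-ID bookkeeping around submit, kill and check-alive in
# cromwell-gacrc.conf. Nothing is submitted; the sbatch output is made up.

# job-id-regex is "Submitted batch job (\\d+).*" in HOCON, i.e. the regex
# Submitted batch job (\d+).* ; [0-9]+ is the POSIX ERE equivalent of \d+.
# Prints the captured digits, or nothing if the line does not match.
extract_job_id() {
  sed -nE 's/.*Submitted batch job ([0-9]+).*/\1/p' <<< "$1"
}

sbatch_stdout="Submitted batch job 1234567"   # hypothetical sbatch output
job_id="$(extract_job_id "$sbatch_stdout")"

# The captured ID is what ${job_id} expands to in these two commands.
echo "kill:        scancel ${job_id}"
echo "check-alive: squeue -j ${job_id}"
```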
Example WDL (Workflow Description Language) File
Cromwell executes workflows written in WDL (Cromwell Language Support). The Cromwell maintainers provide an example WDL in their documentation.
The following workflow incorporates the same Bowtie2 example covered in the Sapelo2 training workshop, and can be found at /usr/local/training/Cromwell/cromwell-bowtie2.wdl:
cromwell-bowtie2.wdl
workflow CromwellBowtie2 {
  File input_fq
  File index_dir
  String index_name
  Int cpus_per_task

  call Bowtie2 {
    input:
      input_fq = input_fq,
      index_dir = index_dir,
      index_name = index_name,
      cpus_per_task = cpus_per_task
  }
}

task Bowtie2 {
  File input_fq
  File index_dir
  String index_name
  Int cpus_per_task

  command {
    bowtie2 -p ${cpus_per_task} -x ${index_dir}/${index_name} -U ${input_fq} > alignments.output
  }

  output {
    File out = "alignments.output"
  }
}
A WDL file contains a task-by-task description of a workflow. The first block is the workflow block, in which tasks are called. Each task is described in its own task block. Loosely, the workflow block can be thought of as analogous to a main function, and the task blocks as analogous to the functions it calls.
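To make the analogy concrete, here is a bash sketch in which the task is a function that renders its command and the workflow is a main function that passes its own inputs to the task call. It only prints the command line. The function names are hypothetical, and in a real run Cromwell rewrites file paths when it localizes inputs into the call's directory, so the actual command will differ.

```shell
#!/usr/bin/env bash
# Analogy only: nothing here runs Cromwell or bowtie2; commands are printed.

# Plays the role of "task Bowtie2": takes the task's declared inputs and
# builds the same command line as the WDL command block.
bowtie2_task() {
  local input_fq="$1" index_dir="$2" index_name="$3" cpus_per_task="$4"
  echo "bowtie2 -p ${cpus_per_task} -x ${index_dir}/${index_name} -U ${input_fq} > alignments.output"
}

# Plays the role of "workflow CromwellBowtie2": receives the workflow-level
# inputs and passes them through to the task call.
cromwell_bowtie2() {
  local input_fq="$1" index_dir="$2" index_name="$3" cpus_per_task="$4"
  bowtie2_task "$input_fq" "$index_dir" "$index_name" "$cpus_per_task"
}

# Values supplied at "runtime", as inputs.json does for the real workflow.
cromwell_bowtie2 myreads.fq index lambda_virus 8
```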
For more thorough information about WDL, refer to the WDL language specification. More WDL examples can be found here.
While the paths to input data can be written directly in the WDL file, it is considered best practice to supply them at runtime instead, which keeps the workflow reusable. This is especially convenient when importing WDL files from other groups, as it removes the need to edit hardcoded values.
Example Workflow Input File
In Cromwell, Workflow Input Files are written in JSON. They are specified with the --inputs flag when Cromwell is executed at the command line. These files supply values for the workflow's inputs, such as input file paths or other parameters. Keeping these values in a separate file avoids hardcoding them in the workflow file itself.
From the above Example WDL File, the CromwellBowtie2 workflow uses the following values:
workflow CromwellBowtie2 {
File input_fq
File index_dir
String index_name
Int cpus_per_task
The following JSON file provides definitions for each of these values, and can be found at /usr/local/training/Cromwell/inputs.json:
inputs.json
{ "CromwellBowtie2.input_fq": "myreads.fq",
"CromwellBowtie2.index_dir": "index",
"CromwellBowtie2.index_name": "lambda_virus",
"CromwellBowtie2.cpus_per_task": "8"
}
Usage: --inputs inputs.json
In an input file, variables are referenced using the workflow name, followed by a period, followed by the variable name, for example CromwellBowtie2.input_fq.
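The naming rule above is easy to check mechanically. The bash sketch below is a hypothetical helper, not a Cromwell feature: it lists the keys in a flat inputs file and flags any that do not start with the workflow name and a period. It uses grep and sed rather than a JSON parser, so it assumes a simple layout like inputs.json, with each key on one line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag keys not written as <workflow name>.<variable name>.
# Only suits simple files where each key appears as "key": on one line.
check_input_keys() {
  local workflow="$1" file="$2" key status=0
  while read -r key; do
    # Glob match: the literal workflow name, a literal ".", then anything.
    if [[ "$key" == "$workflow".* ]]; then
      echo "ok:      $key"
    else
      echo "bad key: $key (expected ${workflow}.<variable name>)"
      status=1
    fi
  done < <(grep -oE '"[^"]+"[[:space:]]*:' "$file" | sed -E 's/^"([^"]+)".*/\1/')
  return "$status"
}

# Demo against a copy of the example inputs.json in a temporary file.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
{ "CromwellBowtie2.input_fq": "myreads.fq",
"CromwellBowtie2.index_dir": "index",
"CromwellBowtie2.index_name": "lambda_virus",
"CromwellBowtie2.cpus_per_task": "8"
}
EOF
check_input_keys CromwellBowtie2 "$demo"
rm -f "$demo"
```

For the real file, the call would be check_input_keys CromwellBowtie2 inputs.json; a non-zero return status means at least one key lacks the workflow prefix.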
The example data referenced in this JSON file can be found at the following locations:
/usr/local/training/Cromwell/index
/usr/local/training/Cromwell/myreads.fq
Example Workflow Options File
In Cromwell, Workflow Options Files are also written in JSON. They are specified with the --options flag when Cromwell is executed at the command line. These files describe the options to use during the execution of a workflow.
By default, the output of a workflow step is stored in that step's execution directory.
The following JSON file makes use of Cromwell's Output Copying capabilities to copy the output into a directory named output, and can be found at /usr/local/training/Cromwell/options.json:
options.json
{ "final_workflow_outputs_dir": "output",
"use_relative_output_paths": true
}
Usage: --options options.json
Without specifying an alternative output directory, the output would be in a location similar to the following:
./cromwell-executions/CromwellBowtie2/04d44744-2a84-4b2f-bea6-492985543ace/call-Bowtie2/execution/
Where cromwell-executions is a subdirectory of the working directory, 04d44744-2a84-4b2f-bea6-492985543ace is the workflow ID generated by Cromwell at runtime, and execution was the working directory during the execution of the task Bowtie2.
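With the options file above, the copied output does not keep this long path. When use_relative_output_paths is true, the part of the path below the call's execution directory is kept and placed under final_workflow_outputs_dir. The bash sketch below only computes that destination for a given path (Cromwell performs the actual copy); the function name relative_output_path is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: map a default execution path to its copy under the outputs
# directory, mirroring final_workflow_outputs_dir + use_relative_output_paths.
relative_output_path() {
  local outputs_dir="$1" execution_path="$2"
  # ${var#*/execution/} strips everything up to and including /execution/
  echo "${outputs_dir}/${execution_path#*/execution/}"
}

relative_output_path output \
  ./cromwell-executions/CromwellBowtie2/04d44744-2a84-4b2f-bea6-492985543ace/call-Bowtie2/execution/alignments.output
```

For the example path, it prints output/alignments.output.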
Example Job Submission Script
The following is an example job submission script that uses the files described above, and can be found at /usr/local/training/Cromwell/cromwell-sub.sh:
cromwell-sub.sh
#!/bin/bash
#SBATCH --job-name=cromwell-bowtie2
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=8gb
#SBATCH --time=00:10:00
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
module load cromwell/56-Java-11
module load Bowtie2/2.4.5-GCC-11.3.0
cd $SLURM_SUBMIT_DIR
java \
-Xmx8g \
-Dconfig.file=cromwell-gacrc.conf \
-jar $EBROOTCROMWELL/cromwell.jar \
run cromwell-bowtie2.wdl \
--inputs inputs.json \
--options options.json
Where:
- -Xmx8g sets the maximum Java heap size to 8 GB, which is equal to the amount of memory requested in the Slurm header (--mem=8gb).
- -Dconfig.file=cromwell-gacrc.conf is the path to the configuration file.
- -jar $EBROOTCROMWELL/cromwell.jar is the Java Archive to run, which in this case is the Cromwell executable.
- run cromwell-bowtie2.wdl uses the run subcommand, which instructs Cromwell to run the workflow defined in cromwell-bowtie2.wdl in Command Line mode.
- --inputs inputs.json specifies that the workflow inputs are defined in the inputs.json file.
- --options options.json specifies that any additional workflow options are defined in the options.json file.
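Because the -Xmx value has to be kept in step with --mem by hand, one option is to derive it from the job's allocation instead. The bash sketch below is a hypothetical helper: it assumes Slurm exports SLURM_MEM_PER_NODE (in MB) when --mem is given, and falls back to 8192 MB (the 8 GB requested above) when run outside a job. The optional percentage argument is an assumption added for illustration, for anyone who prefers to leave some job memory outside the Java heap; its default of 100 matches the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the JVM heap flag from a memory amount in MB.
# percent=100 reproduces the original "heap equals --mem" choice.
jvm_heap_flag() {
  local mem_mb="$1" percent="${2:-100}"
  # printf '%s' avoids echo/printf treating the leading "-X" as an option.
  printf '%s\n' "-Xmx$(( mem_mb * percent / 100 ))m"
}

# SLURM_MEM_PER_NODE is assumed to be set by Slurm inside the job.
heap_flag="$(jvm_heap_flag "${SLURM_MEM_PER_NODE:-8192}")"
printf '%s\n' "$heap_flag"
# In cromwell-sub.sh the flag would then replace -Xmx8g, e.g.:
#   java "$heap_flag" -Dconfig.file=cromwell-gacrc.conf -jar $EBROOTCROMWELL/cromwell.jar run cromwell-bowtie2.wdl --inputs inputs.json --options options.json
```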
Running the example
To run the above example, create a working directory in scratch, change into it, and copy the example files there:
mkdir /scratch/$USER/cromwell-example
cd /scratch/$USER/cromwell-example
cp -r /usr/local/training/Cromwell/* ./
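Before submitting, it can be worth confirming that the copy brought over every file the job script and workflow refer to. A minimal bash sketch, assuming the file and directory names used on this page; the function name check_example_files is hypothetical.

```shell
#!/usr/bin/env bash
# Pre-flight sketch: report any example file missing from the current directory.
check_example_files() {
  local missing=0 f
  for f in cromwell-gacrc.conf cromwell-bowtie2.wdl inputs.json options.json \
           cromwell-sub.sh myreads.fq index; do
    if [[ ! -e "$f" ]]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

if check_example_files; then
  echo "all example files present"
else
  echo "copy the example files before submitting"
fi
```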
Once copied, the job can be submitted with sbatch:
sbatch cromwell-sub.sh
Installation
- Version 56: Installed using EasyBuild.
System
- 64-bit Linux