Cromwell-Sapelo2
Latest revision as of 10:55, 27 February 2024
Category: Tools
Program On: Sapelo2
Version: 56
Author / Distributor
Description
"Cromwell is a Workflow Management System geared towards scientific workflows. Cromwell is open sourced under the BSD 3-Clause license." cromwell.readthedocs.io
Running Program
Please also refer to Running Jobs on Sapelo2.
- Cromwell 56 is installed for use with Java 11.
module load cromwell/56-Java-11
Requirements
To execute Cromwell as a job on Sapelo2, the following are required:
- Cromwell Configuration File (required)
  - Defines how each step in the workflow should be initialized.
- WDL File (required)
  - Defines the workflow itself.
- Inputs File (optional but recommended)
  - Defines the inputs to the workflow.
- Options File (optional)
  - Defines any additional options.
- Job Submission Script (required)
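Before submitting, it can help to confirm that these files are present in the working directory. The following is a minimal sketch, assuming the file names used in the examples on this page; `check_workflow_files` is a hypothetical helper and not part of Cromwell or Sapelo2.

```shell
# File names are taken from the examples on this page; adjust them as needed.
required_files="cromwell-gacrc.conf cromwell-bowtie2.wdl cromwell-sub.sh"
optional_files="inputs.json options.json"

# Hypothetical helper: report missing files in a directory (default: current one).
# Returns non-zero only if a required file is missing.
check_workflow_files() {
    dir="${1:-.}"
    status=0
    for f in $required_files; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing required file: $f" >&2
            status=1
        fi
    done
    for f in $optional_files; do
        if [ ! -f "$dir/$f" ]; then
            echo "note: optional file not found: $f" >&2
        fi
    done
    return "$status"
}
```

For example, `check_workflow_files /scratch/$USER/cromwell-example && sbatch cromwell-sub.sh` would only submit the job when every required file is found.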
Example Requirements
Example Configuration File
Cromwell requires a configuration file that includes instructions for how to execute workflows.
The maintainers of Cromwell provide short and intuitive documentation and tutorials to help understand and write a Cromwell configuration file:
- https://cromwell.readthedocs.io/en/stable/tutorials/ConfigurationFiles/
- https://cromwell.readthedocs.io/en/stable/backends/SLURM/
- https://cromwell.readthedocs.io/en/stable/tutorials/HPCIntro/
Reviewing the content at the links above can help to understand the following Cromwell configuration file that has been adapted for Sapelo2 (based on their SLURM example).
The following file can also be found at /usr/local/training/Cromwell/cromwell-gacrc.conf:
cromwell-gacrc.conf
backend {
  default = slurm
  providers {
    slurm {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        runtime-attributes = """
        String partition = "batch"
        Int ntasks = 1
        Int cpus_per_task = 8
        Int memory = 8000
        Int time = 10
        """
        submit = """
            sbatch \
              --job-name=${job_name} \
              --partition=${partition} \
              --ntasks=${ntasks} \
              --cpus-per-task=${cpus_per_task} \
              --mem=${memory} \
              --time=${time} \
              --output=${out} \
              --error=${err} \
              --chdir=${cwd} \
              --wrap "/usr/bin/env bash ${script}"
        """
        kill = "scancel ${job_id}"
        check-alive = "squeue -j ${job_id}"
        job-id-regex = "Submitted batch job (\\d+).*"
      }
    }
  }
}
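To make the submit template more concrete, the sketch below prints the kind of `sbatch` command it would expand to when the default runtime attributes are used. The values for `job_name`, `cwd`, `out`, `err`, and `script` are hypothetical stand-ins; in a real run Cromwell supplies them, and the actual paths will differ.

```shell
# Default runtime attributes from cromwell-gacrc.conf.
partition="batch"
ntasks=1
cpus_per_task=8
memory=8000    # passed to --mem; Slurm reads a bare number as megabytes
time_limit=10  # passed to --time; Slurm reads a bare number as minutes

# Hypothetical stand-ins for values that Cromwell fills in at runtime.
job_name="cromwell_example_job"
cwd="/scratch/user/cromwell-example"
out="$cwd/stdout"
err="$cwd/stderr"
script="$cwd/script"

# Print the sbatch command that the submit template would expand to.
render_submit() {
    printf 'sbatch --job-name=%s --partition=%s --ntasks=%s --cpus-per-task=%s --mem=%s --time=%s --output=%s --error=%s --chdir=%s --wrap "/usr/bin/env bash %s"\n' \
        "$job_name" "$partition" "$ntasks" "$cpus_per_task" "$memory" \
        "$time_limit" "$out" "$err" "$cwd" "$script"
}

render_submit
```

Each workflow task is submitted as its own Slurm job in this way, separate from the job that runs Cromwell itself.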
Example WDL (Workflow Description Language) File
Cromwell executes workflows written in WDL (Cromwell Language Support). The Cromwell maintainers provide an example WDL in their documentation.
The following workflow incorporates the same Bowtie2 example covered in the Sapelo2 training workshop, and can be found at /usr/local/training/Cromwell/cromwell-bowtie2.wdl:
cromwell-bowtie2.wdl
workflow CromwellBowtie2 {
  File input_fq
  File index_dir
  String index_name
  Int cpus_per_task
  call Bowtie2 {
    input:
      input_fq = input_fq,
      index_dir = index_dir,
      index_name = index_name,
      cpus_per_task = cpus_per_task,
  }
}
task Bowtie2 {
  File input_fq
  File index_dir
  String index_name
  Int cpus_per_task
  command {
    bowtie2 -p ${cpus_per_task} -x ${index_dir}/${index_name} -U ${input_fq} > alignments.output
  }
  output {
    File out = "alignments.output"
  }
}
A WDL file contains a task-by-task description of a workflow. The first block is the workflow block, in which tasks are called. Each task is described in its own task block. To an extent, it can be helpful to think of the workflow block as analogous to a main function, and the task blocks as analogous to the functions it calls.
For more thorough information about WDL, refer to the WDL language specification (https://github.com/openwdl/wdl/blob/main/versions/development/SPEC.md). More WDL examples can be found at https://github.com/openwdl/learn-wdl/.
While the paths to input data can be written in the WDL file directly, it is considered best practice to supply them at runtime instead for re-usability. This is a convenient feature when importing WDL files from other groups, as it removes the need to edit hardcoded values.
Example Workflow Input File
In Cromwell, Workflow Input Files are written in JSON. They are specified with the --inputs flag when Cromwell is executed at the command line. These files define the requirements of the workflow, such as input files or other input values. Specifying these input values in a separate file avoids the need to hardcode inputs in the original workflow file.
From the above Example WDL File, the CromwellBowtie2 workflow utilizes the following values:
workflow CromwellBowtie2 {
File input_fq
File index_dir
String index_name
Int cpus_per_task
The following JSON file provides definitions for each of these values, and can be found at /usr/local/training/Cromwell/inputs.json:
inputs.json
{ "CromwellBowtie2.input_fq": "myreads.fq",
"CromwellBowtie2.index_dir": "index",
"CromwellBowtie2.index_name": "lambda_virus",
"CromwellBowtie2.cpus_per_task": "8"
}
Usage: --inputs inputs.json
In an input file, variables are referenced using the workflow name followed by a period and the variable name, for example CromwellBowtie2.input_fq.
The example data referenced in this JSON file can be found at the following locations:
/usr/local/training/Cromwell/index
/usr/local/training/Cromwell/myreads.fq
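The naming rule for input keys can be checked before a run. The following is a rough sketch that assumes a simply formatted inputs file like the one above; `check_input_keys` is a hypothetical helper, and the `grep`-based key extraction is not a full JSON parser.

```shell
# Hypothetical helper: verify that every key in an inputs file starts with
# "<workflow name>.". Prints offending keys and returns non-zero if any exist.
check_input_keys() {
    workflow="$1"
    inputs_file="$2"
    # A quoted string directly followed by a colon is treated as a JSON key.
    # Strip the quotes, then keep only keys that lack the expected prefix.
    bad_keys=$(grep -o '"[^"]*"[[:space:]]*:' "$inputs_file" \
        | sed 's/^"//; s/".*$//' \
        | grep -v "^${workflow}\." || true)
    if [ -n "$bad_keys" ]; then
        echo "keys without the ${workflow}. prefix:" >&2
        echo "$bad_keys" >&2
        return 1
    fi
    return 0
}
```

For the example on this page, the check would be run as `check_input_keys CromwellBowtie2 inputs.json`.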
Example Workflow Options File
In Cromwell, Workflow Options Files are also written in JSON. They are specified with the --options flag when Cromwell is executed at the command line. These files describe the options to use during the execution of a workflow.
By default, the output of a workflow step is stored in that step's execution directory.
The following JSON file makes use of Cromwell's Output Copying capabilities to copy the output into a directory named output, and can be found at /usr/local/training/Cromwell/options.json:
options.json
{ "final_workflow_outputs_dir": "output",
"use_relative_output_paths": true
}
Usage: --options options.json
Without specifying an alternative output directory, the output would be in a location similar to the following:
./cromwell-executions/CromwellBowtie2/04d44744-2a84-4b2f-bea6-492985543ace/call-Bowtie2/execution/
Here, cromwell-executions is a subdirectory of the working directory, the 04d44744-2a84-4b2f-bea6-492985543ace directory is named at runtime, and execution was the working directory during the execution of the Bowtie2 task.
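Because the run-specific directory name is only known at runtime, it can be convenient to search for execution directories instead of typing the path. The following is a minimal sketch, assuming the directory layout shown above; `list_execution_dirs` is a hypothetical helper.

```shell
# Hypothetical helper: list every task execution directory below a
# cromwell-executions tree (default: ./cromwell-executions).
list_execution_dirs() {
    root="${1:-cromwell-executions}"
    # Matches <root>/<workflow>/<run id>/call-<task>/execution
    find "$root" -type d -path '*/call-*/execution' | sort
}
```

Running `list_execution_dirs` from the working directory prints one line per task execution directory, which can then be inspected with `ls`.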
Example Job Submission Script
The following is an example job submission script that utilizes the files described above, and can be found at /usr/local/training/Cromwell/cromwell-sub.sh:
cromwell-sub.sh
#!/bin/bash
#SBATCH --job-name=cromwell-bowtie2
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=8gb
#SBATCH --time=00:10:00
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
module load cromwell/56-Java-11
module load Bowtie2/2.4.5-GCC-11.3.0
cd $SLURM_SUBMIT_DIR
java \
    -Xmx8g \
    -Dconfig.file=cromwell-gacrc.conf \
    -jar $EBROOTCROMWELL/cromwell.jar \
    run cromwell-bowtie2.wdl \
    --inputs inputs.json \
    --options options.json
Where:
- -Xmx8g sets the maximum heap size of the Java Virtual Machine to 8 GB, which is equal to the amount of memory requested in the SLURM header (--mem=8gb).
- -Dconfig.file=cromwell-gacrc.conf is the path to the configuration file.
- -jar $EBROOTCROMWELL/cromwell.jar is the Java Archive to run, which in this case is the Cromwell executable.
- run cromwell-bowtie2.wdl contains the subcommand, run, and instructs Cromwell to run the workflow in Command Line mode.
- --inputs inputs.json specifies that the workflow inputs are defined in the inputs.json file.
- --options options.json specifies that any additional workflow options are defined in the options.json file.
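The heap size can also be derived from the job's memory request instead of being typed twice. The following is a sketch under stated assumptions: Slurm sets `SLURM_MEM_PER_NODE` (in megabytes) when memory is requested with `--mem`, and the JVM needs some memory outside the heap, so the 80% share used here is an arbitrary illustrative choice and not a Sapelo2 recommendation. `heap_mb_from_slurm` is a hypothetical helper.

```shell
# Hypothetical helper: compute a JVM heap size in megabytes from a memory
# request in megabytes, leaving 20% for non-heap JVM memory (assumed share).
heap_mb_from_slurm() {
    # Falls back to 8192 MB (assumed) when run outside of a Slurm job.
    total_mb="${1:-${SLURM_MEM_PER_NODE:-8192}}"
    echo $(( total_mb * 80 / 100 ))
}

# Build the -Xmx flag that would be passed to java.
heap_flag="-Xmx$(heap_mb_from_slurm)m"
echo "$heap_flag"
```

In the submission script, `$heap_flag` would take the place of the hardcoded `-Xmx8g`.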
Running the example
To run the above example, navigate to scratch and copy the files into the working directory:
mkdir /scratch/$USER/cromwell-example
cd /scratch/$USER/cromwell-example
cp -r /usr/local/training/Cromwell/* ./
Once copied, the job can be submitted with sbatch:
sbatch cromwell-sub.sh
Installation
- Version 56: Installed using EasyBuild.
System
- 64-bit Linux