Cromwell-Sapelo2: Difference between revisions
Edit summaries: "Added example config file" (minor edit); "Added example WDL, inputs.json, and options.json sections and files"
Revision as of 17:07, 26 February 2024
Category: Tools
Program On: Sapelo2
Version: 56
Author / Distributor:
Description: "Cromwell is a Workflow Management System geared towards scientific workflows. Cromwell is open sourced under the BSD 3-Clause license." cromwell.readthedocs.io
Running Program
Versions
Please also refer to Running Jobs on Sapelo2.
- Cromwell 56 is installed for use with Java 11.
Example Configuration File
Cromwell requires a configuration file that includes instructions for how to execute workflows.
The Cromwell maintainers provide short documentation pages and tutorials that explain how to read and write a Cromwell configuration file:
- https://cromwell.readthedocs.io/en/stable/tutorials/ConfigurationFiles/
- https://cromwell.readthedocs.io/en/stable/backends/SLURM/
- https://cromwell.readthedocs.io/en/stable/tutorials/HPCIntro/
Reviewing the material at the links above will make it easier to understand the following Cromwell configuration file, which has been adapted for Sapelo2 from the Cromwell maintainers' SLURM example.
The file can also be found at /usr/local/training/Cromwell/cromwell-gacrc.conf:
cromwell-gacrc.conf
backend {
  default = slurm
  providers {
    slurm {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        runtime-attributes = """
        String partition = "batch"
        Int ntasks = 1
        Int cpus_per_task = 8
        Int memory = 8000
        Int time = 10
        """
        submit = """
          sbatch \
            --job-name=${job_name} \
            --partition=${partition} \
            --ntasks=${ntasks} \
            --cpus-per-task=${cpus_per_task} \
            --mem=${memory} \
            --time=${time} \
            --output=${out} \
            --error=${err} \
            --chdir=${cwd} \
            --wrap "/usr/bin/env bash ${script}"
        """
        kill = "scancel ${job_id}"
        check-alive = "squeue -j ${job_id}"
        job-id-regex = "Submitted batch job (\\d+).*"
      }
    }
  }
}
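To see what the submit template produces, the following Python sketch substitutes the default runtime attributes from the configuration above into the same sbatch template. It is an illustration only, not a description of how Cromwell performs the substitution internally. The values used for job_name, out, err, cwd, and script are hypothetical placeholders; Cromwell fills these in itself for each task it submits.

```python
from string import Template

# The submit template from cromwell-gacrc.conf, with the line continuations collapsed.
SUBMIT = Template(
    "sbatch --job-name=${job_name} --partition=${partition} --ntasks=${ntasks} "
    "--cpus-per-task=${cpus_per_task} --mem=${memory} --time=${time} "
    "--output=${out} --error=${err} --chdir=${cwd} "
    '--wrap "/usr/bin/env bash ${script}"'
)

# Defaults declared in the runtime-attributes block of the configuration.
DEFAULTS = {"partition": "batch", "ntasks": 1, "cpus_per_task": 8, "memory": 8000, "time": 10}

# Hypothetical values; Cromwell supplies the real ones for each task.
PLACEHOLDERS = {
    "job_name": "cromwell_example",
    "out": "execution/stdout",
    "err": "execution/stderr",
    "cwd": "execution",
    "script": "execution/script",
}


def render_submit(overrides=None):
    """Return the sbatch command line for the given runtime attribute overrides."""
    values = {**DEFAULTS, **PLACEHOLDERS, **(overrides or {})}
    return SUBMIT.substitute(values)


if __name__ == "__main__":
    print(render_submit())                      # all defaults
    print(render_submit({"cpus_per_task": 4}))  # one attribute overridden
```

With Slurm's default units, --mem=8000 requests 8000 MB of memory and --time=10 requests a 10 minute time limit, so the defaults suit only short test jobs.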
Example WDL (Workflow Description Language) File
Cromwell executes workflows written in WDL (Cromwell Language Support). The Cromwell maintainers provide an example WDL in their documentation.
The following workflow incorporates the same Bowtie2 example covered in the Sapelo2 training workshop, and can be found at /usr/local/training/Cromwell/cromwell-bowtie2.wdl:
cromwell-bowtie2.wdl
workflow CromwellBowtie2 {
  # Workflow-level inputs; values are supplied through the inputs JSON file
  File input_fq
  File index_dir
  String index_name
  Int cpus_per_task

  call Bowtie2 {
    input:
      input_fq = input_fq,
      index_dir = index_dir,
      index_name = index_name,
      cpus_per_task = cpus_per_task
  }
}

task Bowtie2 {
  File input_fq
  File index_dir
  String index_name
  Int cpus_per_task

  # Align the unpaired reads against the index, using cpus_per_task threads
  command {
    bowtie2 -p ${cpus_per_task} -x ${index_dir}/${index_name} -U ${input_fq} > alignments.output
  }

  output {
    File out = "alignments.output"
  }
}
Workflow Input File
In Cromwell, Workflow Input Files are written in JSON. They are specified with the --inputs flag when Cromwell is executed at the command line. These files supply the values a workflow requires, such as input file paths and other parameters. Keeping these values in a separate file avoids hardcoding them in the workflow file itself.
Continuing with the Example WDL File above, the CromwellBowtie2 workflow declares the following inputs:
File input_fq
File index_dir
String index_name
Int cpus_per_task
The following JSON file provides definitions for each of these values, and can be found at /usr/local/training/Cromwell/inputs.json:
inputs.json
{
  "CromwellBowtie2.input_fq": "myreads.fq",
  "CromwellBowtie2.index_dir": "index",
  "CromwellBowtie2.index_name": "lambda_virus",
  "CromwellBowtie2.cpus_per_task": 8
}
The example data referenced in this JSON file can be found at the following locations:
- /usr/local/training/Cromwell/index
- /usr/local/training/Cromwell/myreads.fq
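Before launching a workflow, it can help to confirm that every key in the inputs file is qualified with the workflow name and matches a declared input. The following is a minimal sketch, assuming the declarations are copied by hand from the WDL above. The names check_inputs, WORKFLOW, and DECLARED are illustrative and are not part of Cromwell.

```python
import json

# Inputs declared by the CromwellBowtie2 workflow, copied by hand from the WDL file.
WORKFLOW = "CromwellBowtie2"
DECLARED = {"input_fq", "index_dir", "index_name", "cpus_per_task"}


def check_inputs(inputs):
    """Return a list of problems found in a parsed Cromwell inputs mapping."""
    problems = []
    prefix = WORKFLOW + "."
    names = set()
    for key in inputs:
        if not key.startswith(prefix):
            problems.append(f"key not prefixed with workflow name: {key}")
            continue
        names.add(key[len(prefix):])
    # Keys that do not correspond to any declared input (for example, a stale name).
    for name in sorted(names - DECLARED):
        problems.append(f"not declared in the workflow: {name}")
    # Declared inputs that the file does not provide.
    for name in sorted(DECLARED - names):
        problems.append(f"missing input: {name}")
    return problems


if __name__ == "__main__":
    example = json.loads("""
    {
      "CromwellBowtie2.input_fq": "myreads.fq",
      "CromwellBowtie2.index_dir": "index",
      "CromwellBowtie2.index_name": "lambda_virus",
      "CromwellBowtie2.cpus_per_task": 8
    }
    """)
    print(check_inputs(example) or "inputs look consistent")
```

This only compares names. For a fuller check that also parses the WDL, the Cromwell project distributes a companion tool, Womtool, with a validate command; whether it is installed on Sapelo2 is not stated on this page.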
Workflow Options File
In Cromwell, Workflow Options Files are also written in JSON. They are specified with the --options flag when Cromwell is executed at the command line. These files describe the options to use during the execution of a workflow.
By default, the output of a workflow step is stored in that step's execution directory.
The following JSON file uses Cromwell's Output Copying feature to copy the workflow's final outputs into a directory named output, and can be found at /usr/local/training/Cromwell/options.json:
options.json
{
  "final_workflow_outputs_dir": "output",
  "use_relative_output_paths": true
}
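Putting the pieces together, the four files on this page are passed to Cromwell's run subcommand. The sketch below only assembles and prints the command line; it does not start Cromwell. The jar location is a hypothetical placeholder, since this page does not state where Cromwell 56 is installed on Sapelo2. The -Dconfig.file Java system property is the mechanism described in Cromwell's configuration file tutorial for selecting a configuration file.

```python
import shlex


def cromwell_run_command(jar, wdl, config, inputs, options):
    """Assemble the argument list for a `cromwell run` invocation."""
    return [
        "java",
        f"-Dconfig.file={config}",  # must come before -jar so the JVM sees it
        "-jar", jar,
        "run", wdl,
        "--inputs", inputs,
        "--options", options,
    ]


if __name__ == "__main__":
    cmd = cromwell_run_command(
        jar="/path/to/cromwell-56.jar",  # hypothetical path; adjust for your environment
        wdl="cromwell-bowtie2.wdl",
        config="cromwell-gacrc.conf",
        inputs="inputs.json",
        options="options.json",
    )
    # Print a shell-ready command line instead of executing it.
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; the same list can be passed to subprocess.run to launch the workflow. Because inputs.json uses relative paths, the command is expected to be run from a directory that contains copies of myreads.fq and the index directory.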