Cromwell-Sapelo2: Difference between revisions

From Research Computing Center Wiki
Revision as of 10:03, 27 February 2024

Category: Tools

Program On: Sapelo2

Version: 56

Author / Distributor: Broad Institute

Description: "Cromwell is a Workflow Management System geared towards scientific workflows. Cromwell is open sourced under the BSD 3-Clause license." (cromwell.readthedocs.io)

Running Program

Please also refer to Running Jobs on Sapelo2.

  • Cromwell 56 is installed for use with Java 11.
    • module load cromwell/56-Java-11

Requirements

To execute Cromwell as a job on Sapelo2, the following files are used (those marked optional can be omitted):

  1. Cromwell Configuration File (required)
    1. Defines how each step in the workflow should be initialized.
  2. WDL File (required)
    1. Defines the workflow itself.
  3. Inputs File (optional but recommended)
    1. Defines the inputs to the workflow.
  4. Options File (optional)
    1. Defines any additional options.
  5. Job Submission Script (required)

Example Requirements

Example Configuration File

Cromwell requires a configuration file that includes instructions for how to execute workflows.

The maintainers of Cromwell provide short, intuitive documentation and tutorials on understanding and writing a Cromwell configuration file.

Reviewing that documentation can help in understanding the following Cromwell configuration file, which has been adapted for Sapelo2 (based on the maintainers' SLURM example).

The following file can also be found at /usr/local/training/Cromwell/cromwell-gacrc.conf:

cromwell-gacrc.conf
backend {
  default = slurm

  providers {
    slurm {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        runtime-attributes = """
        String partition = "batch"
        Int ntasks = 1
        Int cpus_per_task = 8
        Int memory = 8000
        Int time = 10
        """
        submit = """
            sbatch \
                --job-name=${job_name} \
                --partition=${partition} \
                --ntasks=${ntasks} \
                --cpus-per-task=${cpus_per_task} \
                --mem=${memory} \
                --time=${time} \
                --output=${out} \
                --error=${err} \
                --chdir=${cwd} \
                --wrap "/usr/bin/env bash ${script}"
        """
        kill = "scancel ${job_id}"
        check-alive = "squeue -j ${job_id}"
        job-id-regex = "Submitted batch job (\\d+).*"
      }
    }
  }
}
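
As a rough illustration of the job-id-regex setting above, the following shell sketch extracts the job ID from a line in the format that sbatch prints on success. The function name parse_job_id is hypothetical and is not part of Cromwell or SLURM; Cromwell applies its own regular expression internally.

```shell
# Hypothetical helper (not part of Cromwell or SLURM): pull the numeric job ID
# out of a line such as "Submitted batch job 12345".
parse_job_id() {
    local id
    id=$(printf '%s\n' "$1" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p')
    # Fail if the line did not match the expected format.
    [ -n "$id" ] || return 1
    printf '%s\n' "$id"
}

parse_job_id "Submitted batch job 12345"
```

If the line does not match, the function prints nothing and returns a non-zero status, which is roughly the situation in which Cromwell would be unable to track the submitted job.
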
Example WDL (Workflow Description Language) File

Cromwell executes workflows written in WDL (Cromwell Language Support). The Cromwell maintainers provide an example WDL in their documentation.

The following workflow incorporates the same Bowtie2 example covered in the Sapelo2 training workshop, and can be found at /usr/local/training/Cromwell/cromwell-bowtie2.wdl:

cromwell-bowtie2.wdl
workflow CromwellBowtie2 {
    File input_fq
    File index_dir
    String index_name
    Int cpus_per_task

    call Bowtie2 {
        input:
            input_fq = input_fq,
            index_dir = index_dir,
            index_name = index_name,
            cpus_per_task = cpus_per_task,
    }
}

task Bowtie2 {
    File input_fq
    File index_dir
    String index_name
    Int cpus_per_task

    command {
        bowtie2 -p ${cpus_per_task} -x ${index_dir}/${index_name} -U ${input_fq} > alignments.output
    }
    output {
        File out = "alignments.output"
    }
}

A WDL file contains a task-by-task description of a workflow. The first block is the workflow block, wherein tasks are called. Each task is described in its own task block. To an extent, it can be helpful to consider the workflow block as analogous to the main function, and the task blocks as analogous to functions.
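
To make the analogy concrete, here is a minimal shell sketch in which one function stands in for the Bowtie2 task and a second function stands in for the workflow block. The names bowtie2_task and workflow_main are illustrative only. The sketch prints the command instead of running it, and it ignores the fact that Cromwell may localize File inputs to paths inside its execution directory.

```shell
# Stand-in for the WDL task: receives its inputs as arguments and builds the
# same bowtie2 command line that the task's command block describes.
bowtie2_task() {
    local input_fq="$1" index_dir="$2" index_name="$3" cpus_per_task="$4"
    echo "bowtie2 -p ${cpus_per_task} -x ${index_dir}/${index_name} -U ${input_fq}"
}

# Stand-in for the WDL workflow block: it only passes its inputs to the task.
workflow_main() {
    bowtie2_task "$@"
}

workflow_main myreads.fq index lambda_virus 8
```
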

For more thorough information about WDL, refer to their language specification documentation. More WDL examples can be found here.

While the paths to input data can be written in the WDL file directly, it is considered best practice to supply them at runtime instead for re-usability. This is a convenient feature when importing WDL files from other groups, as it removes the need to edit hardcoded values.

Example Workflow Input File

In Cromwell, Workflow Input Files are written in JSON. They are specified with the --inputs flag when Cromwell is executed at the command line. These files define the values the workflow requires, such as input files or other input values. Keeping these values in a separate file removes the need to hardcode inputs in the original workflow file.

Continuing with the above Example WDL File, the CromwellBowtie2 workflow utilizes the following values:

    File input_fq
    File index_dir
    String index_name
    Int cpus_per_task

The following JSON file provides definitions for each of these values, and can be found at /usr/local/training/Cromwell/inputs.json:

inputs.json
{   "CromwellBowtie2.input_fq": "myreads.fq",
    "CromwellBowtie2.index_dir": "index",
    "CromwellBowtie2.index_name": "lambda_virus",
    "CromwellBowtie2.cpus_per_task": "8"
}

This file is passed to Cromwell with --inputs inputs.json.

The example data referenced in this JSON file can be found at the following locations:

  • /usr/local/training/Cromwell/index
  • /usr/local/training/Cromwell/myreads.fq
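
When the same workflow is run on different data, an inputs file can also be generated instead of edited by hand. The following sketch shows the key format, which is the workflow name, a dot, and the input name. The function write_inputs is hypothetical, it assumes the values contain no double quotes or backslashes, and it writes cpus_per_task as a JSON number to match the Int declaration in the WDL file.

```shell
# Hypothetical helper: print an inputs file for the CromwellBowtie2 workflow.
# Each key is the workflow name, a dot, and the name of a workflow input.
write_inputs() {
    local input_fq="$1" index_dir="$2" index_name="$3" cpus_per_task="$4"
    cat <<EOF
{
    "CromwellBowtie2.input_fq": "${input_fq}",
    "CromwellBowtie2.index_dir": "${index_dir}",
    "CromwellBowtie2.index_name": "${index_name}",
    "CromwellBowtie2.cpus_per_task": ${cpus_per_task}
}
EOF
}

write_inputs myreads.fq index lambda_virus 8
```

The output can be redirected to a file of your choice and passed to Cromwell with the --inputs flag.
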
Example Workflow Options File

In Cromwell, Workflow Options Files are also written in JSON. They are specified with the --options flag when Cromwell is executed at the command line. These files describe the options to use during the execution of a workflow.

By default, the output of a workflow step is stored in that step's execution directory.

The following JSON file makes use of Cromwell's Output Copying capabilities to copy the output into a directory named output, and can be found at /usr/local/training/Cromwell/options.json:

options.json
{   "final_workflow_outputs_dir": "output",
    "use_relative_output_paths": true
}

This file is passed to Cromwell with --options options.json.

Without specifying an alternative output directory, the output would be in a location similar to the following:

./cromwell-executions/CromwellBowtie2/04d44744-2a84-4b2f-bea6-492985543ace/call-Bowtie2/execution/

Here, cromwell-executions is a subdirectory of the working directory, the 04d44744-2a84-4b2f-bea6-492985543ace directory is named at runtime, and execution is the working directory of the Bowtie2 task.
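
Because the run directory is named at runtime, it can be easier to search for a task's files than to type the path. The sketch below assumes the directory layout shown above. The function list_execution_files is a hypothetical helper, not a Cromwell command, and it lists everything in the execution directory, not only the declared outputs.

```shell
# Hypothetical helper: list every file in the execution directories of one
# task, across all runs of a workflow found under the given root directory.
list_execution_files() {
    local workflow="$1" task="$2" root="${3:-cromwell-executions}"
    # Nothing to list if the workflow has not been run from this directory.
    [ -d "${root}/${workflow}" ] || return 0
    find "${root}/${workflow}" -type f -path "*/call-${task}/execution/*"
}

list_execution_files CromwellBowtie2 Bowtie2
```
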

Example Job Submission Script

The following is an example job submission script that utilizes the files described above, and can be found at /usr/local/training/Cromwell/cromwell-sub.sh:

#!/bin/bash
#SBATCH --job-name=cromwell-bowtie2
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=8gb
#SBATCH --time=00:10:00
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err

module load cromwell/56-Java-11
module load Bowtie2/2.4.5-GCC-11.3.0

cd $SLURM_SUBMIT_DIR

java \
	-Xmx8g \
	-Dconfig.file=cromwell-gacrc.conf \
	-jar $EBROOTCROMWELL/cromwell.jar \
	run cromwell-bowtie2.wdl \
	--inputs inputs.json \
	--options options.json

Where:

  • -Xmx8g sets the Java Virtual Machine's maximum heap size to 8 GB, which matches the amount requested in the SLURM header (--mem=8gb).
  • -Dconfig.file=cromwell-gacrc.conf is the path to the configuration file.
  • -jar $EBROOTCROMWELL/cromwell.jar is the Java Archive to run, which in this case is the Cromwell executable.
  • run cromwell-bowtie2.wdl contains the subcommand, run, and instructs Cromwell to run the workflow in Command Line mode.
  • --inputs inputs.json specifies that the workflow inputs are defined in the inputs.json file.
  • --options options.json specifies that any additional workflow options are defined in the options.json file.
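
The -Xmx flag limits only the Java heap, and a JVM also uses some memory outside the heap. One possible variation, shown here as a sketch and not as a site recommendation, derives the heap size from the job's memory request so that the two values stay in step. The helper heap_mb and the 90 percent figure are assumptions made for illustration, and SLURM_MEM_PER_NODE is assumed to hold the --mem request in MB.

```shell
# Hypothetical helper: print a heap size in MB equal to a percentage of the
# job's memory. The 90 percent figure used below is only an illustration.
heap_mb() {
    local total_mb="$1" percent="$2"
    echo $(( total_mb * percent / 100 ))
}

# SLURM_MEM_PER_NODE is set by SLURM when --mem is requested; fall back to
# 8192 so that the sketch also runs outside a job.
total_mb="${SLURM_MEM_PER_NODE:-8192}"
echo "-Xmx$(heap_mb "$total_mb" 90)m"
```

The printed flag could then replace the fixed -Xmx8g in the java command.
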

Running the example

To run the above example, navigate to your scratch directory and copy the example files into it:

cd /scratch/$USER
cp -r /usr/local/training/Cromwell/* ./

Once copied, the job can be submitted with sbatch:

sbatch cromwell-sub.sh
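
If the job fails because a file cannot be found, a quick check of the working directory can help. The following sketch reports which of the example files named on this page are absent. The function missing_files is a hypothetical helper.

```shell
# Hypothetical helper: print each argument that does not exist as a file or
# directory in the current working directory.
missing_files() {
    local f
    for f in "$@"; do
        [ -e "$f" ] || echo "$f"
    done
}

missing=$(missing_files cromwell-gacrc.conf cromwell-bowtie2.wdl inputs.json \
    options.json cromwell-sub.sh myreads.fq index)
if [ -n "$missing" ]; then
    echo "Missing from $(pwd):" $missing
else
    echo "All example files are present."
fi
```
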

Installation

  • Version 56: Installed using EasyBuild.

System

  • 64-bit Linux