Cromwell-Sapelo2

From Research Computing Center Wiki
Revision as of 18:07, 26 February 2024 by Isaiah (talk | contribs) (Added example WDL, inputs.json, and options.json sections and files)
Jump to navigation Jump to search

Category

Tools

Program On

Sapelo2

Version

56

Author / Distributor

Broad Institute

Description

"Cromwell is a Workflow Management System geared towards scientific workflows. Cromwell is open sourced under the BSD 3-Clause license." cromwell.readthedocs.io

Running Program

Versions

Please also refer to Running Jobs on Sapelo2.

  • Cromwell 56 is installed for use with Java 11.

Example Configuration File

Cromwell requires a configuration file that includes instructions for how to execute workflows.

The maintainers of Cromwell provide short and intuitive documentation and tutorials to help understand and write a Cromwell configuration file:

Reviewing the content at the links above can help to understand the following Cromwell configuration file that has been adapted for Sapelo2 (based on their SLURM example).

The following file can also be found at /usr/local/training/Cromwell/cromwell-gacrc.conf:

cromwell-gacrc.conf
backend {
  default = slurm

  providers {
    slurm {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        runtime-attributes = """
        String partition = "batch"
        Int ntasks = 1
        Int cpus_per_task = 8
        Int memory = 8000
        Int time = 10
        """
        submit = """
            sbatch \
                --job-name=${job_name} \
                --partition=${partition} \
                --ntasks=${ntasks} \
                --cpus-per-task=${cpus_per_task} \
                --mem=${memory} \
                --time=${time} \
                --output=${out} \
                --error=${err} \
                --chdir=${cwd} \
                --wrap "/usr/bin/env bash ${script}"
        """
        kill = "scancel ${job_id}"
        check-alive = "squeue -j ${job_id}"
        job-id-regex = "Submitted batch job (\\d+).*"
      }
    }
  }
}

Example WDL (Workflow Description Language) File

Cromwell executes workflows written in WDL (Cromwell Language Support). The Cromwell maintainers provide an example WDL in their documentation.

The following workflow incorporates the same Bowtie2 example covered in the Sapelo2 training workshop, and can be found at /usr/local/training/Cromwell/cromwell-bowtie2.wdl:

cromwell-bowtie2.wdl
workflow CromwellBowtie2 {
    File input_fq
    File index_dir
    String index_name
    Int cpus_per_task

    call Bowtie2 {
        input:
            input_fq = input_fq,
            index_dir = index_dir,
            index_name = index_name,
            cpus_per_task = cpus_per_task,
    }
}

task Bowtie2 {
    File input_fq
    File index_dir
    String index_name
    Int cpus_per_task

    command {
        bowtie2 -p ${cpus_per_task} -x ${index_dir}/${index_name} -U ${input_fq} > alignments.output
    }
    output {
        File out = "alignments.output"
    }
}

Workflow Input File

In Cromwell, Workflow Input Files are written in JSON. They are specified with the --inputs flag when Cromwell is executed at the command line. These files define the requirements of the workflow, such as input files, or other input values. Specifying these input values in a separate file prevents the need to hardcode inputs in the original workflow file.

Continuing with the above Example WDL File, the CromwellBowtie2 workflow utilizes the following values:

    File input_fq
    File index_dir
    String index_name
    Int threads

The following JSON file provides definitions for each of these values, and can be found at /usr/local/training/Cromwell/inputs.json:

inputs.json
{   "CromwellBowtie2.input_fq": "myreads.fq",
    "CromwellBowtie2.index_dir": "index",
    "CromwellBowtie2.index_name": "lambda_virus",
    "CromwellBowtie2.cpus_per_task": "8"
}

The example data referenced in this JSON file can be found at the following locations:

  • /usr/local/training/Cromwell/index
  • /usr/local/training/Cromwell/myreads.fq

Workflow Options File

In Cromwell, Workflow Options Files, are also written in JSON. They are specified with the --options flag when Cromwell is executed at the command line. These files describe the options to use during the execution of a workflow.

By default, the output of a workflow step is stored in that step's execution directory.

The following JSON file makes use of Cromwell's Output Copying capabilities to copy the output into a directory named output, and can be found at /usr/local/training/Cromwell/options.json:

options.json
{   "final_workflow_outputs_dir": "output",
    "use_relative_output_paths": true
}