Cromwell-Sapelo2: Difference between revisions

From Research Computing Center Wiki

Revision as of 18:07, 26 February 2024

Category: Tools
Program On: Sapelo2
Version: 56
Author / Distributor: Broad Institute
Description: "Cromwell is a Workflow Management System geared towards scientific workflows. Cromwell is open sourced under the BSD 3-Clause license." (cromwell.readthedocs.io)

Running Program

Versions

Please also refer to Running Jobs on Sapelo2.

  • Cromwell 56 is installed for use with Java 11.

Example Configuration File

Cromwell requires a configuration file that includes instructions for how to execute workflows.

The maintainers of Cromwell provide short, intuitive documentation and tutorials on understanding and writing a Cromwell configuration file.

Reviewing that material makes it easier to follow the Cromwell configuration file below, which has been adapted for Sapelo2 (based on their SLURM example).

The following file can also be found at /usr/local/training/Cromwell/cromwell-gacrc.conf:

cromwell-gacrc.conf
backend {
  default = slurm

  providers {
    slurm {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        runtime-attributes = """
        String partition = "batch"
        Int ntasks = 1
        Int cpus_per_task = 8
        Int memory = 8000
        Int time = 10
        """
        submit = """
            sbatch \
                --job-name=${job_name} \
                --partition=${partition} \
                --ntasks=${ntasks} \
                --cpus-per-task=${cpus_per_task} \
                --mem=${memory} \
                --time=${time} \
                --output=${out} \
                --error=${err} \
                --chdir=${cwd} \
                --wrap "/usr/bin/env bash ${script}"
        """
        kill = "scancel ${job_id}"
        check-alive = "squeue -j ${job_id}"
        job-id-regex = "Submitted batch job (\\d+).*"
      }
    }
  }
}
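
To make the submit template above more concrete, the following Python sketch fills its ${...} placeholders with the defaults from runtime-attributes. It only approximates what Cromwell does: the values for job_name, out, err, cwd, and script are hypothetical stand-ins (Cromwell supplies the real ones for each job), and Python's string.Template is used here simply because it shares the ${name} syntax, not because Cromwell uses it.

```python
from string import Template

# The submit command from cromwell-gacrc.conf, joined onto one line for readability.
SUBMIT = Template(
    "sbatch --job-name=${job_name} --partition=${partition} "
    "--ntasks=${ntasks} --cpus-per-task=${cpus_per_task} "
    "--mem=${memory} --time=${time} "
    "--output=${out} --error=${err} --chdir=${cwd} "
    '--wrap "/usr/bin/env bash ${script}"'
)

# Defaults declared in the runtime-attributes block of the config file.
runtime_defaults = {
    "partition": "batch",
    "ntasks": 1,
    "cpus_per_task": 8,
    "memory": 8000,
    "time": 10,
}

# Hypothetical stand-ins for values that Cromwell fills in for each job.
per_job_values = {
    "job_name": "cromwell_example",
    "out": "stdout",
    "err": "stderr",
    "cwd": "/path/to/execution",
    "script": "/path/to/execution/script",
}

command = SUBMIT.substitute(**runtime_defaults, **per_job_values)
print(command)
```

With these defaults, sbatch interprets --mem=8000 as 8000 MB and --time=10 as 10 minutes, because those are its default units for bare numbers.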

Example WDL (Workflow Description Language) File

Cromwell executes workflows written in WDL (Cromwell Language Support). The Cromwell maintainers provide an example WDL in their documentation.

The following workflow incorporates the same Bowtie2 example covered in the Sapelo2 training workshop, and can be found at /usr/local/training/Cromwell/cromwell-bowtie2.wdl:

cromwell-bowtie2.wdl
workflow CromwellBowtie2 {
    File input_fq
    File index_dir
    String index_name
    Int cpus_per_task

    call Bowtie2 {
        input:
            input_fq = input_fq,
            index_dir = index_dir,
            index_name = index_name,
            cpus_per_task = cpus_per_task,
    }
}

task Bowtie2 {
    File input_fq
    File index_dir
    String index_name
    Int cpus_per_task

    command {
        bowtie2 -p ${cpus_per_task} -x ${index_dir}/${index_name} -U ${input_fq} > alignments.output
    }
    output {
        File out = "alignments.output"
    }
}
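
As a rough illustration of how the ${...} placeholders in the command section are resolved, the Python sketch below substitutes sample values into the same command string. The sample values are chosen for illustration, and string.Template only stands in for Cromwell's own interpolation. In a real run, Cromwell usually localizes File inputs into the task's execution directory, so the paths in the executed command will likely differ from the ones printed here.

```python
from string import Template

# The command from the Bowtie2 task, written as a Python template.
COMMAND = Template(
    "bowtie2 -p ${cpus_per_task} -x ${index_dir}/${index_name} "
    "-U ${input_fq} > alignments.output"
)

# Illustrative values for the four task inputs.
task_inputs = {
    "input_fq": "myreads.fq",
    "index_dir": "index",
    "index_name": "lambda_virus",
    "cpus_per_task": 8,
}

rendered = COMMAND.substitute(task_inputs)
print(rendered)
```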

Workflow Input File

In Cromwell, Workflow Input Files are written in JSON. They are specified with the --inputs flag when Cromwell is executed at the command line. These files supply the values a workflow requires, such as input files and other parameters. Keeping these values in a separate file avoids hardcoding inputs in the workflow file itself.

Continuing with the above Example WDL File, the CromwellBowtie2 workflow utilizes the following values:

    File input_fq
    File index_dir
    String index_name
    Int cpus_per_task

The following JSON file provides definitions for each of these values, and can be found at /usr/local/training/Cromwell/inputs.json:

inputs.json
{
    "CromwellBowtie2.input_fq": "myreads.fq",
    "CromwellBowtie2.index_dir": "index",
    "CromwellBowtie2.index_name": "lambda_virus",
    "CromwellBowtie2.cpus_per_task": "8"
}

The example data referenced in this JSON file can be found at the following locations:

  • /usr/local/training/Cromwell/index
  • /usr/local/training/Cromwell/myreads.fq
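
Each key in the inputs file is the workflow name followed by a dot and the input name. The Python sketch below checks that convention for this example by comparing the keys in the JSON against the declarations in the workflow block. It is a minimal sketch, not a WDL parser: the regular expression is an assumption that only handles simple File, String, and Int declarations written one per line, and the call block is abbreviated.

```python
import json
import re

# Workflow block from cromwell-bowtie2.wdl (call block abbreviated).
WDL_TEXT = """
workflow CromwellBowtie2 {
    File input_fq
    File index_dir
    String index_name
    Int cpus_per_task

    call Bowtie2 {
        input:
            input_fq = input_fq
    }
}
"""

INPUTS_JSON = """
{
    "CromwellBowtie2.input_fq": "myreads.fq",
    "CromwellBowtie2.index_dir": "index",
    "CromwellBowtie2.index_name": "lambda_virus",
    "CromwellBowtie2.cpus_per_task": "8"
}
"""


def expected_input_keys(wdl_text):
    """Return the fully qualified input names declared in the workflow block."""
    workflow_name = re.search(r"workflow\s+(\w+)", wdl_text).group(1)
    declared = re.findall(
        r"^[ \t]*(?:File|String|Int)[ \t]+(\w+)[ \t]*$",
        wdl_text,
        flags=re.MULTILINE,
    )
    return {f"{workflow_name}.{name}" for name in declared}


expected = expected_input_keys(WDL_TEXT)
provided = set(json.loads(INPUTS_JSON))

missing = expected - provided      # declared in the WDL but absent from the JSON
unexpected = provided - expected   # present in the JSON but not declared

print("missing:", sorted(missing))
print("unexpected:", sorted(unexpected))
```

Both lists come back empty for the example files, which indicates that every declared input has a matching entry.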

Workflow Options File

In Cromwell, Workflow Options Files are also written in JSON. They are specified with the --options flag when Cromwell is executed at the command line. These files describe the options to use during the execution of a workflow.

By default, the output of a workflow step is stored in that step's execution directory.

The following JSON file makes use of Cromwell's Output Copying capabilities to copy the output into a directory named output, and can be found at /usr/local/training/Cromwell/options.json:

options.json
{
    "final_workflow_outputs_dir": "output",
    "use_relative_output_paths": true
}
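
To show how the four example files fit together, the Python sketch below assembles a Cromwell run command and prints it without executing anything. The jar path is a hypothetical placeholder, since the location of the Cromwell jar on Sapelo2 is not given here. The -Dconfig.file property follows Cromwell's own documentation for supplying a configuration file; it is an assumption about how the configuration would be passed, not something confirmed on this page.

```python
import shlex

# Hypothetical location of the Cromwell jar; substitute the real path on your system.
CROMWELL_JAR = "/path/to/cromwell-56.jar"

argv = [
    "java",
    "-Dconfig.file=cromwell-gacrc.conf",   # backend configuration file
    "-jar", CROMWELL_JAR,
    "run", "cromwell-bowtie2.wdl",         # workflow to execute
    "--inputs", "inputs.json",             # workflow input file
    "--options", "options.json",           # workflow options file
]

# Print the command for review instead of running it.
print(shlex.join(argv))
```

Building the command as a list keeps each argument separate, so it could later be handed to subprocess.run unchanged if actually launching Cromwell from Python is desired.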