Array Jobs

From Research Computing Center Wiki

Revision as of 11:12, 16 June 2021

Introduction

An array job is a collection of jobs (called array job "elements") initiated from a single submission script. Array jobs work well for problems that are embarrassingly parallel, meaning the problem can easily be split into concurrently running tasks that do not depend on one another. Imagine you have 10 input files and want to perform the same action(s) on each. Rather than looping through the input one file at a time, or writing 10 almost identical submission scripts, you can write and submit one array job submission script.

Example Submission Script

Writing an array job submission script is hardly different from writing any other type of Slurm submission script. The two key things to remember are the Slurm array header (#SBATCH --array) and the SLURM_ARRAY_TASK_ID environment variable. Below is an array job submission script that runs myScript.R once for each of 5 input files, passing the file name as an argument and assuming the input files are named myinput-1, myinput-2, myinput-3, and so on.

#!/bin/bash

#SBATCH --job-name=array-test
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --mem=20gb
#SBATCH --time=1:00:00
#SBATCH --array=1-5

ml R/4.0.0-foss-2019b

Rscript myScript.R myinput-${SLURM_ARRAY_TASK_ID}

Submitting the above script would create five array job elements as shown below:

bc06026@b1-24 arraytest$ sbatch sub.sh
Submitted batch job 3341751
bc06026@b1-24 arraytest$ squeue --me
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
         3341751_1     batch array-te  bc06026  R       0:08      1 c4-9
         3341751_2     batch array-te  bc06026  R       0:08      1 c4-21
         3341751_3     batch array-te  bc06026  R       0:08      1 c4-21
         3341751_4     batch array-te  bc06026  R       0:08      1 c4-21
         3341751_5     batch array-te  bc06026  R       0:08      1 c4-11

As the squeue --me output shows, submitting this one submission script starts five jobs that run concurrently. Each of these jobs is allocated its own set of the resources requested in the submission script and runs the commands:

ml R/4.0.0-foss-2019b

Rscript myScript.R myinput-${SLURM_ARRAY_TASK_ID}

with ${SLURM_ARRAY_TASK_ID} being replaced by one of the numbers in the range defined in the --array Slurm header. For example, element 3341751_3 runs Rscript myScript.R myinput-3.
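
To preview this substitution without submitting anything, the expansion can be simulated in a plain shell. The following is a minimal sketch and not part of the submission script: element_command is a hypothetical helper introduced for illustration, and SLURM_ARRAY_TASK_ID is set by hand here, whereas Slurm sets it automatically for each element of a real array job.

```shell
#!/bin/bash

# Hypothetical helper: print the command that one array element would run.
# In a real array job Slurm sets SLURM_ARRAY_TASK_ID itself; here the value
# is passed in by hand as the first argument.
element_command() {
    local SLURM_ARRAY_TASK_ID="$1"
    echo "Rscript myScript.R myinput-${SLURM_ARRAY_TASK_ID}"
}

# Simulate the five elements created by "#SBATCH --array=1-5".
for id in 1 2 3 4 5; do
    element_command "$id"
done
```

Running the sketch prints one Rscript command per element, each with a different input file name, which is a quick way to check that the file naming scheme matches the --array range before submitting.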