Migrating from Torque to Slurm

Later this year, GACRC will implement the Simple Linux Utility for Resource Management (Slurm) for job scheduling and resource management on Sapelo2, replacing the Torque (PBS) resource manager and Moab scheduler that the cluster currently uses.

===How is Slurm different from Torque?===

===How to Submit Jobs?===

To submit jobs in Slurm, replace 'qsub' with one of the commands from the table below.

{| width="100%" border="1" cellspacing="0" cellpadding="2" align="top" class="wikitable unsortable"
|-
! scope="col" | Info
! scope="col" | Torque Command
! scope="col" | Slurm Command
|-
| Submit a batch job to the queue || qsub job.pbs || sbatch job.slurm
|-
| Start an interactive job || qsub -I || salloc <options>
|}

where job.pbs is the name of the job submission script you used for Torque and job.slurm is the name of the job submission script for Slurm. These files are often called sub.sh in our documentation. Different names are used here to emphasize that the syntax of the job submission script for Slurm is different from the Torque syntax.
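For illustration, a minimal Slurm submission script might look like the sketch below. The job name, the partition name "batch", the resource values, and the program ./myprog are placeholders, not actual Sapelo2 settings; check the cluster documentation for the real values.
<pre>
#!/bin/bash
#SBATCH --job-name=testjob          # job name (placeholder)
#SBATCH --partition=batch           # queue/partition name (placeholder)
#SBATCH --nodes=1                   # number of nodes
#SBATCH --ntasks-per-node=1         # processes per node
#SBATCH --time=01:00:00             # wall time limit (hh:mm:ss)
#SBATCH --mem=2G                    # memory per node

cd $SLURM_SUBMIT_DIR                # Slurm starts jobs here by default; shown for clarity
./myprog                            # placeholder for the actual application
</pre>
This script would be submitted with "sbatch job.slurm", and a comparable interactive session could be requested with, for example, "salloc --nodes=1 --ntasks-per-node=1 --time=01:00:00".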

===Job Submission Options===

As with Torque, job options and resource requests in Slurm can be set in the job submission script or as options to the job submission command. However, the syntax used to request resources is different; the table below summarizes some frequently used options, and an example translation of a script header follows the table.


{| width="100%" border="1" cellspacing="0" cellpadding="2" align="top" class="wikitable unsortable"
|-
! scope="col" | Option
! scope="col" | Torque (qsub)
! scope="col" | Slurm (sbatch)
|-
| Script directive || #PBS || #SBATCH
|-
| Job name || -N <name> || --job-name=<name> <br> -J <name>
|-
| Queue || -q <queue> || --partition=<queue>
|-
| Wall time limit || -l walltime=<hh:mm:ss> || --time=<hh:mm:ss>
|-
| Node count || -l nodes=<count> || --nodes=<count> <br> -N <count>
|-
| Process count per node || -l ppn=<count> || --ntasks-per-node=<count>
|-
| Core count (per process) || || --cpus-per-task=<cores>
|-
| Memory limit || -l mem=<limit> || --mem=<limit> (memory per node; default units are megabytes, MB)
|-
| Minimum memory per processor || -l pmem=<limit> || --mem-per-cpu=<memory>
|-
| Request GPUs || -l gpus=<count> || --gres=gpu:<count>
|-
| Request specific nodes || -l nodes=<node>[,node2[,...]] || -w, --nodelist=<node>[,node2[,...]] <br> -F, --nodefile=<node file>
|-
| Job array || -t <array indices> || --array=<indexes> <br> -a <indexes> <br> where <indexes> is a range (0-15), a list (0,6,16-32), or a step function (0-15:4)
|-
| Standard output file || -o <file path> || --output=<file path> (path must exist)
|-
| Standard error file || -e <file path> || --error=<file path> (path must exist)
|-
| Combine stdout/stderr to stdout || -j oe || --output=<combined out and err file path>
|-
| Copy environment || -V || --export=ALL (default) <br> --export=NONE to not export the environment
|-
| Copy environment variable || -v <variable[=value][,variable2=value2[,...]]> || --export=<variable[=value][,variable2=value2[,...]]>
|-
| Job dependency || -W depend=after:jobID[:jobID...] <br> -W depend=afterok:jobID[:jobID...] <br> -W depend=afternotok:jobID[:jobID...] <br> -W depend=afterany:jobID[:jobID...] || --dependency=after:jobID[:jobID...] <br> --dependency=afterok:jobID[:jobID...] <br> --dependency=afternotok:jobID[:jobID...] <br> --dependency=afterany:jobID[:jobID...]
|-
| Request event notification || -m <events> || --mail-type=<events> <br> Note: multiple mail-type requests may be specified in a comma-separated list, e.g. --mail-type=BEGIN,END,FAIL,REQUEUE
|-
| Email address || -M <email address> || --mail-user=<email address>
|-
| Defer job until the specified time || -a <date/time> || --begin=<date/time>
|-
| Node exclusive job || qsub -n || --exclusive
|}
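To make the mapping concrete, the sketch below shows the header of a hypothetical Torque script and a Slurm equivalent. The job name, queue/partition name, resource values, and email address are made-up examples, and the mail-type mapping (-m ae to END,FAIL) is approximate.
<pre>
# Torque (qsub) version of the header
#PBS -N myjob
#PBS -q batch
#PBS -l nodes=1:ppn=4
#PBS -l walltime=12:00:00
#PBS -l mem=8gb
#PBS -m ae
#PBS -M user@example.com

# Equivalent Slurm (sbatch) version of the header
#SBATCH --job-name=myjob
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --time=12:00:00
#SBATCH --mem=8G
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@example.com
</pre>
The same options can also be given on the command line instead of in the script, e.g. "sbatch --dependency=afterok:12345 job.slurm" would hold the job until the (made-up) job ID 12345 completes successfully.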

===Job Environment and Environment Variables===

In Slurm, environment variables are passed to your job by default. If you have environment variables set that you think might interfere with your job, you can either:

* Log out, then log back in and submit your job, or
* Run sbatch with one of these options to override the default behavior:
 sbatch --export=NONE
 sbatch --export=MYPARAM=3
 sbatch --export=ALL,MYPARAM=3
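For example, submitting with only one variable exported (Slurm's own SLURM_* variables are always set) and then checking what arrived inside the job could look like the sketch below; MYPARAM and the echo/env lines are illustrative only.
<pre>
# on the login node: export only MYPARAM (value 3) to the job
sbatch --export=MYPARAM=3 job.slurm

# inside job.slurm: confirm what was exported
echo "MYPARAM is: $MYPARAM"
env | grep '^SLURM_' | sort | head
</pre>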

Like Torque, Slurm sets its own environment variables within your job. The table below summarizes some environment variables that are frequently used.

{| width="100%" border="1" cellspacing="0" cellpadding="2" align="top" class="wikitable unsortable"
|-
! scope="col" | Info
! scope="col" | Torque
! scope="col" | Slurm
! scope="col" | Notes
|-
| Version || $PBS_VERSION || || Can be extracted from sbatch --version
|-
| Job name || $PBS_JOBNAME || $SLURM_JOB_NAME ||
|-
| Job ID || $PBS_JOBID || $SLURM_JOB_ID ||
|-
| Batch or interactive || $PBS_ENVIRONMENT || ||
|-
| Submit directory || $PBS_O_WORKDIR || $SLURM_SUBMIT_DIR || Slurm jobs start from the submit directory by default.
|-
| Submit host || $PBS_O_HOST || $SLURM_SUBMIT_HOST ||
|-
| Node file || $PBS_NODEFILE || || A file name and path that lists the nodes allocated to the job.
|-
| Node list || cat $PBS_NODEFILE || $SLURM_JOB_NODELIST || The Slurm variable has a different format from the Torque/PBS one. To get a list of nodes, use: scontrol show hostnames $SLURM_JOB_NODELIST
|-
| Job array index || $PBS_ARRAYID <br> $PBS_ARRAY_INDEX || $SLURM_ARRAY_TASK_ID || Only set when submitting a job array (with -a or --array)
|-
| Walltime || $PBS_WALLTIME || ||
|-
| Queue name || $PBS_QUEUE || $SLURM_JOB_PARTITION ||
|-
| Number of nodes allocated || $PBS_NUM_NODES || $SLURM_JOB_NUM_NODES <br> $SLURM_NNODES ||
|-
| Number of processes || $PBS_NP || $SLURM_NTASKS ||
|-
| Number of processes per node || $PBS_NUM_PPN || $SLURM_TASKS_PER_NODE ||
|-
| List of allocated GPUs || $PBS_GPUFILE || ||
|-
| Requested tasks per node || || $SLURM_NTASKS_PER_NODE ||
|-
| Requested CPUs per task || || $SLURM_CPUS_PER_TASK ||
|-
| Scheduling priority || || $SLURM_PRIO_PROCESS ||
|-
| Job user || || $SLURM_JOB_USER ||
|-
| Hostname || $HOSTNAME || $HOSTNAME == $SLURM_SUBMIT_HOST || Unless a shell is invoked on an allocated resource, the HOSTNAME variable is propagated (copied) from the submit machine's environment and will be the same on all allocated nodes.
|}
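As a quick way to see these variables in practice, a small test job along the lines of the sketch below could print the most commonly used ones. The job name and resource requests are placeholders, and a partition may also need to be specified depending on the cluster's defaults.
<pre>
#!/bin/bash
#SBATCH --job-name=envtest          # placeholder job name
#SBATCH --nodes=2                   # two nodes, so the node list is non-trivial
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:05:00

echo "Job ID:         $SLURM_JOB_ID"
echo "Job name:       $SLURM_JOB_NAME"
echo "Partition:      $SLURM_JOB_PARTITION"
echo "Submit dir:     $SLURM_SUBMIT_DIR"
echo "Nodes:          $SLURM_JOB_NUM_NODES"
echo "Total tasks:    $SLURM_NTASKS"
echo "Raw node list:  $SLURM_JOB_NODELIST"
echo "Expanded node list:"
scontrol show hostnames $SLURM_JOB_NODELIST
</pre>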

===Common Job Commands===

===How to Submit and Manage Jobs===

===How to Monitor Jobs===

===Valid Job States===

===How to View Resources on the Cluster===