Migrating from Torque to Slurm
Later this year GACRC will be implementing the Simple Linux Utility for Resource Management (Slurm) software for job scheduling and resource management on Sapelo2, to replace the Torque (PBS) resource manager and Moab scheduling system that it currently uses.
How is Slurm different from Torque?
Slurm differs from Torque in several ways, including the commands used to submit and monitor jobs, the syntax used to request resources, and the way environment variables behave.
Specific differences include:
- Slurm rejects at submission any job whose requested resources exceed the set of resources the job owner has access to, whether or not those resources are currently allocated to other jobs. Torque would queue such a job, but the job would never run.
- What Torque calls queues, Slurm calls partitions.
- Resources in Slurm are assigned per “task”/process.
- In Slurm, environment variables of the submitting process are passed to the job, except that by default Slurm does not source the file ~/.bashrc when resources are requested via sbatch.
How to Submit Jobs
To submit jobs in Slurm, replace qsub with one of the commands from the table below.
Info | Torque Command | Slurm Command |
---|---|---|
Submit a batch job to the queue | qsub <job script> | sbatch <job script> |
Start an interactive job | qsub -I <options> | salloc <options> |
In the table above, <job script> stands for the name of your job submission script (e.g. sub.sh). Note that the syntax used inside a Slurm job submission script differs from the Torque syntax.
Job Submission Options
As with Torque, job options and resource requests in Slurm can be set in the job submission script or as options to the job submission command. However, the syntax used to request resources is different; the table below summarizes some frequently used options.
Option | Torque (qsub) | Slurm (sbatch) |
---|---|---|
Script directive | #PBS | #SBATCH |
Job name | -N <name> | --job-name=<name> or -J <name> |
Queue | -q <queue> | --partition=<queue> |
Wall time limit | -l walltime=<hh:mm:ss> | --time=<hh:mm:ss> |
Node count | -l nodes=<count> | --nodes=<count> or -N <count> |
Process count per node | -l ppn=<count> | --ntasks-per-node=<count> |
Core count (per process) | – | --cpus-per-task=<cores> |
Memory limit | -l mem=<limit> | --mem=<limit> (memory per node, in megabytes) |
Minimum memory per processor | -l pmem=<limit> | --mem-per-cpu=<memory> |
Request GPUs | -l gpus=<count> | --gres=gpu:<count> |
Request specific nodes | -l nodes=<node>[,node2[,...]] | --nodelist=<node>[,node2[,...]] (short form -w) or --nodefile=<node file> (short form -F) |
Request node feature | -l nodes=<count>:ppn=<count>:<feature> | --constraint=<feature> |
Job array | -t <array indices> | --array=<indexes> or -a <indexes>, where <indexes> is a range (0-15), a list (0,6,16-32), or a step function (0-15:4) |
Standard output file | -o <file path> | --output=<file path> (path must exist) |
Standard error file | -e <file path> | --error=<file path> (path must exist) |
Combine stdout/stderr to stdout | -j oe | --output=<combined out and err file path> |
Copy environment | -V | --export=ALL (default); use --export=NONE to not export the environment |
Copy environment variable | -v <variable[=value][,variable2=value2[,...]]> | --export=<variable[=value][,variable2=value2[,...]]> |
Job dependency | -W depend=after:jobID[:jobID...] or -W depend=afterok:jobID[:jobID...] or -W depend=afternotok:jobID[:jobID...] or -W depend=afterany:jobID[:jobID...] | --dependency=after:jobID[:jobID...] or --dependency=afterok:jobID[:jobID...] or --dependency=afternotok:jobID[:jobID...] or --dependency=afterany:jobID[:jobID...] |
Request event notification | -m <events> | --mail-type=<events> (multiple events may be given as a comma-separated list, e.g. --mail-type=BEGIN,END,FAIL,REQUEUE) |
Email address | -M <email address> | --mail-user=<email address> |
Defer job until the specified time | -a <date/time> | --begin=<date/time> |
Node exclusive job | -n | --exclusive |
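As an illustration of the options above, a minimal Slurm job script might look like the following sketch. The job name, the partition name batch, the resource values, and the output file names are placeholders assumed for this example and are not confirmed Sapelo2 settings. The `:-` fallbacks are only there so that the script also runs outside of Slurm.

```shell
#!/bin/bash
# Sketch of a Slurm job script; every value below is a placeholder.
#SBATCH --job-name=testjob
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem=4000
#SBATCH --time=01:00:00
#SBATCH --output=testjob.%j.out
#SBATCH --error=testjob.%j.err

# Slurm starts the job in the submit directory; the cd is kept for clarity.
cd "${SLURM_SUBMIT_DIR:-$PWD}" || exit 1

# SLURM_NTASKS is set by Slurm inside a job; default to 1 elsewhere.
ntasks="${SLURM_NTASKS:-1}"
echo "Job ${SLURM_JOB_ID:-none} is running ${ntasks} task(s) in $(pwd)"
```

Saved as sub.sh, such a script would be submitted with sbatch sub.sh. In the output file names, %j is replaced by the job ID.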
Common Job Commands
How to Submit and Manage Jobs
Info | Torque Command | Slurm Command |
---|---|---|
Submit a job | qsub <job script> | sbatch <job script> |
Delete a job | qdel <job ID> | scancel <job ID> |
Hold a job | qhold <job ID> | scontrol hold <job ID> |
Release a job | qrls <job ID> | scontrol release <job ID> |
Start an interactive job | qsub -I <options> | salloc <options> or srun --pty <options> |
Start an interactive job with X forwarding | qsub -I -X <options> | srun --x11 <options> |
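The submission commands above can also be combined in a small wrapper. The sketch below assumes two hypothetical job scripts, first.sh and second.sh, and uses sbatch --parsable, which prints only the job ID (followed by a semicolon and the cluster name when one is reported), to make the second job wait for the first via --dependency=afterok. The helper name job_id_only is made up for this example.

```shell
#!/bin/bash
# Strip the optional ";cluster" suffix that `sbatch --parsable` may append.
job_id_only() {
    printf '%s\n' "${1%%;*}"
}

# first.sh and second.sh are hypothetical job scripts.
if command -v sbatch >/dev/null 2>&1 && [ -f first.sh ] && [ -f second.sh ]; then
    first_id=$(job_id_only "$(sbatch --parsable first.sh)")
    # Start second.sh only after first.sh has finished with exit code 0.
    sbatch --dependency=afterok:"${first_id}" second.sh
    # List your own jobs to confirm both were accepted.
    squeue --me
else
    echo "sbatch or the job scripts were not found; nothing submitted" >&2
fi
```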
How to View Resources on the Cluster
Info | Torque Command | Slurm Command |
---|---|---|
Queue list / info | qstat -q [queue] | scontrol show partition [queue] |
Node list | pbsnodes -a or mdiag -n -v | scontrol show nodes |
Node details | pbsnodes <node> | scontrol show node <node> |
Cluster status | qstat -B | sinfo |
How to Monitor Jobs
Info | Torque Command | Slurm Command |
---|---|---|
Job status (all) | qstat or showq | squeue |
Job status (by job) | qstat <job ID> | squeue -j <job ID> |
Job status (by user) | qstat -u <user> | squeue -u <user> |
Job status (only own jobs) | qstat_me | squeue --me or squeue --me -l |
Job status (detailed) | qstat -f <job ID> or checkjob <job ID> | scontrol show job -dd <job ID> |
Show expected start time | showstart <job ID> | squeue -j <job ID> --start |
Monitor or review a job’s resource usage | qstat -f <job ID> | sacct -j <job ID> --format JobID,jobname,NTasks,nodelist,CPUTime,ReqMem,Elapsed |
View job batch script | – | scontrol write batch_script <job ID> [filename] |
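For routine checks, the monitoring commands above could be wrapped in a small shell function. This is only a sketch: the function name job_report and its return codes are assumptions made for the example, and it simply calls squeue and sacct with the options listed in the table.

```shell
#!/bin/bash
# Print queue status and accounting data for one job.
# Returns 2 if no job ID is given, 1 if the Slurm tools are not installed.
job_report() {
    if [ -z "${1:-}" ]; then
        echo "usage: job_report <job ID>" >&2
        return 2
    fi
    if ! command -v squeue >/dev/null 2>&1 || ! command -v sacct >/dev/null 2>&1; then
        echo "squeue or sacct not found; run this on the cluster" >&2
        return 1
    fi
    # Current state of the job while it is pending or running.
    squeue -j "$1"
    # Accounting record with the fields used in the table above.
    sacct -j "$1" --format=JobID,JobName,NTasks,NodeList,CPUTime,ReqMem,Elapsed
}
```

After sourcing the file, job_report <job ID> prints both views for the given job.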
Valid Job States
Below are the job states you may encounter when monitoring your job(s) in Slurm.
Code | State | Meaning |
---|---|---|
CA | Canceled | Job was canceled |
CD | Completed | Job completed |
CF | Configuring | Job resources being configured |
CG | Completing | Job is completing |
F | Failed | Job terminated with non-zero exit code |
NF | Node Fail | Job terminated due to failure of node(s) |
PD | Pending | Job is waiting for compute node(s) |
R | Running | Job is running on compute node(s) |
Job Environment and Environment Variables
In Slurm, environment variables will get passed to your job by default.
If you have environment variables set that might interfere with your job, you can either:
- Log out, log back in, and then submit your job
- Run sbatch with one of these options to override the default behavior:
sbatch --export=NONE
sbatch --export=MYPARAM=3
sbatch --export=ALL,MYPARAM=3
NOTE: We recommend adding the #SBATCH --export=NONE option to your sbatch job scripts to establish a clean environment; otherwise Slurm will propagate the current environment variables to the job. This could affect the behavior of the job, particularly for MPI jobs.
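A job script following this recommendation might look like the sketch below. The job name, task count, time limit, and the variable MYPARAM are placeholders chosen for illustration. Because the submitting shell's variables are not propagated, the script sets what it needs itself.

```shell
#!/bin/bash
#SBATCH --job-name=cleanenv
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --export=NONE

# With --export=NONE the submitting shell's variables are not propagated,
# so anything the job relies on is set here. MYPARAM is a placeholder.
MYPARAM=3
export MYPARAM
echo "MYPARAM=${MYPARAM}"
```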
Like Torque, Slurm sets its own environment variables within your job. The table below summarizes some environment variables that are frequently used.
Info | Torque | Slurm | Notes |
---|---|---|---|
Version | $PBS_VERSION | – | Can extract from sbatch --version |
Job name | $PBS_JOBNAME | $SLURM_JOB_NAME | |
Job ID | $PBS_JOBID | $SLURM_JOB_ID | |
Batch or interactive | $PBS_ENVIRONMENT | – | |
Submit directory | $PBS_O_WORKDIR | $SLURM_SUBMIT_DIR | Slurm jobs start from the submit directory by default. |
Submit host | $PBS_O_HOST | $SLURM_SUBMIT_HOST | |
Node file | $PBS_NODEFILE | – | A filename and path that lists the nodes a job has been allocated. |
Node list | cat $PBS_NODEFILE | $SLURM_JOB_NODELIST | The Slurm variable has a different format to the Torque/PBS one. To get a list of nodes use: scontrol show hostnames $SLURM_JOB_NODELIST |
Job array index | $PBS_ARRAYID or $PBS_ARRAY_INDEX | $SLURM_ARRAY_TASK_ID | Only set when submitting a job array (with -a or --array) |
Walltime | $PBS_WALLTIME | – | |
Queue name | $PBS_QUEUE | $SLURM_JOB_PARTITION | |
Number of nodes allocated | $PBS_NUM_NODES | $SLURM_JOB_NUM_NODES or $SLURM_NNODES | |
Number of processes | $PBS_NP | $SLURM_NTASKS | |
Number of processes per node | $PBS_NUM_PPN | $SLURM_TASKS_PER_NODE | |
List of allocated GPUs | $PBS_GPUFILE | – | |
Requested tasks per node | – | $SLURM_NTASKS_PER_NODE | |
Requested CPUs per task | – | $SLURM_CPUS_PER_TASK | |
Scheduling priority | – | $SLURM_PRIO_PROCESS | |
Job user | – | $SLURM_JOB_USER | |
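The variables in the table can be read inside a job script like any other shell variable. The sketch below prints a few of them and writes the allocated host names to a file with scontrol show hostnames. The output file name nodes.<job ID>.txt and the fallback values after `:-` are assumptions made so the example also runs outside a Slurm job.

```shell
#!/bin/bash
# Fallback values after ':-' are only used outside a Slurm job.
job_id="${SLURM_JOB_ID:-none}"
job_name="${SLURM_JOB_NAME:-none}"
partition="${SLURM_JOB_PARTITION:-none}"
task_id="${SLURM_ARRAY_TASK_ID:-0}"
echo "job=${job_id} name=${job_name} partition=${partition} array_task=${task_id}"

# Expand the compact node list into one host name per line.
# nodes.<job ID>.txt is a hypothetical file name.
if [ -n "${SLURM_JOB_NODELIST:-}" ] && command -v scontrol >/dev/null 2>&1; then
    scontrol show hostnames "$SLURM_JOB_NODELIST" > "nodes.${job_id}.txt"
fi
```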
Slurm Documentation
Extensive documentation on Slurm is available at https://slurm.schedmd.com/documentation.html
This page was adapted from https://arc-ts.umich.edu/migrating-from-torque-to-slurm/ and https://hpcc.usc.edu/support/documentation/pbs-to-slurm/