Migrating from Torque to Slurm
Later this year, GACRC will implement the Simple Linux Utility for Resource Management (Slurm) for job scheduling and resource management on Sapelo2, replacing the Torque (PBS) resource manager and Moab scheduler that Sapelo2 currently uses.
How is Slurm different from Torque?
Job Submission Options
As with Torque, job options and resource requests in Slurm can be set in the job submission script or as options to the job submission command. However, the syntax used to request resources is different, and the table below summarizes some of the most frequently used options.
| Option | Torque (qsub) | Slurm (sbatch) |
|---|---|---|
| Script directive | `#PBS` | `#SBATCH` |
| Job name | `-N <name>` | `--job-name=<name>` <br> `-J <name>` |
| Queue | `-q <queue>` | `--partition=<queue>` |
| Wall time limit | `-l walltime=<hh:mm:ss>` | `--time=<hh:mm:ss>` |
| Node count | `-l nodes=<count>` | `--nodes=<count>` <br> `-N <count>` |
| Process count per node | `-l ppn=<count>` | `--ntasks-per-node=<count>` |
| Core count (per process) | – | `--cpus-per-task=<cores>` |
| Memory limit | `-l mem=<limit>` | `--mem=<limit>` (memory per node; default units are megabytes, MB) |
| Minimum memory per processor | `-l pmem=<limit>` | `--mem-per-cpu=<memory>` |
| Request GPUs | `-l gpus=<count>` | `--gres=gpu:<count>` |
| Request specific nodes | `-l nodes=<node>[,node2[,...]]` | `-w, --nodelist=<node>[,node2[,...]]` <br> `-F, --nodefile=<node file>` |
| Job array | `-t <array indices>` | `--array=<indexes>` <br> `-a <indexes>` <br> where `<indexes>` is a range (0-15), a list (0,6,16-32), or a step function (0-15:4) |
| Standard output file | `-o <file path>` | `--output=<file path>` (path must exist) |
| Standard error file | `-e <file path>` | `--error=<file path>` (path must exist) |
| Combine stdout/stderr to stdout | `-j oe` | `--output=<combined out and err file path>` |
| Copy environment | `-V` | `--export=ALL` (default) <br> `--export=NONE` to not export the environment |
| Copy environment variable | `-v <variable[=value][,variable2=value2[,...]]>` | `--export=<variable[=value][,variable2=value2[,...]]>` |
| Job dependency | `-W depend=after:jobID[:jobID...]` <br> `-W depend=afterok:jobID[:jobID...]` <br> `-W depend=afternotok:jobID[:jobID...]` <br> `-W depend=afterany:jobID[:jobID...]` | `--dependency=after:jobID[:jobID...]` <br> `--dependency=afterok:jobID[:jobID...]` <br> `--dependency=afternotok:jobID[:jobID...]` <br> `--dependency=afterany:jobID[:jobID...]` |
| Request event notification | `-m <events>` | `--mail-type=<events>` <br> Note: multiple mail-type values may be specified in a comma-separated list, e.g. `--mail-type=BEGIN,END,FAIL,REQUEUE` |
| Email address | `-M <email address>` | `--mail-user=<email address>` |
| Defer job until the specified time | `-a <date/time>` | `--begin=<date/time>` |
| Node exclusive job | `qsub -n` | `--exclusive` |
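To see how these options fit together, here is a minimal sketch of a Slurm batch script with the Torque/PBS directive each `#SBATCH` line replaces noted in a comment above it. The job name, partition name, memory and time values, email address, and program name are placeholders chosen for illustration, not GACRC-specific settings.

```bash
#!/bin/bash
# Sketch of a Slurm batch script; each directive's Torque/PBS equivalent
# is noted on the comment line above it. All names and values below are
# placeholders -- substitute your own.

# Torque: #PBS -N testjob
#SBATCH --job-name=testjob
# Torque: #PBS -q batch
#SBATCH --partition=batch
# Torque: #PBS -l nodes=1:ppn=4
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
# Torque: #PBS -l mem=4gb
#SBATCH --mem=4G
# Torque: #PBS -l walltime=02:00:00
#SBATCH --time=02:00:00
# Torque: #PBS -o testjob.out
#SBATCH --output=testjob.%j.out
# Torque: #PBS -M user@example.edu and #PBS -m ae
#SBATCH --mail-user=user@example.edu
#SBATCH --mail-type=END,FAIL

cd "$SLURM_SUBMIT_DIR"   # optional: Slurm already starts in the submit directory

./myprogram              # placeholder for the actual application
```

The script is then submitted with `sbatch myjob.sh` rather than `qsub myjob.sh`.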
Job Environment and Environment Variables
In Slurm, environment variables are passed to your job by default. If you have environment variables set that you think might interfere with your job, you can either:
- Log out then log back in and submit your job
- Run sbatch with these options to override the default behavior:
sbatch --export=NONE
sbatch --export=MYPARAM=3
sbatch --export=ALL,MYPARAM=3
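For example, a variable named with `--export` is visible in the job's environment. Here is a minimal sketch; the script name `export_test.sh` and its output line are illustrative, not a GACRC convention:

```bash
#!/bin/bash
#SBATCH --job-name=export-test
#SBATCH --output=export-test.%j.out

# Report whether MYPARAM made it into the job environment
echo "MYPARAM is: ${MYPARAM:-<not set>}"
```

Submitting this with `sbatch --export=ALL,MYPARAM=3 export_test.sh` prints `MYPARAM is: 3`, while `sbatch --export=NONE export_test.sh` prints `MYPARAM is: <not set>`, since the login environment is not passed along (Slurm's own `SLURM_*` variables are still set).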
Like Torque, Slurm sets its own environment variables within your job. The table below summarizes some environment variables that are frequently used.
| Info | Torque | Slurm | Notes |
|---|---|---|---|
| Version | `$PBS_VERSION` | – | Can extract from `sbatch --version` |
| Job name | `$PBS_JOBNAME` | `$SLURM_JOB_NAME` | |
| Job ID | `$PBS_JOBID` | `$SLURM_JOB_ID` | |
| Batch or interactive | `$PBS_ENVIRONMENT` | – | |
| Submit directory | `$PBS_O_WORKDIR` | `$SLURM_SUBMIT_DIR` | Slurm jobs start from the submit directory by default. |
| Submit host | `$PBS_O_HOST` | `$SLURM_SUBMIT_HOST` | |
| Node file | `$PBS_NODEFILE` | – | A file name and path that lists the nodes a job has been allocated. |
| Node list | `cat $PBS_NODEFILE` | `$SLURM_JOB_NODELIST` | The Slurm variable has a different format from the Torque/PBS one. To get a list of nodes, use: <br> `scontrol show hostnames $SLURM_JOB_NODELIST` |
| Job array index | `$PBS_ARRAYID` <br> `$PBS_ARRAY_INDEX` | `$SLURM_ARRAY_TASK_ID` | Only set when submitting a job array (with `-a` or `--array`) |
| Walltime | `$PBS_WALLTIME` | – | |
| Queue name | `$PBS_QUEUE` | `$SLURM_JOB_PARTITION` | |
| Number of nodes allocated | `$PBS_NUM_NODES` | `$SLURM_JOB_NUM_NODES` <br> `$SLURM_NNODES` | |
| Number of processes | `$PBS_NP` | `$SLURM_NTASKS` | |
| Number of processes per node | `$PBS_NUM_PPN` | `$SLURM_TASKS_PER_NODE` | |
| List of allocated GPUs | `$PBS_GPUFILE` | – | |
| Requested tasks per node | – | `$SLURM_NTASKS_PER_NODE` | |
| Requested CPUs per task | – | `$SLURM_CPUS_PER_TASK` | |
| Scheduling priority | – | `$SLURM_PRIO_PROCESS` | |
| Job user | – | `$SLURM_JOB_USER` | |
| Hostname | `$HOSTNAME` | `$HOSTNAME == $SLURM_SUBMIT_HOST` | Unless a shell is invoked on an allocated resource, the HOSTNAME variable is propagated (copied) from the submit machine, so it will be the same on all allocated nodes. |
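As a quick illustration of how these variables are typically used inside a job, the sketch below (with placeholder resource requests) logs some job metadata and expands `$SLURM_JOB_NODELIST` into a plain node list, the closest equivalent of reading `$PBS_NODEFILE` under Torque:

```bash
#!/bin/bash
#SBATCH --job-name=env-demo
#SBATCH --nodes=2                # placeholder resource request
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:10:00

# Log commonly used Slurm job variables
echo "Job ID:      $SLURM_JOB_ID"
echo "Job name:    $SLURM_JOB_NAME"
echo "Partition:   $SLURM_JOB_PARTITION"
echo "Submit dir:  $SLURM_SUBMIT_DIR"
echo "Nodes:       $SLURM_JOB_NUM_NODES"
echo "Tasks:       $SLURM_NTASKS"

# SLURM_JOB_NODELIST is in compressed form (e.g. node[01-02]); expand it to
# one hostname per line, similar to what cat $PBS_NODEFILE gave under Torque
scontrol show hostnames "$SLURM_JOB_NODELIST" > nodefile.txt
cat nodefile.txt
```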
Common Job Commands
How to Submit and Manage Jobs
How to Monitor Jobs
Valid Job States