Running Jobs on Sapelo2
Revision as of 10:59, 9 June 2020
Note: This page describes the new Slurm queueing system on the Sapelo2 cluster and is still under development as of June 9, 2020.
If you are a current Sapelo2 user, please refer to Running Jobs on Sapelo2 for instructions on how to run jobs on Sapelo2.
Using the Queueing System
The login node for the Sapelo2 cluster should be used for text editing and job submission. No jobs should be run directly on the login node. Processes that use too much CPU or RAM on the login node may be terminated, either by GACRC staff or automatically, to keep the cluster running properly. Both interactive and batch jobs should be run through the Slurm queueing system.
Batch Queues defined on the Sapelo2
There are different queues defined on Sapelo2; Slurm refers to queues as partitions. Users are required to specify, in the job submission script or as job submission command line arguments, the queue and the resources needed by the job, so that it can be assigned to compute node(s) with enough available resources (such as number of cores, amount of memory, GPU cards, etc.). Please note that Slurm will not allow a job to be submitted if no resources match your request. Please refer to Migrating from Torque to Slurm for more info about the Slurm queueing system.
The table below summarizes the partitions (queues) defined and the compute nodes that they target:
Queue Name | Node Type | Node Number | Description | Notes |
---|---|---|---|---|
You can check all partitions (queues) defined in the cluster with the command
sinfo
Job submission Scripts
Users are required to specify the number of cores, the amount of memory, the queue name, and the maximum wallclock time needed by the job.
Header lines
Basic job submission script
At a minimum, the job submission script needs to have the following header lines:
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=test
#SBATCH --ntasks=1
#SBATCH --time=48:00:00
#SBATCH --mem=10gb
Commands to run your application should be added after these header lines.
Header lines explained
- #!/bin/bash : used to specify using /bin/bash shell
- #SBATCH --partition=batch : used to specify the partition (queue) name, e.g. batch
- #SBATCH --job-name=test : used to specify the name of the job, e.g. test
- #SBATCH --ntasks=1 : used to specify the number of tasks (e.g. 1).
- #SBATCH --time=48:00:00 : used to specify the maximum wall clock time allowed for the job, in hh:mm:ss or days-hh:mm:ss format (e.g. 48 hours).
- #SBATCH --mem=10gb : used to specify the maximum memory per node allowed for the job (e.g. 10 GB)
Below are some of the most commonly used queueing system options to configure the job.
Options to request resources for the job
- -t, --time=time
Wall clock time limit of a job running on cluster. Acceptable formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes", and "days-hours:minutes:seconds".
- --mem=num
Amount of memory per node required by the job. The default unit is megabytes; a unit suffix can be added, e.g. 10gb.
- --mem-per-cpu=num
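Minimum amount of memory per allocated CPU. The default unit is megabytes; a unit suffix can be added, e.g. 600mb. This option cannot be combined with --mem.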
- -n, --ntasks=num
Number of tasks to run. The default is one task per node. Note that the --cpus-per-task option will change this default.
- -N, --nodes=num
Number of nodes to be allocated to the job. The default is one node.
- --ntasks-per-node=ntasks
Request that ntasks be invoked on each node. If used with the --ntasks option, the --ntasks option will take precedence and the --ntasks-per-node will be treated as a maximum count of tasks per node. Meant to be used with the --nodes option.
- -c, --cpus-per-task=ncpus
Request that ncpus be allocated per process. This may be useful if the job is multithreaded and requires more than one CPU per task for optimal performance. The default is one CPU per process.
Please try to request resources for your job as accurately as possible, because this allows your job to be dispatched to run at the earliest opportunity and it helps the system allocate resources efficiently to start as many jobs as possible, benefiting all users.
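To make the -t, --time formats listed above concrete, the sketch below converts a time string into seconds, which can help when comparing a request against a partition's TIMELIMIT. The function name walltime_to_seconds is hypothetical (it is not part of Slurm), it is not Slurm's own parser, and it does not validate its input.

```bash
#!/bin/bash
# Hypothetical helper: convert a Slurm --time value to seconds.
# Handles the formats listed for -t, --time; input is not validated.
walltime_to_seconds() {
  printf '%s\n' "$1" | awk '{
    d = 0; h = 0; m = 0; s = 0
    dash = index($0, "-")
    if (dash > 0) {
      # days-hours, days-hours:minutes, days-hours:minutes:seconds
      d = substr($0, 1, dash - 1)
      n = split(substr($0, dash + 1), f, ":")
      h = f[1]
      if (n >= 2) m = f[2]
      if (n >= 3) s = f[3]
    } else {
      n = split($0, f, ":")
      if (n == 1) {            # minutes
        m = f[1]
      } else if (n == 2) {     # minutes:seconds
        m = f[1]; s = f[2]
      } else {                 # hours:minutes:seconds
        h = f[1]; m = f[2]; s = f[3]
      }
    }
    print ((d * 24 + h) * 60 + m) * 60 + s
  }'
}
```

For example, walltime_to_seconds 48:00:00 and walltime_to_seconds 2-00:00:00 both print 172800, because they describe the same 48-hour limit.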
Options to manage job notification and output
- -J, --job-name=jobname
Specify a name for the job. The specified name will appear along with the job ID when querying running jobs on the system. The default is the name of the job submission script. Within the job, $SLURM_JOB_NAME expands to the job name.
- -o, --output=path/for/stdout
Send stdout to path/for/stdout. The default filename is slurm-${SLURM_JOB_ID}.out, e.g. slurm-12345.out, in the directory from which the job was submitted.
- -e, --error=path/for/stderr
Send stderr to path/for/stderr. If --error is not specified, both stdout and stderr will be directed to the file specified by --output.
- --mail-user=username@uga.edu
Email address to which notifications are sent when the events selected with --mail-type occur.
- --mail-type=type
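Notify the user by email when certain event types occur. Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL, TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent of the time limit), TIME_LIMIT_80 (reached 80 percent) and TIME_LIMIT_50 (reached 50 percent). Multiple types can be given as a comma-separated list, e.g. END,FAIL.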
Options to set Array Jobs
If you wish to run an application binary or script using e.g. different input files, then you might find it convenient to use an array job. To create an array job with e.g. 10 elements, use
#SBATCH -a 0-9
or
#SBATCH --array=0-9
The ID of each element in an array job, i.e., job array index value, is stored in SLURM_ARRAY_TASK_ID. SLURM_ARRAY_JOB_ID will be set to the first job ID of the array. SLURM_ARRAY_TASK_COUNT will be set to the number of tasks in the job array. SLURM_ARRAY_TASK_MAX will be set to the highest job array index value. SLURM_ARRAY_TASK_MIN will be set to the lowest job array index value. Each array job element runs as an independent job, so multiple array elements can run concurrently, if resources are available. For example:
sbatch --array=1-3 -N1 tmp
will generate a job array containing three jobs. If the sbatch command responds
Submitted batch job 36
then the environment variables will be set as follows:
SLURM_JOB_ID=36
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=1
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
SLURM_JOB_ID=37
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=2
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
SLURM_JOB_ID=38
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=3
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
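Because every element sees SLURM_ARRAY_TASK_ID, SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_COUNT, each element can work out its own share of a larger workload. The sketch below is a hypothetical helper (chunk_range is not a Slurm command) that splits items 0 to N-1 into contiguous chunks, one per element. It assumes a contiguous array range without a step value (e.g. --array=1-3), and it defaults the variables so it can also be tried outside Slurm.

```bash
#!/bin/bash
# Hypothetical helper: print the first and last item index (0-based) that
# this array element should process, out of TOTAL items ($1).
# Assumes a contiguous array range with no step (e.g. --array=1-3).
# If the printed start is greater than the end, this element has no work.
chunk_range() {
  local total=$1
  local count=${SLURM_ARRAY_TASK_COUNT:-1}
  local offset=$(( ${SLURM_ARRAY_TASK_ID:-0} - ${SLURM_ARRAY_TASK_MIN:-0} ))
  local size=$(( (total + count - 1) / count ))   # ceiling division
  local start=$(( offset * size ))
  local end=$(( start + size - 1 ))
  if [ "$end" -gt $(( total - 1 )) ]; then
    end=$(( total - 1 ))                          # last chunk may be shorter
  fi
  echo "$start $end"
}
```

With --array=1-3 and 10 items, elements 1, 2 and 3 print "0 3", "4 7" and "8 9" respectively.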
Option to set job dependency
You can set a job dependency with the option -d or --dependency=dependency-list. For example, to make a job start only after the job with job ID 1234 has finished, add the following header line to the job submission script of the dependent job:
#SBATCH --dependency=afterok:1234
Having this header line in the job submission script will ensure that the job is only dispatched to run after job 1234 has completed successfully.
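Writing the job ID into the header by hand works for a single job. For a chain of jobs, the ID can instead be captured from sbatch's "Submitted batch job <id>" message and passed as a command line option, since options given to sbatch on the command line override the #SBATCH header lines. The following is a sketch under that assumption; parse_jobid, submit_after, step1.sh and step2.sh are hypothetical names.

```bash
#!/bin/bash
# Print the job ID from sbatch's "Submitted batch job <id>" message.
parse_jobid() {
  printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Submit script $2 so that it starts only after job $1 completes
# successfully, and print the new job's ID.
submit_after() {
  local out
  out=$(sbatch --dependency=afterok:"$1" "$2") || return 1
  parse_jobid "$out"
}

# Example two-step chain; step1.sh and step2.sh are hypothetical scripts.
# Guarded so nothing is submitted where sbatch or the scripts are absent.
if command -v sbatch >/dev/null 2>&1 && [ -f step1.sh ] && [ -f step2.sh ]; then
  first=$(parse_jobid "$(sbatch step1.sh)")
  second=$(submit_after "$first" step2.sh)
  echo "Job $second (step2.sh) will start after job $first (step1.sh) succeeds"
fi
```

As an alternative to parsing the message, sbatch's --parsable option prints only the job ID (followed by ";cluster" when a cluster name is present).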
Other content of the script
Following the header lines, users can include commands to change to the working directory, to load the modules needed to run the application, and to invoke the application. For example, to use the directory from which the job is submitted as the working directory (where to find input files or binaries), add the line
cd $SLURM_SUBMIT_DIR
You can then load the needed modules. For example, if you are running an R program, then include the line
module load R/3.4.4-foss-2016b-X11-20160819-GACRC
Then invoke your application. For example, if you are running an R program called add.R which is in your job submission directory, use
R CMD BATCH add.R
Environment Variables exported by batch jobs
When a batch job is started, a number of variables are introduced into the job's environment that can be used by the batch script in making decisions, creating output files, and so forth. Some of these variables are listed in the following table:
Variable | Description |
---|---|
SLURM_ARRAY_JOB_ID | Job ID of the array (the job ID of its first element) |
SLURM_ARRAY_TASK_ID | Value of job array index for this job |
SLURM_CPUS_ON_NODE | Number of CPUs on the allocated node. |
SLURM_CPUS_PER_TASK | Number of CPUs requested per task. Only set if the --cpus-per-task option is specified. |
SLURM_JOB_ID | Unique job ID |
SLURM_JOB_NAME | Name of the job, as set with -J, --job-name. |
SLURM_JOB_CPUS_PER_NODE | Count of processors available to the job on this node. |
SLURM_JOB_NODELIST | List of nodes allocated to the job. |
SLURM_JOB_NUM_NODES | Total number of nodes in the job's resource allocation. |
SLURM_JOB_PARTITION | Name of the partition (i.e. queue) in which the job is running. |
SLURM_NTASKS | Same as -n, --ntasks |
SLURM_NTASKS_PER_NODE | Number of tasks requested per node. Only set if the --ntasks-per-node option is specified. |
SLURM_SUBMIT_DIR | The directory from which sbatch was invoked. |
SLURM_TASKS_PER_NODE | Number of tasks to be initiated on each node. |
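As a small illustration of using these variables, the sketch below prints a one-line summary of the job at the top of its output log, which can help when reading logs later. The function name job_banner is hypothetical; the fallback values ("none", 0) are only there so the function also runs outside a Slurm job.

```bash
#!/bin/bash
# Hypothetical helper: summarize the job using Slurm's environment variables.
# Fallback values are printed when a variable is not set.
job_banner() {
  printf 'Job %s (%s) in partition %s on %s node(s): %s\n' \
    "${SLURM_JOB_ID:-none}" "${SLURM_JOB_NAME:-none}" \
    "${SLURM_JOB_PARTITION:-none}" "${SLURM_JOB_NUM_NODES:-0}" \
    "${SLURM_JOB_NODELIST:-none}"
}

job_banner
```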
Sample job submission scripts
Serial (single-processor) Job
Sample job submission script (sub.sh) to run an R program called add.R using a single core:
#!/bin/bash
#SBATCH --job-name=testserial         # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail
#SBATCH --ntasks=1                    # Run on a single CPU
#SBATCH --mem=1gb                     # Job memory request
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=testserial.%j.out    # Standard output log
#SBATCH --error=testserial.%j.err     # Standard error log
cd $SLURM_SUBMIT_DIR
module load R/3.4.4-foss-2016b-X11-20160819-GACRC
R CMD BATCH add.R
In this sample script, the standard output and standard error of the job will be saved into files called testserial.%j.out and testserial.%j.err, where %j will be automatically replaced by the job ID of the job.
MPI Job
Sample job submission script (sub.sh) to run an OpenMPI application. In this example the job requests 16 cores and further specifies that these 16 cores need to be divided equally on 2 nodes (8 cores per node) and the binary is called mympi.exe:
#!/bin/bash
#SBATCH --job-name=mpitest            # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail
#SBATCH --ntasks=16                   # Number of MPI ranks
#SBATCH --cpus-per-task=1             # Number of cores per MPI rank
#SBATCH --nodes=2                     # Number of nodes
#SBATCH --ntasks-per-node=8           # How many tasks on each node
#SBATCH --mem-per-cpu=600mb           # Memory per processor
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=mpitest.%j.out       # Standard output log
#SBATCH --error=mpitest.%j.err        # Standard error log
cd $SLURM_SUBMIT_DIR
module load OpenMPI/1.10.3-GCC-5.4.0-2.26
mpirun ./mympi.exe
OpenMP (Multi-Thread) Job
Sample job submission script (sub.sh) to run a program that uses OpenMP with 6 threads. Please set --ntasks=1 and set --cpus-per-task to the number of threads you wish to use. The name of the binary in this example is a.out.
#!/bin/bash
#SBATCH --job-name=mctest             # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail
#SBATCH --ntasks=1                    # Run a single task
#SBATCH --cpus-per-task=6             # Number of CPU cores per task
#SBATCH --mem=4gb                     # Job memory request
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=mctest.%j.out        # Standard output log
#SBATCH --error=mctest.%j.err         # Standard error log
cd $SLURM_SUBMIT_DIR
export OMP_NUM_THREADS=6
module load foss/2016b                # load the appropriate module file, e.g. foss/2016b
time ./a.out
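The OpenMP script above repeats the value 6 in both --cpus-per-task and OMP_NUM_THREADS. As a sketch of an alternative, the thread count can be taken from SLURM_CPUS_PER_TASK, so the two values cannot drift apart. Since SLURM_CPUS_PER_TASK is only set when --cpus-per-task is specified, the sketch falls back to 1 thread otherwise. The function name omp_threads is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: use the --cpus-per-task value as the OpenMP thread
# count, falling back to 1 when SLURM_CPUS_PER_TASK is not set.
omp_threads() {
  echo "${SLURM_CPUS_PER_TASK:-1}"
}

OMP_NUM_THREADS=$(omp_threads)
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

In the job script, these lines would replace export OMP_NUM_THREADS=6; the hybrid example on this page uses the same variable directly.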
High Memory Job
Sample job submission script (sub.sh) to run a Velvet application that needs to use 50GB of memory and 4 threads:
#!/bin/bash
#SBATCH --job-name=highmemtest         # Job name
#SBATCH --partition=highmem            # Partition (queue) name
#SBATCH --mail-type=END,FAIL           # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu   # Where to send mail
#SBATCH --ntasks=1                     # Run a single task
#SBATCH --cpus-per-task=4              # Number of CPU cores per task
#SBATCH --mem=50gb                     # Job memory request
#SBATCH --time=02:00:00                # Time limit hrs:min:sec
#SBATCH --output=highmemtest.%j.out    # Standard output log
#SBATCH --error=highmemtest.%j.err     # Standard error log
cd $SLURM_SUBMIT_DIR
export OMP_NUM_THREADS=4
module load Velvet
velvetg [options]
Hybrid MPI/OpenMP Job
Sample job submission script (sub.sh) to run a parallel job that uses 4 MPI processes with OpenMPI, where each MPI process runs with 3 threads:
#!/bin/bash
#SBATCH --job-name=hybridtest          # Job name
#SBATCH --partition=batch              # Partition (queue) name
#SBATCH --mail-type=END,FAIL           # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu   # Where to send mail
#SBATCH --nodes=2                      # Number of nodes
#SBATCH --ntasks=4                     # Number of MPI ranks
#SBATCH --ntasks-per-node=2            # Number of MPI ranks per node
#SBATCH --cpus-per-task=3              # Number of OpenMP threads for each MPI process/rank
#SBATCH --mem-per-cpu=2000mb           # Per processor memory request
#SBATCH --time=2-00:00:00              # Walltime in hh:mm:ss or d-hh:mm:ss (2 days in the example)
#SBATCH --output=hybridtest.%j.out     # Standard output log
#SBATCH --error=hybridtest.%j.err      # Standard error log
cd $SLURM_SUBMIT_DIR
module load OpenMPI/1.10.3-GCC-5.4.0-2.26   # load the OpenMPI module needed by mpirun, e.g. this version
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun ./myhybridprog.exe
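When combining --ntasks-per-node, --cpus-per-task and --mem-per-cpu, it can help to check how many cores and how much memory each node must provide, since --mem-per-cpu applies to every allocated CPU. The sketch below does that arithmetic; per_node_request is a hypothetical helper, and it assumes memory is given in MB.

```bash
#!/bin/bash
# Hypothetical helper: cores and memory (MB) one node must provide, given
# tasks per node, --cpus-per-task, and --mem-per-cpu in MB.
per_node_request() {
  local tasks_per_node=$1 cpus_per_task=$2 mem_per_cpu_mb=$3
  local cores=$(( tasks_per_node * cpus_per_task ))
  echo "$cores cores, $(( cores * mem_per_cpu_mb )) MB"
}

per_node_request 2 3 2000   # hybrid example: 2 ranks x 3 threads, 2000mb per CPU
per_node_request 8 1 600    # MPI example: 8 ranks x 1 core, 600mb per CPU
```

For the hybrid example, each of the 2 nodes needs 6 cores and 12000 MB, i.e. 12 cores in total; for the MPI example, each node needs 8 cores and 4800 MB.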
Array job
Sample job submission script (sub.sh) to submit an array job with 10 elements. In this example, each array job element will run the a.out binary using an input file called input_0, input_1, ..., input_9.
#!/bin/bash
#SBATCH --job-name=arrayjobtest        # Job name
#SBATCH --partition=batch              # Partition (queue) name
#SBATCH --ntasks=1                     # Run a single task
#SBATCH --mem=1gb                      # Job Memory
#SBATCH --time=10:00:00                # Time limit hrs:min:sec
#SBATCH --output=array_%A-%a.out       # Standard output log
#SBATCH --error=array_%A-%a.err        # Standard error log
#SBATCH --array=0-9                    # Array range
cd $SLURM_SUBMIT_DIR
module load foss/2016b                 # load any needed module files, e.g. foss/2016b
time ./a.out < input_$SLURM_ARRAY_TASK_ID
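If the input files do not follow a numbered naming scheme like input_0 ... input_9, one common alternative is to list their names in a text file, one per line, and have each array element read the line matching its index. The sketch below assumes an array range starting at 0 (as in --array=0-9 above); input_for_task and the list file name are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print line (SLURM_ARRAY_TASK_ID + 1) of the list file
# given as $1, i.e. the input file for this array element.
# Assumes one file name per line and an array range starting at 0.
input_for_task() {
  local list=$1
  sed -n "$(( ${SLURM_ARRAY_TASK_ID:-0} + 1 ))p" "$list"
}
```

In the array script, the last line could then become time ./a.out < "$(input_for_task inputs.txt)", where inputs.txt is a hypothetical list with one input file name per line.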
GPU/CUDA
To be added.
How to submit a job to the batch queue
With the resource requirements specified in the job submission script (sub.sh), submit your job with
sbatch <scriptname>
For example
sbatch sub.sh
Once the job is submitted, the Job ID of the job (e.g. 12345) will be printed on the screen.
Discovering if a partition (queue) is busy
The nodes allocated to each partition (queue) and their state can be viewed with the command
sinfo
Sample output of the sinfo command:
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
highmem      up 7-00:00:00      5   idle c1-[36-37,40],c2-[9-10]
gpu          up 1-00:00:00      1   idle c2-2
interq       up 1-00:00:00      3   idle c2-[4-6]
batch        up 3-00:00:00      3    mix c1-38,c2-[11-12]
batch        up 3-00:00:00      1  alloc c1-1
batch        up 3-00:00:00     36   idle c1-[2-35,39]
where some common values of STATE are:
- STATE=idle indicates that those nodes are completely free.
- STATE=mix indicates that some cores on those nodes are in use (and some are free).
- STATE=alloc indicates that all cores on those nodes are in use.
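To answer "how many nodes are free in this partition?" without reading the table by eye, the default sinfo output can be filtered. The sketch below is a hypothetical helper (idle_nodes is not a Slurm command) that sums the NODES column of the idle rows for one partition. It strips the "*" that sinfo appends to the default partition's name; states shown with a suffix (for example "idle*") are not counted.

```bash
#!/bin/bash
# Hypothetical helper: read default-format sinfo output on stdin and print
# the number of idle nodes in partition $1.
idle_nodes() {
  awk -v p="$1" '
    NR > 1 {
      name = $1
      sub(/\*$/, "", name)          # "batch*" marks the default partition
      if (name == p && $5 == "idle") n += $4
    }
    END { print n + 0 }'
}
```

For example, sinfo | idle_nodes batch would print 36 for the sample output above.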
How to open an interactive session
An interactive session on a compute node can be started with the command
qlogin
This command will start an interactive session with one core on one of the interactive nodes, and allocate 2GB of memory for a maximum walltime of 12h.
The qlogin command is an alias for
srun --pty -p interq --time=12:00:00 --mem=2gb bash
How to run an interactive job with Graphical User Interface capabilities
If you want to run an application as an interactive job and have its graphical user interface displayed on the terminal of your local machine, you need to enable X-forwarding when you ssh into the login node. For information on how to do this, please see questions 10 and 11 in the Frequently Asked Questions page.
On the teaching cluster, X-forwarding does not work from any of the compute nodes, including the interactive nodes. Please feel free to run X windows applications directly on the teaching cluster login node.
How to check on running or pending jobs
To list all running and pending jobs (by all users), use the command
squeue
or
squeue -l
For detailed information on how to monitor your jobs, please see Monitoring Jobs on the teaching cluster.
How to delete a running or pending job
To delete one of your running or pending jobs, use the command
scancel <jobid>
For example, to delete a job with Job ID 12345 use
scancel 12345
How to check resource utilization of a running or finished job
The following command can be used to show resource utilization by a running job or a job that has already completed:
sacct
sacct accepts many options. For convenience, we have configured a command, sacct_zh, that shows some quantities commonly of interest, including the amount of memory and the CPU time used by the jobs:
sacct_zh
For detailed information on how to monitor your jobs, please see Monitoring Jobs on the teaching cluster.