Running Jobs on Sapelo2
Revision as of 08:55, 17 September 2021
Using the Queueing System
The login node for the Sapelo2 cluster should be used only for text editing and job submission. No jobs should be run directly on the login node. Processes that use too much CPU or RAM on the login node may be terminated by GACRC staff, or automatically, in order to keep the cluster running properly. Jobs should be run using the Slurm queueing system, which is used for both interactive and batch jobs.
Batch partitions (queues) defined on Sapelo2
The Slurm queueing system refers to queues as partitions, and several partitions are defined on Sapelo2. Users are required to specify, in the job submission script or as job submission command line arguments, the partition and the resources needed by the job in order for it to be assigned to compute node(s) that have enough available resources (such as number of cores, amount of memory, GPU cards, etc). Please note, Slurm will not allow a job to be submitted if there are no resources matching your request. Please refer to Migrating from Torque to Slurm for more information about the Slurm queueing system.
The following partitions are defined on the Sapelo2 cluster:
Partition Name | Time limit | Max jobs | Notes |
---|---|---|---|
batch | 7 days | | Regular nodes. |
batch_30d | 30 days | 2 | Regular nodes. A given user can have up to one job running at a time here, plus one pending, or two pending and none running. A user's attempt to submit a third job into this partition will be rejected. |
highmem_p | 7 days | | For high memory jobs. |
highmem_30d_p | 30 days | 2 | For high memory jobs. A given user can have up to one job running at a time here, plus one pending, or two pending and none running. A user's attempt to submit a third job into this partition will be rejected. |
gpu_p | 7 days | | For GPU-enabled jobs. |
gpu_30d_p | 30 days | 2 | For GPU-enabled jobs. A given user can have up to one job running at a time here, plus one pending, or two pending and none running. A user's attempt to submit a third job into this partition will be rejected. |
inter_p | 2 days | | Regular nodes, for interactive jobs. |
name_p | variable | | Partitions that target different groups' buy-in nodes. The name string is specific to each group. |
scavenge_p | 2 hours | | Partition that targets the buy-in nodes. When there are no available resources in the batch partition, short jobs submitted there might be automatically transferred into scavenge_p, to run on idle buy-in resources. Jobs cannot be submitted directly to this partition. |
The table below summarizes the partitions (queues) defined and the compute nodes that they target:
Partition Name | Node Features | Node Number | Description | Memory for jobs | Notes |
---|---|---|---|---|---|
batch, batch_30d | AMD, Opteron, QDR | | 48-core, 128GB RAM, AMD Opteron, QDR IB interconnect | 122GB | Regular nodes. |
batch, batch_30d | AMD, EPYC, EDR | | 64-core, 128GB RAM, AMD EPYC, IB EDR interconnect | 120GB | Regular nodes. |
batch, batch_30d | AMD, EPYC, EDR | | 32-core, 128GB RAM, AMD EPYC, IB EDR interconnect | 120GB | Regular nodes. |
batch, batch_30d | AMD, Opteron, QDR | | 48-core, 256GB RAM, AMD Opteron, QDR IB interconnect | 250GB | Regular nodes. |
batch, batch_30d | Intel, Skylake, EDR | | 32-core, 192GB RAM, Intel Xeon Skylake, IB EDR interconnect | 180GB | Regular nodes. |
batch, batch_30d | Intel, Broadwell, EDR | | 28-core, 64GB RAM, Intel Xeon Broadwell, IB EDR interconnect | 58GB | Regular nodes. |
highmem_p, highmem_30d_p | AMD, EPYC, EDR | | 64-core, 1TB RAM, AMD EPYC, IB EDR interconnect | 950GB | For high memory jobs. |
highmem_p, highmem_30d_p | Intel, EDR | | 32-core, 1TB RAM, Intel, IB EDR interconnect | 950GB | For high memory jobs. |
highmem_p, highmem_30d_p | AMD, Opteron, EDR | | 48-core, 1TB RAM, AMD Opteron, IB EDR interconnect | 950GB | For high memory jobs. |
highmem_p, highmem_30d_p | AMD, Opteron, QDR | | 48-core, 512GB RAM, AMD Opteron, IB QDR interconnect | 500GB | For high memory jobs. |
highmem_p, highmem_30d_p | AMD, EPYC, EDR | | 32-core, 512GB RAM, AMD EPYC, IB EDR interconnect | 490GB | For high memory jobs. |
gpu_p, gpu_30d_p | GPU, P100, EDR | | 32-core, 192GB RAM, Intel Xeon Skylake, 1 NVIDIA P100 GPU, EDR IB interconnect | 180GB | For GPU-enabled jobs. |
gpu_p, gpu_30d_p | GPU, K40, QDR | | 16-core, GB RAM, Intel Xeon , 8 NVIDIA K40 GPUs, IB interconnect | GB | For GPU-enabled jobs. |
gpu_p, gpu_30d_p | GPU, K20, QDR | | -core, GB RAM, Intel Xeon , 7 NVIDIA K20Xm GPUs, QDR IB interconnect | GB | For GPU-enabled jobs. |
You can check all partitions (queues) defined in the cluster with the command
sinfo
Job submission Scripts
Users are required to specify the number of cores, the amount of memory, the partition (queue) name, and the maximum wallclock time needed by the job.
Header lines
Basic job submission script
At a minimum, the job submission script needs to have the following header lines:
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=test
#SBATCH --ntasks=1
#SBATCH --time=4:00:00
#SBATCH --mem=10G
Commands to run your application should be added after these header lines.
Header lines explained:
- #!/bin/bash: specify Linux default shell bash
- #SBATCH --partition=batch : specify the partition (queue) to run on, e.g. batch
- #SBATCH --job-name=test : specify the job name, e.g. test
- #SBATCH --ntasks=1 : specify the number of tasks (e.g. 1)
- #SBATCH --time=4:00:00 : specify the maximum walltime of the job in the format D-HH:MM:SS (e.g. --time=1-00:00:00 for one day or --time=4:00:00 for 4 hours)
- #SBATCH --mem=10G : specify the maximum memory per node required by the job (e.g. 10GB)
Below are some of the most commonly used queueing system options to configure the job.
Options to request resources for the job
- -t, --time=time
Wall clock time limit of a job running on the cluster. Acceptable formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes", and "days-hours:minutes:seconds". This is a required option.
- --mem=num
Maximum amount of memory in MegaBytes per node required by the job. Different units can be specified using the suffix [K|M|G|T].
- --mem-per-cpu=num
Minimum amount of memory in MegaBytes per allocated CPU. Different units can be specified using the suffix [K|M|G|T].
- -n, --ntasks=num
Number of tasks to run. The default is one task per node. For use with distributed parallelism. See below.
- -N, --nodes=num
Number of nodes allocated to the job. Default is one node.
- --ntasks-per-node=num
Number of tasks invoked on each node. Meant to be used with the --nodes option. For use with distributed parallelism. See below.
- -c, --cpus-per-task=ncpus
Number of CPUs allocated to each task. For use with shared memory parallelism. See below.
- -C, --constraint=<list>
List of node features required by the job. Only nodes having features matching the job constraints will be used to satisfy the request. Multiple constraints may be specified with AND, OR, matching OR, resource counts, etc.
- --gres=<list>
A comma delimited list of generic consumable resources. For example, to request one P100 GPU card: --gres=gpu:P100:1
Please try to request resources for your job as accurately as possible, because this allows your job to be dispatched to run at the earliest opportunity and it helps the system allocate resources efficiently to start as many jobs as possible, benefiting all users.
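As a rough illustration of the --time formats listed above, the sketch below converts a time string into seconds. The function name slurm_time_to_seconds is hypothetical and is not part of Slurm; it is only meant to show how each accepted format is read.

```shell
#!/bin/bash
# Convert a Slurm --time string to seconds (illustrative helper, not a Slurm tool).
slurm_time_to_seconds() {
    local spec=$1 days=0 hours=0 minutes=0 seconds=0 rest
    if [[ $spec == *-* ]]; then
        # Forms with a day count: D-HH, D-HH:MM, D-HH:MM:SS
        days=${spec%%-*}
        rest=${spec#*-}
        IFS=: read -r hours minutes seconds <<< "$rest"
    else
        # Forms without days: MM, MM:SS, HH:MM:SS
        local a b c
        IFS=: read -r a b c <<< "$spec"
        if [[ -n $c ]]; then
            hours=$a minutes=$b seconds=$c
        elif [[ -n $b ]]; then
            minutes=$a seconds=$b
        else
            minutes=$a
        fi
    fi
    # 10# forces base 10 so that fields such as "08" are not read as octal
    echo $(( 10#${days:-0} * 86400 + 10#${hours:-0} * 3600 + 10#${minutes:-0} * 60 + 10#${seconds:-0} ))
}

slurm_time_to_seconds 4:00:00      # prints 14400
slurm_time_to_seconds 1-00:00:00   # prints 86400
```

A bare number is read as minutes, which is an easy source of mistakes: --time=90 requests 90 minutes, not 90 hours.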
Options to manage job notification and output
- -J, --job-name=jobname
Specify a name for the job. The specified name will appear along with the job id number when querying running jobs on the system. The default is the supplied executable program's name. Within the job, $SLURM_JOB_NAME expands to the job name.
- -o, --output=path/for/stdout
Send stdout to path/for/stdout. The default filename is slurm-${SLURM_JOB_ID}.out, e.g. slurm-12345.out, in the directory from which the job was submitted.
- -e, --error=path/for/stderr
Send stderr to path/for/stderr. If --error is not specified, both stdout and stderr will be directed to the file specified by --output.
- --mail-user=username@uga.edu
Send email notification to the address you specified when certain events occur.
- --mail-type=type
Notify user by email when certain event types occur. Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL, TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent of time limit), TIME_LIMIT_80 and TIME_LIMIT_50.
Options to set Array Jobs
If you wish to run an application binary or script using e.g. different input files, then you might find it convenient to use an array job. To create an array job with e.g. 10 elements, use
#SBATCH -a 0-9
or
#SBATCH --array=0-9
The ID of each element in an array job, i.e., job array index value, is stored in SLURM_ARRAY_TASK_ID. SLURM_ARRAY_JOB_ID will be set to the first job ID of the array. SLURM_ARRAY_TASK_COUNT will be set to the number of tasks in the job array. SLURM_ARRAY_TASK_MAX will be set to the highest job array index value. SLURM_ARRAY_TASK_MIN will be set to the lowest job array index value. Each array job element runs as an independent job, so multiple array elements can run concurrently, if resources are available. For example:
sbatch --array=1-3 -N1 sub.sh
will generate a job array containing three jobs. If the sbatch command responds
Submitted batch job 36
then the environment variables will be set as follows:
SLURM_JOB_ID=36
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=1
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1

SLURM_JOB_ID=37
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=2
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1

SLURM_JOB_ID=38
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=3
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
Most Slurm commands recognize the SLURM_ARRAY_JOB_ID plus SLURM_ARRAY_TASK_ID values separated by an underscore as identifying an element of a job array. In the example above, 37 and 36_2 would be equivalent ways to identify the second array element of array job 36.
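The relationship between these variables can be sketched as a short job-script fragment. The fallback values after ":-" are assumptions taken from the second element of the example above, added only so that the fragment can also be tried outside of a job; inside a real array job Slurm sets the variables itself.

```shell
#!/bin/bash
#SBATCH --array=1-3

# Outside of Slurm these variables are unset, so fall back to the values of
# the second element in the example above (assumed defaults, for illustration).
array_job_id=${SLURM_ARRAY_JOB_ID:-36}
task_id=${SLURM_ARRAY_TASK_ID:-2}
task_count=${SLURM_ARRAY_TASK_COUNT:-3}

# Identifier of this element in the <jobid>_<index> form
element_id="${array_job_id}_${task_id}"

echo "Array element ${element_id} (index ${task_id}, ${task_count} elements in total)"
```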
Option to set job dependency
You can set a job dependency with the option -d or --dependency=dependency-list. For example, if you want a job to start only after jobs 1234 and 1235 have successfully executed (run to completion with an exit code of zero), you can add the following header line in the job submission script of the job:
#SBATCH --dependency=afterok:1234:1235
Having this header line in the job submission script will ensure that the job is only dispatched to run after jobs 1234 and 1235 have completed successfully.
You can also use the following header line to specify that a job starts only after jobs 1236 and 1237 have started or been cancelled:
#SBATCH --dependency=after:1236:1237
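When the job IDs are only known at submission time, the dependency option can be assembled in a small wrapper script instead of being hard-coded. The sketch below is one way to do this; the function name dependency_flag is hypothetical, and the sbatch call is left as a comment so that nothing is submitted when the fragment is tried out.

```shell
#!/bin/bash
# Build a --dependency option from a dependency type and one or more job IDs.
dependency_flag() {
    local dep_type=$1
    shift
    local IFS=:            # makes "$*" join the remaining arguments with ':'
    echo "--dependency=${dep_type}:$*"
}

dependency_flag afterok 1234 1235   # prints --dependency=afterok:1234:1235

# Possible use on the command line (not executed here):
#   sbatch "$(dependency_flag afterok 1234 1235)" sub.sh
```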
Options to requeue or not requeue a job when a node crashes
If a job is running and one or more nodes that it is using crash, the job will stop running and, by default, it will get requeued. When resources become available, the job will start running again, from the beginning, unless the program saves intermediate results and it is able to automatically pick up from where it stopped. The files with the standard error and standard output of the job will get rewritten once the job restarts. Often other output files will get rewritten as well.
If you are running a program that cannot restart, e.g. the program will fail if a certain output file or directory has already been created, or if you would like to preserve the partial results, you can use the following option to prevent the job from being requeued:
#SBATCH --no-requeue
When this option is used, the job will simply stop if a node crashes; it will not be requeued. In this case partial results and the standard error and output of the job will not get overwritten.
Although requeueing jobs is enabled by default now, you can also add the option below if you would like to ensure a job is requeued in the event of a node crash:
#SBATCH --requeue
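If a job should be requeued but earlier partial results must survive, one option is to write each attempt into its own directory. The sketch below relies on Slurm's SLURM_RESTART_COUNT variable, which is set only after a job has been restarted or requeued; the directory naming scheme is a hypothetical choice and not a GACRC convention.

```shell
#!/bin/bash
#SBATCH --requeue

# SLURM_RESTART_COUNT is unset on the first run, so default to 0.
attempt=${SLURM_RESTART_COUNT:-0}

# Hypothetical naming scheme: results_attempt0, results_attempt1, ...
outdir="results_attempt${attempt}"
echo "Attempt ${attempt}: results would be written to ${outdir}"

# In a real job script one would then create and use the directory, e.g.
#   mkdir -p "$outdir" && ./a.out > "$outdir/output.txt"
```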
Other content of the script
Following the header lines, users can include commands to change to the working directory, to load the modules needed to run the application, and to invoke the application. For example, to use the directory from which the job is submitted as the working directory (where to find input files or binaries), add the line
cd $SLURM_SUBMIT_DIR
(Note that Slurm jobs start from the submit directory by default, so adding the line above might not be necessary.)
You can then load the needed modules. For example, if you are running an R program, then include the line
module load R/3.6.2-foss-2019b
Then invoke your application. For example, if you are running an R program called add.R which is in your job submission directory, use
R CMD BATCH add.R
Environment Variables exported by batch jobs
When a batch job is started, a number of variables are introduced into the job's environment that can be used by the batch script in making decisions, creating output files, and so forth. Some of these variables are listed in the following table:
Variable | Description |
---|---|
SLURM_ARRAY_JOB_ID | Job array's master job ID number, i.e., the first Slurm job id of a job array |
SLURM_ARRAY_TASK_COUNT | Total number of tasks (elements) in a job array |
SLURM_ARRAY_TASK_ID | Job array ID (index) number |
SLURM_ARRAY_TASK_MAX | Job array's maximum ID (index) number |
SLURM_ARRAY_TASK_MIN | Job array's minimum ID (index) number |
SLURM_CPUS_ON_NODE | Number of CPUs on the allocated node |
SLURM_CPUS_PER_TASK | Number of CPUs requested per task. Only set if the --cpus-per-task option is specified |
SLURM_JOB_ID | Unique Slurm job id |
SLURM_JOB_NAME | Job name |
SLURM_JOB_CPUS_PER_NODE | Count of processors available to the job on this node |
SLURM_JOB_NODELIST | List of nodes allocated to the job |
SLURM_JOB_NUM_NODES | Total number of nodes in the job's resource allocation |
SLURM_JOB_PARTITION | Name of the partition (i.e. queue) in which the job is running |
SLURM_MEM_PER_NODE | Same as --mem |
SLURM_MEM_PER_CPU | Same as --mem-per-cpu |
SLURM_NTASKS | Same as -n, --ntasks |
SLURM_NTASKS_PER_NODE | Number of tasks requested per node. Only set if the --ntasks-per-node option is specified |
SLURM_SUBMIT_DIR | The directory from which sbatch was invoked |
SLURM_SUBMIT_HOST | The hostname of the computer from which sbatch was invoked |
SLURM_TASK_PID | The process ID of the task being started |
SLURMD_NODENAME | Name of the node running the job script |
CUDA_VISIBLE_DEVICES | ID(s) of the GPU device(s) assigned to the job |
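
As an illustration, a job script might record some of these variables at the start of its output, which can help when troubleshooting later. The sketch below is not taken from the GACRC documentation; the function name job_summary is hypothetical, and the "unknown" and "1" fallbacks are assumptions added so that the lines also run outside of a job.

```shell
#!/bin/bash
# Print a one-line summary of where and how the job is running.
job_summary() {
    echo "job=${SLURM_JOB_ID:-unknown} name=${SLURM_JOB_NAME:-unknown}" \
         "partition=${SLURM_JOB_PARTITION:-unknown} node=${SLURMD_NODENAME:-unknown}" \
         "ntasks=${SLURM_NTASKS:-1}"
}

job_summary
```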
Sample job submission scripts
Serial (single-processor) Job
Sample job submission script (sub.sh) to run an R program called add.R using a single core:
#!/bin/bash
#SBATCH --job-name=testserial         # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --ntasks=1                    # Run on a single CPU
#SBATCH --mem=1gb                     # Job memory request
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=testserial.%j.out    # Standard output log
#SBATCH --error=testserial.%j.err     # Standard error log
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail

cd $SLURM_SUBMIT_DIR

module load R/3.6.2-foss-2019b

R CMD BATCH add.R
In this sample script, the standard output and standard error of the job will be saved into files called testserial.%j.out and testserial.%j.err, respectively, where %j will be automatically replaced by the job id of the job.
Serial (single-processor) Job on an AMD EPYC processor
Sample job submission script (sub.sh) to run an R program called add.R using a single core on a node with AMD EPYC processors:
#!/bin/bash
#SBATCH --job-name=testserial         # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --constraint=EPYC             # node feature
#SBATCH --ntasks=1                    # Run on a single CPU
#SBATCH --mem=1gb                     # Job memory request
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=testserial.%j.out    # Standard output log
#SBATCH --error=testserial.%j.err     # Standard error log
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail

cd $SLURM_SUBMIT_DIR

module load R/3.6.2-foss-2019b

R CMD BATCH add.R
In this sample script, the standard output and standard error of the job will be saved into files called testserial.%j.out and testserial.%j.err, respectively, where %j will be automatically replaced by the job id of the job.
MPI Job
Sample job submission script (sub.sh) to run an OpenMPI application. In this example the job requests 16 cores and further specifies that these 16 cores need to be divided equally on 2 nodes (8 cores per node) and the binary is called mympi.exe:
#!/bin/bash
#SBATCH --job-name=mpitest            # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --nodes=2                     # Number of nodes
#SBATCH --ntasks=16                   # Number of MPI ranks
#SBATCH --ntasks-per-node=8           # How many tasks on each node
#SBATCH --cpus-per-task=1             # Number of cores per MPI rank
#SBATCH --mem-per-cpu=600mb           # Memory per processor
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=mpitest.%j.out       # Standard output log
#SBATCH --error=mpitest.%j.err        # Standard error log
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail

cd $SLURM_SUBMIT_DIR

module load OpenMPI/3.1.4-GCC-8.3.0

mpirun ./mympi.exe
Please note that you need to start the application with mpirun or mpiexec, and not with srun.
MPI Job on nodes connected via the EDR IB fabric
Sample job submission script (sub.sh) to run an OpenMPI application. In this example the job requests 16 cores and further specifies that these 16 cores need to be divided equally on 2 nodes (8 cores per node) and the binary is called mympi.exe:
#!/bin/bash
#SBATCH --job-name=mpitest            # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --constraint=EDR              # node feature
#SBATCH --nodes=2                     # Number of nodes
#SBATCH --ntasks=16                   # Number of MPI ranks
#SBATCH --ntasks-per-node=8           # How many tasks on each node
#SBATCH --cpus-per-task=1             # Number of cores per MPI rank
#SBATCH --mem-per-cpu=600mb           # Memory per processor
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=mpitest.%j.out       # Standard output log
#SBATCH --error=mpitest.%j.err        # Standard error log
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail

cd $SLURM_SUBMIT_DIR

module load OpenMPI/3.1.4-GCC-8.3.0

mpirun ./mympi.exe
Please note that you need to start the application with mpirun or mpiexec, and not with srun.
OpenMP (Multi-Thread) Job
Sample job submission script (sub.sh) to run a program that uses OpenMP with 6 threads. Please set --ntasks=1 and set --cpus-per-task to the number of threads you wish to use. The name of the binary in this example is a.out.
#!/bin/bash
#SBATCH --job-name=mctest             # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --ntasks=1                    # Run a single task
#SBATCH --cpus-per-task=6             # Number of CPU cores per task
#SBATCH --mem=4gb                     # Job memory request
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=mctest.%j.out        # Standard output log
#SBATCH --error=mctest.%j.err         # Standard error log
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail

cd $SLURM_SUBMIT_DIR

export OMP_NUM_THREADS=6

module load foss/2019b                # load the appropriate module file, e.g. foss/2019b

time ./a.out
High Memory Job
Sample job submission script (sub.sh) to run a velvet application that needs to use 200GB of memory and 4 threads:
#!/bin/bash
#SBATCH --job-name=highmemtest        # Job name
#SBATCH --partition=highmem_p         # Partition (queue) name
#SBATCH --ntasks=1                    # Run a single task
#SBATCH --cpus-per-task=4             # Number of CPU cores per task
#SBATCH --mem=200gb                   # Job memory request
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=highmemtest.%j.out   # Standard output log
#SBATCH --error=highmemtest.%j.err    # Standard error log
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail

cd $SLURM_SUBMIT_DIR

export OMP_NUM_THREADS=4

module load Velvet

velvetg [options]
Sample job submission script (sub.sh) to run a hybrid parallel job that uses 8 MPI processes with OpenMPI (4 per node on 2 nodes), where each MPI process runs with 3 threads:
#!/bin/bash
#SBATCH --job-name=hybridtest
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --nodes=2                     # Number of nodes
#SBATCH --ntasks=8                    # Number of MPI ranks
#SBATCH --ntasks-per-node=4           # Number of MPI ranks per node
#SBATCH --cpus-per-task=3             # Number of OpenMP threads for each MPI process/rank
#SBATCH --mem-per-cpu=2000mb          # Per processor memory request
#SBATCH --time=2-00:00:00             # Walltime in hh:mm:ss or d-hh:mm:ss (2 days in the example)
#SBATCH --output=hybridtest.%j.out    # Standard output log
#SBATCH --error=hybridtest.%j.err     # Standard error log
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail

cd $SLURM_SUBMIT_DIR

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

mpirun ./myhybridprog.exe
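The resource totals implied by the header lines of this hybrid script can be worked out as follows. This is only an arithmetic sketch using the values from the script (2 nodes, 4 ranks per node, 3 threads per rank, 2000 MB per core); the variable names are illustrative.

```shell
#!/bin/bash
nodes=2
ntasks_per_node=4
cpus_per_task=3
mem_per_cpu_mb=2000

total_ranks=$(( nodes * ntasks_per_node ))               # 8 MPI ranks, matching --ntasks=8
total_cores=$(( total_ranks * cpus_per_task ))           # 24 cores in total
cores_per_node=$(( ntasks_per_node * cpus_per_task ))    # 12 cores on each node
mem_per_node_mb=$(( cores_per_node * mem_per_cpu_mb ))   # 24000 MB on each node

echo "ranks=${total_ranks} cores=${total_cores} cores_per_node=${cores_per_node} mem_per_node=${mem_per_node_mb}MB"
```

Checking these totals against the node table (cores and memory available per node) before submitting helps avoid requests that no node can satisfy.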
Array job
Sample job submission script (sub.sh) to submit an array job with 10 elements. In this example, each array job element will run the a.out binary using an input file called input_0, input_1, ..., input_9.
#!/bin/bash
#SBATCH --job-name=arrayjobtest       # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --ntasks=1                    # Run a single task
#SBATCH --mem=1gb                     # Job Memory
#SBATCH --time=10:00:00               # Time limit hrs:min:sec
#SBATCH --output=array_%A-%a.out      # Standard output log
#SBATCH --error=array_%A-%a.err       # Standard error log
#SBATCH --array=0-9                   # Array range

cd $SLURM_SUBMIT_DIR

module load foss/2019b                # load any needed module files, e.g. foss/2019b

time ./a.out < input_$SLURM_ARRAY_TASK_ID
Singularity job
Sample job submission script (sub.sh) to run a program (e.g. sortmerna) using a singularity container:
#!/bin/bash
#SBATCH --job-name=j_sortmerna        # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --ntasks=1                    # Run a single task
#SBATCH --mem=1gb                     # Job memory request
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=sortmerna.%j.out     # Standard output log
#SBATCH --error=sortmerna.%j.err      # Standard error log
#SBATCH --cpus-per-task=4             # Number of CPU cores per task
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail

cd $SLURM_SUBMIT_DIR

singularity exec /apps/singularity-images/sortmerna-3.0.3.simg sortmerna \
--threads 4 --ref db.fasta,db.idx --reads file.fa --aligned base_name_output
For more information about software installed as singularity containers on the cluster, please see Software_on_Sapelo2#Singularity_Containers
GPU/CUDA
Sample script to run Amber on a GPU node using one node, 2 CPU cores, and 1 GPU card:
#!/bin/bash
#SBATCH --job-name=amber              # Job name
#SBATCH --partition=gpu_p             # Partition (queue) name
#SBATCH --gres=gpu:1                  # Requests one GPU device
#SBATCH --ntasks=1                    # Run a single task
#SBATCH --cpus-per-task=2             # Number of CPU cores per task
#SBATCH --mem=40gb                    # Job memory request
#SBATCH --time=10:00:00               # Time limit hrs:min:sec
#SBATCH --output=amber.%j.out         # Standard output log
#SBATCH --error=amber.%j.err          # Standard error log
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@uga.edu  # Where to send mail

cd $SLURM_SUBMIT_DIR

ml Amber/18-fosscuda-2018b-AmberTools-18-patchlevel-10-8

mpiexec $AMBERHOME/bin/pmemd.cuda -O -i ./prod.in -o prod_c4-23.out -p ./dimerFBP_GOL.prmtop -c ./restart.rst -r prod.rst -x prod.mdcrd
You can use the option #SBATCH --gres=gpu:K40:1 or #SBATCH --gres=gpu:P100:1 to request a K40 or a P100 GPU device, respectively. The compute mode of the GPU will be set to Default.
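Inside a GPU job, the devices assigned by Slurm are listed in CUDA_VISIBLE_DEVICES (see the environment variable table above). A minimal sketch for checking this at the start of a job script is shown below; the helper name count_visible_gpus is hypothetical.

```shell
#!/bin/bash
# Count the GPU device IDs listed in CUDA_VISIBLE_DEVICES (comma-separated).
count_visible_gpus() {
    local devices=${CUDA_VISIBLE_DEVICES:-}
    if [[ -z $devices ]]; then
        echo 0
        return
    fi
    local ids
    IFS=, read -r -a ids <<< "$devices"
    echo "${#ids[@]}"
}

echo "GPUs visible to this job: $(count_visible_gpus)"
```

A count of 0 in a job submitted to gpu_p would suggest that the --gres option was left out of the job submission script.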
How to submit a batch job
With the resource requirements specified in the job submission script (sub.sh), submit your job with
sbatch <scriptname>
For example
sbatch sub.sh
Once the job is submitted, the Job ID of the job (e.g. 12345) will be printed on the screen.
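In wrapper scripts it is often useful to capture that Job ID. The sketch below assumes the confirmation message has the form "Submitted batch job <jobid>", as in the array job example earlier on this page; the helper name job_id_from_message is hypothetical, and the sbatch call is left as a comment so that nothing is submitted.

```shell
#!/bin/bash
# Extract the job ID from sbatch's confirmation message.
job_id_from_message() {
    local message=$1
    echo "${message##* }"    # keep everything after the last space
}

job_id_from_message "Submitted batch job 12345"   # prints 12345

# Possible use (not executed here):
#   jobid=$(job_id_from_message "$(sbatch sub.sh)")
```

Alternatively, sbatch has a --parsable option that prints the job ID without the surrounding text, which avoids the string handling altogether.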
Discovering if a partition (queue) is busy
The nodes allocated to each partition (queue) and their state can be viewed with the command
sinfo
Sample output of the sinfo command:
PARTITION AVAIL TIMELIMIT NODES STATE  NODELIST
batch*    up    08:00:00      1 drain* ra4-2
batch*    up    08:00:00      3 down*  d4-7,ra3-19,ra4-12
batch*    up    08:00:00      1 mix    b1-2
batch*    up    08:00:00      1 alloc  b1-3
batch*    up    08:00:00     53 idle   b1-[4-24],c1-3,c5-19,d4-[5-6,8-12],ra3-[1-18,20-24]
gpu_p     up    08:00:00      1 mix    c4-23
highmem_p up    08:00:00      6 idle   d4-[11-12],ra4-[21-24]
inter_p   up    08:00:00      2 idle   ra4-[16-17]
where some common values of STATE are:
- STATE=idle indicates that those nodes are completely free.
- STATE=mix indicates that some cores on those nodes are in use (and some are free).
- STATE=alloc indicates that all cores on those nodes are in use.
- STATE=drain indicates that nodes are draining, not accepting new jobs
- STATE=down indicates that nodes are not running or accepting new jobs
This command can be used with many options. We have configured one option that shows some quantities that are commonly of interest, including the node features defined for each node. This command is
sinfo-gacrc
You can also specify the number of characters displayed in the NODELIST column (e.g. 40) and in the AVAIL_FEATURES column (e.g. 50), with
sinfo-gacrc 40 50
Sample output of the sinfo-gacrc command:
PARTITION  NODELIST          STATE     CPUS MEMORY  AVAIL_FEATURES       GRES
batch*     ra4-2             drained*  32   126000  AMD,Opteron,QDR      lscratch:230
batch*     ra3-19            down*     32   126000  AMD,Opteron,QDR      lscratch:230
batch*     ra4-12            down*     32   126000  AMD,Opteron,QDR      lscratch:230
batch*     b1-3              mixed     64   126976  AMD,EPYC,Rome,EDR    lscratch:890
batch*     b1-2              allocated 64   126976  AMD,EPYC,Rome,EDR    lscratch:890
batch*     b1-[4-24]         idle      64   126976  AMD,EPYC,Rome,EDR    lscratch:890
batch*     c1-3              idle      28   59127   Intel,Broadwell,EDR  lscratch:890
batch*     c5-19             idle      32   187868  Intel,Skylake,EDR    lscratch:890
batch*     d4-[5-6]          idle      32   126976  AMD,EPYC,Naples,EDR  lscratch:890
batch*     d4-[8-12]         idle      32   126976+ AMD,EPYC,Naples,EDR  lscratch:890
batch*     ra3-[1-18,20-24]  idle      32   126000  AMD,Opteron,QDR      lscratch:230
gpu_p      c4-23             idle      32   187868  Intel,Skylake,EDR    gpu:P100:1,lscratch:890
highmem_p  d4-[11-12]        idle      32   514048  AMD,EPYC,Naples,EDR  lscratch:890
highmem_p  ra4-[21-24]       idle      32   126000  AMD,Opteron,QDR      lscratch:230
inter_p    ra4-[16-17]       idle      32   126000  AMD,Opteron,QDR      lscratch:230
scavenge_p rb7-18            idle      28   515780  Intel,Broadwell,QDR  lscratch:180
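As a sketch of how the plain sinfo output can be summarized, the function below adds up the NODES column for a given partition and state. It assumes the default sinfo column layout shown in the first sample output above (PARTITION, AVAIL, TIMELIMIT, NODES, STATE, NODELIST), not the sinfo-gacrc layout; the function name count_nodes is hypothetical.

```shell
#!/bin/bash
# Sum the NODES column of sinfo-style text read from stdin for one partition and state.
count_nodes() {
    local partition=$1 state=$2
    awk -v p="$partition" -v s="$state" '
        # The default partition is marked with a trailing "*", so strip it first
        { name = $1; sub(/\*$/, "", name) }
        name == p && $5 == s { total += $4 }
        END { print total + 0 }
    '
}

# Possible use on the cluster (not executed here):
#   sinfo | count_nodes batch idle
```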
What is the scavenge_p partition
A portion of the Sapelo2 compute nodes (also referred to as buy-in nodes) were purchased by UGA PIs, and their group members have priority in using those resources. The GACRC purchased the rest on UGA's behalf. The agreement for the PI-owned nodes allows "other users" to also run jobs on owned nodes, as long as those jobs don't cause that lab group to wait over two hours for access to its nodes. We have implemented a partition called scavenge_p, and short jobs (for example, jobs that request less than 2h) submitted to the 'batch' partition might be automatically moved into the scavenge_p partition if the 'batch' partition is busy. This is a way to reduce the wait time of the short jobs, while making use of the buy-in nodes that are not in use.
Users cannot submit jobs directly to the scavenge_p partition, but if you submitted short jobs to the batch partition, you might see them running on the scavenge_p partition.
How to request a specific node feature
Each compute node has a set of features, such as shown with the sinfo-gacrc command above. Common features are Intel (if the node has Intel processors), AMD (if the node has AMD processors), EPYC (if the node has AMD EPYC processors), EDR (if the node is connected to the EDR Infiniband network), QDR (if the node is connected to the QDR Infiniband network), etc. You can request using nodes with a specific feature by adding the following header line in your job submission script:
#SBATCH --constraint=featurename
where featurename needs to be replaced by the feature you want to use. For example, to request that the job goes to a node connected to the EDR Infiniband network, use
#SBATCH --constraint=EDR
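Features can also be combined. The header lines below are a sketch that assumes Slurm's standard constraint operators, & for AND and | for OR. Only one --constraint line should be active in a given script, so the alternative is disabled here with an extra #.

```shell
#!/bin/bash
# Require a node that has both the EPYC and the EDR features (AND):
#SBATCH --constraint="EPYC&EDR"

# Alternative: accept a node with either the Intel or the EPYC feature (OR):
##SBATCH --constraint="Intel|EPYC"

# SLURMD_NODENAME is set by Slurm inside a job; "unknown" is a fallback for trying this outside of one
echo "Running on node ${SLURMD_NODENAME:-unknown}"
```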
How to run Intel- or AMD-specific applications
Most of the applications that GACRC installs centrally can be run on both Intel and AMD processors, but some exceptions do exist. Also, some third-party applications that you are using might have been pre-compiled for a given processor type and would fail if run on a different processor architecture. If an application that you are using is only compatible with one type of processor (e.g. Intel), you can request that node feature by adding one of the following lines in your job submission script:
#SBATCH --constraint=Intel
or
#SBATCH --constraint=Opteron
or
#SBATCH --constraint=EPYC
How to open an interactive session
An interactive session on a compute node can be started with the command
interact
This command, invoked without any arguments, will start an interactive session with one core on one of the interactive nodes, and allocate 2GB of memory for a maximum walltime of 12h. It is equivalent to the qlogin command that we used previously, and it runs
srun --pty --cpus-per-task=1 --job-name=interact --ntasks=1 --nodes=1 --partition=inter_p --time=12:00:00 --mem=2GB /bin/bash -l
When the interact command is run, it will echo the equivalent srun command, so you can easily check the resources associated with your interactive session.
The interact command takes arguments that allow you to request cores, memory, a walltime limit, specific node features, a different partition, and other resources.
The options that can be used with interact are displayed when this command is run with the -h or --help option:
[shtsai@ss-sub2 ~]$ interact -h
Usage: interact [OPTIONS]
Description: Start an interactive job
    -c, --cpus-per-task   CPU cores per task (default: 1)
    -J, --job-name        Job name (default: interact)
    -n, --ntasks          Number of tasks (default: 1)
    -N, --nodes           Number of nodes (default: 1)
    -p, --partition       Partition for interactive job (default: inter_p)
    -q, --qos             Request a quality of service for the job.
    -t, --time            Maximum run time for interactive job (default: 12:00:00)
    -w, --nodelist        List of node name(s) on which your job should run
        --constraint      Job constraints
        --gres            Generic consumable resources
        --mem             Memory per node (default 2GB)
        --shell           Absolute path to the shell to be used in your interactive job (default: /bin/bash)
        --wckey           Wckey to be used with job
        --x11             Start an interactive job with X Forwarding
    -h, --help            Display this help output
Examples:
To start an interactive session with 4 cores and 10GB of memory:
interact -c 4 --mem=10G
To start an interactive session with 1 core, 10GB of memory and a walltime limit of 18 hours:
interact --mem=10G --time=18:00:00
To start an interactive session with 1 core, 2GB of memory, on an AMD EPYC node in the batch partition:
interact --constraint=EPYC -p batch
To start an interactive session with 1 core, 5GB of memory, and a K40 GPU device:
interact -p gpu_p --gres=gpu:K40:1 --mem=5G
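To make the mapping between interact options and the underlying srun call more concrete, the sketch below builds an srun command line from the documented defaults plus a few overrides. It is not the actual GACRC interact wrapper: the function name equivalent_srun is hypothetical, only four options are handled, and the exact command echoed by the real wrapper may differ.

```shell
#!/bin/bash
# Sketch only: shows how defaults and overrides could map to an srun command.
# This is NOT the actual GACRC interact wrapper, and it handles just four options.
equivalent_srun() {
    local cpus=1 mem=2GB walltime=12:00:00 partition=inter_p
    while [ $# -gt 0 ]; do
        case $1 in
            -c)       cpus=$2;               shift 2 ;;
            -p)       partition=$2;          shift 2 ;;
            --mem=*)  mem=${1#--mem=};       shift ;;
            --time=*) walltime=${1#--time=}; shift ;;
            *) echo "option not handled in this sketch: $1" >&2; return 1 ;;
        esac
    done
    echo "srun --pty --cpus-per-task=${cpus} --job-name=interact --ntasks=1 --nodes=1" \
         "--partition=${partition} --time=${walltime} --mem=${mem} /bin/bash -l"
}

# Corresponds to the first example above: 4 cores and 10GB of memory
equivalent_srun -c 4 --mem=10G
```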
How to run an interactive job with Graphical User Interface capabilities
If you want to run an application as an interactive job and have its graphical user interface displayed on your local machine, you need to enable X-forwarding when you ssh into the login node. On Linux, this can be done by simply adding the -X option when ssh-ing into Sapelo2. For information on how to do this on Windows and Mac, please see questions 10 and 11 in the Frequently Asked Questions page.
Then start an interactive session, but add the option --x11 to the interact command.
An interactive session on a compute node, with X forwarding enabled, can be started with the command
interact --x11
This command will start an interactive session, with X forwarding enabled, with one core on one of the interactive nodes, and allocate 2GB of memory for a maximum walltime of 12h.
The interact --x11 command is an alias for
srun --pty --x11 --cpus-per-task=1 --job-name=interact --ntasks=1 --nodes=1 --partition=inter_p --time=12:00:00 --mem=2GB /bin/bash -l
The options available to interact, described in the previous section, can be used along with the --x11 option.
How to check on running or pending jobs
To list all running and pending jobs (by all users), use the command
squeue
or
squeue -l
This command can be used with many options. We have a wrapper for this command, called sq, that shows some quantities that are commonly of interest. To use the sq command to list all of your running and pending jobs, use
sq --me
For detailed information on how to monitor your jobs, please see Monitoring Jobs on Sapelo2.
How to cancel (delete) a running or pending job
To cancel one of your running or pending jobs, use the command
scancel <jobid>
For example, to cancel a job with Job ID 12345 use
scancel 12345
To cancel all of your jobs, use the command below, replacing MyID with your username
scancel -u MyID
To cancel all of your pending jobs, use the command
scancel -t PENDING -u MyID
To cancel one or more jobs by job name, use the command
scancel --name <myJobName>
To cancel an element (index) of an array job, use the command
scancel <jobid>_<index>
For example, to cancel array job element 4 of an array job whose Job ID is 12345 use
scancel 12345_4
How to check resource utilization of a running or finished job
The following command can be used to show resource utilization by a running job or a job that has already completed:
sacct
This command can be used with many options. We have configured one option that shows some quantities that are commonly of interest, including the amount of memory used and the cputime used by the jobs:
sacct-gacrc
For detailed information on how to monitor your jobs, please see Monitoring Jobs on Sapelo2.