Running Jobs on the teaching cluster
(Latest revision as of 10:14, 18 January 2024)

===Using the Queueing System===

The login node for the teaching cluster should be used for text editing and job submissions. No jobs should be run directly on the login node. Processes that use too much CPU or RAM on the login node may be terminated by GACRC staff, or automatically, in order to keep the cluster running properly. Jobs should be run using the Slurm queueing system. The queueing system should be used to run both interactive and batch jobs.


[[#top|Back to Top]]


===Queues defined on the teaching cluster===


There are different queues defined on the teaching cluster. The SLURM queueing system refers to queues as partitions. Users are required to specify, in the job submission script or as job submission command line arguments, the queue and the resources needed by the job (such as number of cores, amount of memory, GPU cards, etc.) so that the job can be assigned to compute node(s) with enough available resources.


The table below summarizes the partitions (queues) defined and the compute nodes that they target:
{|  width="100%" border="1"  cellspacing="0" cellpadding="2" align="center" class="wikitable unsortable"
|-
! Queue Name !! Node Type !! Node Number !! Description !! Notes
|-
| batch || Intel|| 28 || 12-core, 48GB RAM, Intel Xeon || Regular nodes.
|-
| highmem || Intel || 2 || 32-core, 512GB RAM, Intel Xeon || For high memory jobs.
|-
| gpu || GPU|| 1 || 12-core, 48GB RAM, Intel Xeon, 4 NVIDIA K20Xm GPUs || For GPU-enabled jobs.
|-
| interactive || Intel || 2 || 12-core, 48GB RAM, Intel Xeon || For interactive jobs.


|}
Note that the 48GB-RAM nodes in the table above can allocate a total of 41GB of memory to jobs.


You can check all partitions (queues) defined in the cluster with the command
<pre class="gcommand">
sinfo
</pre>


----
===Job submission Scripts===

Users are required to specify the number of cores, the amount of memory, the queue name, and the maximum wallclock time needed by the job.

====Header lines====

=====Basic job submission script=====

At a minimum, the job submission script needs to have the following header lines:
<pre class="gscript">
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=test
#SBATCH --ntasks=1
#SBATCH --time=2:00:00
#SBATCH --mem=2gb
</pre>

Commands to run your application should be added after these header lines.

=====Header lines explained=====

* '''#!/bin/bash''' : used to specify using /bin/bash shell
* '''#SBATCH --partition=batch''' : used to specify the partition (queue) name, e.g. ''batch''
* '''#SBATCH --job-name=test''' : used to specify the name of the job, e.g. ''test''
* '''#SBATCH --ntasks=1''' : used to specify the number of tasks (e.g. 1).
* '''#SBATCH --time=2:00:00''' : used to specify the maximum allowed wall clock time for the job, in dd-hh:mm:ss format (e.g. 2:00:00 for 2 hours).
* '''#SBATCH --mem=2gb''' : used to specify the maximum memory allowed for the job (e.g. 2GB)
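The --time value is a plain string in one of SLURM's accepted formats. As a quick sanity check before submitting, an hh:mm:ss request can be converted to seconds in the shell (the walltime_seconds helper below is our own illustration, not part of SLURM):

```shell
#!/bin/bash
# Convert an hh:mm:ss walltime string to total seconds, to sanity-check
# a --time request before submitting. The helper name is ours, not SLURM's.
walltime_seconds() {
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

walltime_seconds 2:00:00   # prints 7200
```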


Below are some of the most commonly used queueing system options to configure the job.

====Options to request resources for the job====

* -t, --time=time
     Set a limit on the total run time. Acceptable formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds"

* --mem=MB
     Maximum memory per node the job will need in MegaBytes

* --mem-per-cpu=MB
     Memory required per allocated CPU in MegaBytes

* -N, --nodes=num
     Number of nodes required. Default is 1 node

* -n, --ntasks=num
     Maximum number of tasks that will be launched. Default is one task per node

* --ntasks-per-node=ntasks
     Request that ntasks be invoked on each node

* -c, --cpus-per-task=ncpus
     Require ncpus CPU cores per task. Without this option, one core per task is allocated

Please try to request resources for your job as accurately as possible, because this allows your job to be dispatched to run at the earliest opportunity and helps the system allocate resources efficiently to start as many jobs as possible, benefiting all users.
====Options to manage job notification and output====


* -J, --job-name jobname
     Give the job a name. The default is the filename of the job script. Within the job, $SBATCH_JOB_NAME expands to the job name


* -o, --output=path/for/stdout
     Send stdout to path/for/stdout. The default filename is slurm-${SLURM_JOB_ID}.out, e.g. slurm-12345.out, in the directory from which the job was submitted  


* -e, --error=path/for/stderr
     Send stderr to path/for/stderr.


* --mail-user=username@uga.edu
     Send email notification to the address you specified when certain events occur.


* --mail-type=type
     The value of ''type'' can be set to NONE, BEGIN, END, FAIL, ALL.
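With the default output option, the log file name is built from the job ID. A minimal sketch of that expansion (SLURM_JOB_ID is set by SLURM inside a real job; here it is faked just to show the pattern):

```shell
#!/bin/bash
# SLURM sets SLURM_JOB_ID inside a job; we assign it here only to
# illustrate how the default log name slurm-${SLURM_JOB_ID}.out is formed.
SLURM_JOB_ID=12345
echo "slurm-${SLURM_JOB_ID}.out"   # prints slurm-12345.out
```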
 


====Options to set Array Jobs====
If you wish to run an application binary or script using e.g. different input files, then you might find it convenient to use an array job. To create an array job with e.g. 10 elements, use
<pre class="gscript">
#SBATCH -a 0-9
</pre>
or
<pre class="gscript">
#SBATCH --array=0-9
</pre>

The ID of each element in an array job is stored in the variable SLURM_ARRAY_TASK_ID. The variable SLURM_ARRAY_JOB_ID will be expanded into the jobid of the array job. Each array job element runs as an independent job, so multiple array elements can run concurrently, if resources are available.

====Option to set job dependency====

You can set job dependency with the option -d or --dependency=dependency-list. For example, to specify that one job only starts after the job with jobid 1234 finishes, you can add a header line such as
<pre class="gscript">
#SBATCH --dependency=afterany:1234
</pre>
to the job submission script of the dependent job.
You can then load the needed modules. For example, if you are running an R program, then include the line
<pre class="gscript">
module load R/4.3.1-foss-2022a
</pre>


and then include the command to run your R program, e.g.
<pre class="gscript">
R CMD BATCH add.R
</pre>


====Environment Variables exported by batch jobs====
===Sample job submission scripts===


====Serial (single-processor) Job====


Sample job submission script (sub.sh) to run an R program called add.R using a single core:


<pre class="gscript">
#!/bin/bash
#SBATCH --job-name=testserial         # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=yourMYID@uga.edu  # Where to send mail
#SBATCH --ntasks=1                    # Run on a single CPU
#SBATCH --mem=1gb                     # Job memory request
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=testserial.%j.out    # Standard output log
#SBATCH --error=testserial.%j.err     # Standard error log


cd $SLURM_SUBMIT_DIR

module load R/4.3.1-foss-2022a

R CMD BATCH add.R
</pre>
In this sample script, the standard output and error of the job will be saved into files called testserial.%j.out and testserial.%j.err, where %j is automatically replaced by the job id.


====MPI Job====


Sample job submission script (sub.sh) to run an OpenMPI application. In this example the job requests 16 cores, divided equally across 2 nodes (8 cores per node); the binary is called mympi.exe:


<pre class="gscript">
#!/bin/bash
#SBATCH --job-name=mpitest            # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=yourMYID@uga.edu  # Where to send mail
#SBATCH --ntasks=16                   # Number of MPI ranks
#SBATCH --cpus-per-task=1             # Number of cores per MPI rank
#SBATCH --nodes=2                     # Number of nodes
#SBATCH --ntasks-per-node=8           # How many tasks on each node
#SBATCH --mem-per-cpu=600mb           # Memory per processor
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=mpitest.%j.out       # Standard output log
#SBATCH --error=mpitest.%j.err        # Standard error log


cd $SLURM_SUBMIT_DIR

module load OpenMPI/4.1.4-GCC-11.3.0

mpirun ./mympi.exe


</pre>


====OpenMP (Multi-Thread) Job====

Sample job submission script (sub.sh) to run a program that uses OpenMP with 6 threads. Please set '''--ntasks=1''' and set '''--cpus-per-task''' to the number of threads you wish to use. The name of the binary in this example is a.out.


<pre class="gscript">
#!/bin/bash
#SBATCH --job-name=mctest             # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=yourMYID@uga.edu  # Where to send mail
#SBATCH --ntasks=1                    # Run a single task
#SBATCH --cpus-per-task=6             # Number of CPU cores per task
#SBATCH --mem=4gb                     # Job memory request
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=mctest.%j.out        # Standard output log
#SBATCH --error=mctest.%j.err         # Standard error log


cd $SLURM_SUBMIT_DIR


export OMP_NUM_THREADS=6 


module load foss/2022a  # load the appropriate module file, e.g. foss/2022a

time ./a.out


</pre>


====High Memory Job====


Sample job submission script (sub.sh) to run a velvet application that needs to use 100GB of memory and 4 threads:


<pre class="gscript">
#!/bin/bash
#SBATCH --job-name=highmemtest        # Job name
#SBATCH --partition=highmem           # Partition (queue) name
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=yourMYID@uga.edu  # Where to send mail
#SBATCH --ntasks=1                    # Run a single task
#SBATCH --cpus-per-task=4             # Number of CPU cores per task
#SBATCH --mem=100gb                   # Job memory request
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=highmemtest.%j.out   # Standard output log
#SBATCH --error=highmemtest.%j.err    # Standard error log


cd $SLURM_SUBMIT_DIR


export OMP_NUM_THREADS=4


module load Velvet

velvetg [options]


</pre>


====Hybrid MPI/shared-memory using OpenMPI====

Sample job submission script (sub.sh) to run a parallel job that uses 4 MPI processes with OpenMPI and each MPI process runs with 3 threads:


<pre class="gscript">
#!/bin/bash
#SBATCH --job-name=hybridtest
#SBATCH --partition=batch            # Partition (queue) name
#SBATCH --mail-type=END,FAIL  # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=yourMYID@uga.edu # Where to send mail
#SBATCH --nodes=2              # Number of nodes
#SBATCH --ntasks=4            # Number of MPI ranks
#SBATCH --ntasks-per-node=2    # Number of MPI ranks per node
#SBATCH --cpus-per-task=3      # Number of OpenMP threads for each MPI process/rank
#SBATCH --mem-per-cpu=2000mb  # Per processor memory request
#SBATCH --time=2-00:00:00      # Walltime in hh:mm:ss or d-hh:mm:ss (2 days in the example)
#SBATCH --output=hybridtest.%j.out  # Standard output log
#SBATCH --error=hybridtest.%j.err  # Standard error log
cd $SLURM_SUBMIT_DIR
ml foss/2022a


export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK


mpirun ./myhybridprog.exe


</pre>
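The export line in the script above is the key hand-off between SLURM and OpenMP: SLURM sets SLURM_CPUS_PER_TASK from --cpus-per-task, and the job script copies it into OMP_NUM_THREADS for the OpenMP runtime. A standalone sketch (the SLURM variable only exists inside a real job, so it is faked here):

```shell
#!/bin/bash
# Inside a job, SLURM sets SLURM_CPUS_PER_TASK from --cpus-per-task.
# We fake it here to show the hand-off to the OpenMP runtime.
SLURM_CPUS_PER_TASK=3
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
echo "$OMP_NUM_THREADS"   # prints 3
```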




====Array job====


Sample job submission script (sub.sh) to submit an array job with 10 elements. In this example, each array job element will run the a.out binary using an input file called input_0, input_1, ..., input_9.
<pre class="gscript">
#!/bin/bash
#SBATCH --job-name=arrayjobtest     # Job name
#SBATCH --partition=batch           # Partition (queue) name
#SBATCH --ntasks=1                  # Run a single task
#SBATCH --mem=1gb                   # Job Memory
#SBATCH --time=10:00:00             # Time limit hrs:min:sec
#SBATCH --output=array_%A-%a.out    # Standard output log
#SBATCH --error=array_%A-%a.err     # Standard error log
#SBATCH --array=0-9                 # Array range


cd $SLURM_SUBMIT_DIR
 


module load foss/2022a # load any needed module files, e.g. foss/2022a


time ./a.out < input_$SLURM_ARRAY_TASK_ID


</pre>
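Each element of the array sees its own value of SLURM_ARRAY_TASK_ID, which is what makes the input_$SLURM_ARRAY_TASK_ID redirection above pick a different file per element. A local sketch of that mapping (the loop only stands in for SLURM launching the elements):

```shell
#!/bin/bash
# Each array element gets its own SLURM_ARRAY_TASK_ID; the loop below
# simulates three elements to show the input-file mapping.
for SLURM_ARRAY_TASK_ID in 0 1 2; do
  echo "element $SLURM_ARRAY_TASK_ID reads input_$SLURM_ARRAY_TASK_ID"
done
```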




====GPU/CUDA====


Sample script to run Amber on a GPU node using one node, 2 CPU cores, and 1 GPU card:
<pre class="gscript">
#!/bin/bash
#SBATCH --job-name=amber              # Job name
#SBATCH --partition=gpu               # Partition (queue) name
#SBATCH --gres=gpu:1                  # Requests one GPU device
#SBATCH --ntasks=1                    # Run a single task
#SBATCH --cpus-per-task=2             # Number of CPU cores per task
#SBATCH --mem=40gb                    # Job memory request
#SBATCH --time=10:00:00               # Time limit hrs:min:sec
#SBATCH --output=amber.%j.out         # Standard output log
#SBATCH --error=amber.%j.err          # Standard error log
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=yourMYID@uga.edu  # Where to send mail


cd $SLURM_SUBMIT_DIR


ml Amber/22.0-foss-2021b-AmberTools-22.3-CUDA-11.4.1


srun $AMBERHOME/bin/pmemd.cuda -O -i ./prod.in -o prod.out  -p ./dimerFBP_GOL.prmtop -c ./restart.rst -r prod.rst -x prod.mdcrd
</pre>


With the resource requirements specified in the job submission script (sub.sh), submit your job with
<pre class="gcommand">
sbatch <scriptname>
</pre>
For example
<pre class="gcommand">
sbatch sub.sh
</pre>
Once the job is submitted, the job ID (e.g. 12345) will be printed on the screen.
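If a script needs that job ID later (for example to set up a dependency), it can be captured from sbatch's message; sbatch also has a --parsable option that prints just the ID. A sketch of the string handling, with the sbatch output faked so it runs anywhere:

```shell
#!/bin/bash
# On the cluster you would use: msg=$(sbatch sub.sh)
# Here the message is faked so the parsing can be shown standalone.
msg="Submitted batch job 12345"
jobid=${msg##* }   # keep only the text after the last space
echo "$jobid"      # prints 12345
```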
 


----
[[#top|Back to Top]]


===Discovering if a partition (queue) is busy===
The nodes allocated to each partition (queue) and their state can be viewed with the command
<pre class="gcommand">
sinfo
</pre>


Sample output of the '''sinfo''' command:


<pre class="gcommand">
PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
highmem        up 7-00:00:00      2  idle rb1-[1-2]
interactive    up 7-00:00:00      2  idle rb1-[11-12]
fsr4601        up      1:00      8  idle rb1-[3-10]
fsr8602        up      10:00      8  idle rb1-[3-10]
batch          up 2-00:00:00      3    mix rb1-3,rb1-[6-8]
batch          up 2-00:00:00      1  alloc rb1-4
batch          up 2-00:00:00    36  idle rb1-[5,9-10]
</pre>
where some common values of STATE are:
*STATE=idle indicates that those nodes are completely free.
*STATE=mix indicates that some cores on those nodes are in use (and some are free).
*STATE=alloc indicates that all cores on those nodes are in use.
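Since sinfo's output is plain text, it is easy to post-process. For example, counting the listed groups whose STATE column is idle (the sample lines below are pasted in; on the cluster you would pipe sinfo itself):

```shell
#!/bin/bash
# Count sinfo lines whose 5th column (STATE) is "idle".
# The here-document reuses the sample output above; on the cluster:
#   sinfo | awk '$5 == "idle"'
awk '$5 == "idle" { n++ } END { print n }' <<'EOF'
highmem        up 7-00:00:00      2   idle rb1-[1-2]
batch          up 2-00:00:00      3    mix rb1-3,rb1-[6-8]
batch          up 2-00:00:00     36   idle rb1-[5,9-10]
EOF
```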


----
An interactive session on a compute node can be started with the command
<pre class="gcommand">
interact
</pre>
This command will start an interactive session with one core on one of the interactive nodes, and allocate 2GB of memory for a maximum walltime of 12h.


The '''interact''' command is an alias for  
<pre class="gcommand">
srun --pty  --cpus-per-task=1 --job-name=interact --ntasks=1 --nodes=1 --partition=interactive --time=12:00:00 --mem=2GB /bin/bash -l
</pre>


The options that can be used with <code>interact</code> are displayed when this command is run with the -h or --help option:
<pre class="gcomment">
[shtsai@teach1 ~]$ interact -h


Usage: interact [OPTIONS]


Description: Start an interactive job


    -c, --cpus-per-task         CPU cores per task (default: 1)
    -J, --job-name              Job name (default: interact)
    -n, --ntasks                Number of tasks (default: 1)
    -N, --nodes                 Number of nodes (default: 1)
    -p, --partition             Partition for interactive job (default: inter_p)
    -q, --qos                   Request a quality of service for the job.
    -t, --time                  Maximum run time for interactive job (default: 12:00:00)
    -w, --nodelist              List of node name(s) on which your job should run
    --constraint                Job constraints
    --gres                      Generic consumable resources
    --mem                       Memory per node (default 2GB)
    --shell                     Absolute path to the shell to be used in your interactive job (default: /bin/bash)
    --wckey                     Wckey to be used with job
    --x11                       Start an interactive job with X Forwarding
    -h, --help                  Display this help output
</pre>


'''Examples:'''
 
To start an interactive session with 4 cores and 10GB of memory:
<pre class="gcommand">
interact -c 4 --mem=10G
</pre>




----


===How to run an interactive job with Graphical User Interface capabilities===
If you want to run an application as an interactive job and have its graphical
user interface displayed on the terminal of your local machine, you need to
enable X-forwarding when you ssh into the login node. For information on how
to do this, please see questions 5.4 and 5.5 in the [[Frequently Asked Questions]]
page.
On the teaching cluster, X-forwarding does not work from any of the compute nodes,
including the interactive nodes. Please feel free to run X windows applications
directly on the teaching cluster login node.


<!--
'''NOTE: X-forwarding is not working on Sapelo2 yet, sorry for the inconvenience'''
-->




----
[[#top|Back to Top]]


<!--
===How to run a singularity application===




In the example above, the total amount of /lscratch allocated to this job is 5000000 * 4 = 20000000 = 20GB.


----
[[#top|Back to Top]]
-->


===How to check on running or pending jobs===
To list all running and pending jobs (by all users), use the command
<pre class="gcommand">
squeue
</pre>
 
or
<pre class="gcommand">
squeue -l
</pre>
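squeue also accepts -u/--user to limit the listing to one user's jobs (e.g. squeue -u MyID). The effect is just a filter on the USER column, which can be sketched locally on a saved listing (the usernames below are illustrative):

```shell
#!/bin/bash
# Filter a saved squeue-style listing by its 4th column (USER).
# On the cluster you would simply run: squeue -u MyID
awk '$4 == "shtsai"' <<'EOF'
101 batch test1 shtsai  R 1:02 1 rb1-3
102 batch test2 zhuofei R 0:40 1 rb1-4
EOF
```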




For detailed information on how to monitor your jobs, please see [[Monitoring Jobs on the teaching cluster]].


----
To delete one of your running or pending jobs, use the command
<pre class="gcommand">
scancel <jobid>
</pre>
For example, to delete a job with Job ID 12345 use
<pre class="gcommand">
scancel 12345
</pre>


----
[[#top|Back to Top]]
 
===How to check resource utilization of a running or finished job===


The following command can be used to show resource utilization by a running job or a job that has already completed:
<pre class="gcommand">
sacct
</pre>


This command can be used with many options. We also provide a preconfigured version that shows some quantities of common interest, including the amount of memory and the cputime used by jobs:
<pre class="gcommand">
sacct-gacrc
</pre>
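One quantity worth understanding in the sacct output is CPU time: it grows with the number of cores allocated, not just with walltime. Roughly, cputime is elapsed walltime multiplied by allocated cores, so a job that runs for 2 hours on 4 cores is charged about 8 core-hours:

```shell
#!/bin/bash
# Rough relationship between walltime, cores and the CPU time
# reported by sacct: cputime ~= elapsed walltime * allocated cores.
elapsed_hours=2
cores=4
echo $(( elapsed_hours * cores ))   # prints 8 (core-hours)
```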
----
[[#top|Back to Top]]


For detailed information on how to monitor your jobs, please see [[Monitoring Jobs on the teaching cluster]].
<!--
'''1.''' You can request that an email be sent to you when the job finishes, by adding these two header lines to the job submission script:
<pre class="gscript">
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=username@uga.edu
</pre>
where ''username@uga.edu'' should be replaced by your email address (not necessarily a UGAMail address).  
to check on the resource utilization (such as wall clock time, amount of memory, etc).


<!--
 
'''3.''' Jobs that completed over one hour ago, but no longer than 7 days ago, you can use the command  
-->


Below are some of the most commonly used queueing system options to configure the job.

Options to request resources for the job

  • -t, --time=time
    Set a limit on the total run time. Acceptable formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds"
  • --mem=MB
    Maximum amount of memory per node the job will need, in megabytes
  • --mem-per-cpu=MB
    Memory required per allocated CPU, in megabytes
  • -N, --nodes=num
    Number of nodes required. Default is 1 node
  • -n, --ntasks=num
    Maximum number of tasks to be launched. Default is one task per node
  • --ntasks-per-node=ntasks
    Request that ntasks be invoked on each node
  • -c, --cpus-per-task=ncpus
    Request ncpus CPU cores per task. Without this option, one core is allocated per task

Please try to request resources for your job as accurately as possible, because this allows your job to be dispatched to run at the earliest opportunity and it helps the system allocate resources efficiently to start as many jobs as possible, benefiting all users.
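As an illustration, the resource options above can be combined in a single script header. The values below are placeholders, not recommendations; note that #SBATCH lines are ordinary comments to bash, so the script also runs as a plain shell script:

```shell
#!/bin/bash
# Hypothetical header requesting 1 node, 4 tasks, 2 cores per task,
# 600mb per core, and a 4-hour time limit; adjust the values to your job.
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=600mb
#SBATCH --time=4:00:00

# Inside a running job, SLURM exports SLURM_NTASKS; outside a job it is unset.
echo "Requested ${SLURM_NTASKS:-unset} tasks"
```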


Options to manage job notification and output

  • -J, --job-name jobname
    Give the job a name. The default is the filename of the job script. Within the job, the environment variable SLURM_JOB_NAME expands to the job name
  • -o, --output=path/for/stdout
   Send stdout to path/for/stdout. The default filename is slurm-${SLURM_JOB_ID}.out, e.g. slurm-12345.out, in the directory from which the job was submitted 
  • -e, --error=path/for/stderr
   Send stderr to path/for/stderr.
  • --mail-user=username@uga.edu
    Send email notification to the address you specify when certain events occur.
  • --mail-type=type
   The value of type can be set to NONE, BEGIN, END, FAIL, ALL.

Options to set Array Jobs

If you wish to run an application binary or script using e.g. different input files, then you might find it convenient to use an array job. To create an array job with e.g. 10 elements, use

#SBATCH -a 0-9

or

#SBATCH --array=0-9

The ID of each element in an array job is stored in the variable SLURM_ARRAY_TASK_ID. The variable SLURM_ARRAY_JOB_ID will be expanded into the jobid of the array job. Each array job element runs as an independent job, so multiple array elements can run concurrently, if resources are available.
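For example, SLURM_ARRAY_TASK_ID can be used to pick a different input file for each array element. The snippet below mimics what element 3 of an array job would see (outside a real array job the variable is not set by SLURM, so it is assigned by hand here):

```shell
# Simulate array element 3; in a real array job SLURM sets this variable
SLURM_ARRAY_TASK_ID=3

# Build the per-element input filename, e.g. input_3
input_file="input_${SLURM_ARRAY_TASK_ID}"
echo "$input_file"   # prints input_3
```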

Option to set job dependency

You can set job dependency with the option -d or --dependency=dependency-list. For example, if you want to specify that one job only starts after job with jobid 1234 finishes, you can add the following header line in the job submission script of the job:

#SBATCH --dependency=afterok:1234

Having this header line in the job submission script will ensure that the job is only dispatched to run after job 1234 has completed successfully.
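The job ID is usually not known until the first job is submitted, so a common pattern is to capture it at submission time and pass it to the dependent job on the command line. A sketch of this pattern, where first_step.sh and second_step.sh are hypothetical job scripts and --parsable makes sbatch print only the numeric job ID:

```shell
# Submit the first job; --parsable makes sbatch print just the job ID
first_id=$(sbatch --parsable first_step.sh)

# Submit the second job so it runs only after the first completes successfully
sbatch --dependency=afterok:"${first_id}" second_step.sh
```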

Other content of the script

Following the header lines, users can include commands to change to the working directory, to load the modules needed to run the application, and to invoke the application. For example, to use the directory from which the job is submitted as the working directory (where to find input files or binaries), add the line

cd $SLURM_SUBMIT_DIR

You can then load the needed modules. For example, if you are running an R program, then include the line

module load R/4.3.1-foss-2022a

Then invoke your application. For example, if you are running an R program called add.R which is in your job submission directory, use

R CMD BATCH add.R

Environment Variables exported by batch jobs

When a batch job is started, a number of variables are introduced into the job's environment that can be used by the batch script in making decisions, creating output files, and so forth. Some of these variables are listed in the following table:

Variable Description
SLURM_ARRAY_JOB_ID Job id of an array job
SLURM_ARRAY_TASK_ID Value of job array index for this job
SLURM_CPUS_ON_NODE Number of CPUS on the allocated node.
SLURM_CPUS_PER_TASK Number of cpus requested per task. Only set if the --cpus-per-task option is specified.
SLURM_JOB_ID Unique SLURM job id
SLURM_JOB_NAME User specified job name
SLURM_JOB_CPUS_PER_NODE Count of processors available to the job on this node.
SLURM_JOB_NODELIST List of nodes allocated to the job.
SLURM_JOB_NUM_NODES Total number of nodes in the job's resource allocation.
SLURM_JOB_PARTITION Name of the partition (i.e. queue) in which the job is running.
SLURM_NTASKS Same as -n, --ntasks
SLURM_NTASKS_PER_NODE Number of tasks requested per node. Only set if the --ntasks-per-node option is specified.
SLURM_SUBMIT_DIR The directory from which sbatch was invoked.
SLURM_TASKS_PER_NODE Number of tasks to be initiated on each node.
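A job script can read these like any other environment variables. A minimal sketch (the :- fallbacks only matter when the script is run outside a job, where SLURM has not set the variables):

```shell
#!/bin/bash
# Report where the job is running; inside a batch job these variables
# are set by SLURM, so the fallbacks are never used there.
echo "Job ${SLURM_JOB_ID:-unknown} on node(s) ${SLURM_JOB_NODELIST:-localhost}"
echo "Submitted from ${SLURM_SUBMIT_DIR:-$PWD}"
```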



Back to Top

Sample job submission scripts

Serial (single-processor) Job

Sample job submission script (sub.sh) to run an R program called add.R using a single core:

#!/bin/bash
#SBATCH --job-name=testserial         # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=yourMYID@uga.edu  # Where to send mail	
#SBATCH --ntasks=1                    # Run on a single CPU
#SBATCH --mem=1gb                     # Job memory request
#SBATCH --time=02:00:00               # Time limit hrs:min:sec
#SBATCH --output=testserial.%j.out    # Standard output log
#SBATCH --error=testserial.%j.err    # Standard error log


cd $SLURM_SUBMIT_DIR

module load R/4.3.1-foss-2022a

R CMD BATCH add.R

In this sample script, the standard output and standard error of the job will be saved into files called testserial.%j.out and testserial.%j.err, where %j is automatically replaced by the job ID of the job.

MPI Job

Sample job submission script (sub.sh) to run an OpenMPI application. In this example the job requests 16 cores and further specifies that these 16 cores need to be divided equally across 2 nodes (8 cores per node); the binary is called mympi.exe:

#!/bin/bash
#SBATCH --job-name=mpitest      # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --mail-type=END,FAIL         # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=yourMYID@uga.edu    # Where to send mail	
#SBATCH --ntasks=16                  # Number of MPI ranks
#SBATCH --cpus-per-task=1            # Number of cores per MPI rank 
#SBATCH --nodes=2                    # Number of nodes
#SBATCH --ntasks-per-node=8          # How many tasks on each node
#SBATCH --mem-per-cpu=600mb          # Memory per processor
#SBATCH --time=02:00:00              # Time limit hrs:min:sec
#SBATCH --output=mpitest.%j.out         # Standard output log
#SBATCH --error=mpitest.%j.err         # Standard error log


cd $SLURM_SUBMIT_DIR

module load OpenMPI/4.1.4-GCC-11.3.0
mpirun ./mympi.exe


OpenMP (Multi-Thread) Job

Sample job submission script (sub.sh) to run a program that uses OpenMP with 6 threads. Please set --ntasks=1 and set --cpus-per-task to the number of threads you wish to use. The name of the binary in this example is a.out.

#!/bin/bash
#SBATCH --job-name=mctest      # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --mail-type=END,FAIL         # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=yourMYID@uga.edu    # Where to send mail	
#SBATCH --ntasks=1                   # Run a single task	
#SBATCH --cpus-per-task=6            # Number of CPU cores per task
#SBATCH --mem=4gb                    # Job memory request
#SBATCH --time=02:00:00              # Time limit hrs:min:sec
#SBATCH --output=mctest.%j.out          # Standard output log
#SBATCH --error=mctest.%j.err          # Standard error log

cd $SLURM_SUBMIT_DIR

export OMP_NUM_THREADS=6  

module load foss/2022a  # load the appropriate module file, e.g. foss/2022a

time ./a.out


High Memory Job

Sample job submission script (sub.sh) to run a velvet application that needs to use 50GB of memory and 4 threads:

#!/bin/bash
#SBATCH --job-name=highmemtest      # Job name
#SBATCH --partition=highmem            # Partition (queue) name
#SBATCH --mail-type=END,FAIL         # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=yourMYID@uga.edu    # Where to send mail	
#SBATCH --ntasks=1                   # Run a single task	
#SBATCH --cpus-per-task=4          # Number of CPU cores per task
#SBATCH --mem=100gb                    # Job memory request
#SBATCH --time=02:00:00              # Time limit hrs:min:sec
#SBATCH --output=highmemtest.%j.out     # Standard output log
#SBATCH --error=highmemtest.%j.err     # Standard error log

cd $SLURM_SUBMIT_DIR

export OMP_NUM_THREADS=4

module load Velvet

velvetg [options]


Hybrid MPI/shared-memory using OpenMPI

Sample job submission script (sub.sh) to run a parallel job that uses 4 MPI processes with OpenMPI and each MPI process runs with 3 threads:

#!/bin/bash
#SBATCH --job-name=hybridtest
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --mail-type=END,FAIL   # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=yourMYID@uga.edu # Where to send mail	
#SBATCH --nodes=2              # Number of nodes
#SBATCH --ntasks=4             # Number of MPI ranks
#SBATCH --ntasks-per-node=2    # Number of MPI ranks per node
#SBATCH --cpus-per-task=3      # Number of OpenMP threads for each MPI process/rank
#SBATCH --mem-per-cpu=2000mb   # Per processor memory request
#SBATCH --time=2-00:00:00      # Walltime in hh:mm:ss or d-hh:mm:ss (2 days in the example)
#SBATCH --output=hybridtest.%j.out  # Standard output log
#SBATCH --error=hybridtest.%j.err   # Standard error log
 
cd $SLURM_SUBMIT_DIR
 
ml foss/2022a

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

mpirun ./myhybridprog.exe


Array job

Sample job submission script (sub.sh) to submit an array job with 10 elements. In this example, each array job element will run the a.out binary using an input file called input_0, input_1, ..., input_9.

#!/bin/bash
#SBATCH --job-name=arrayjobtest   # Job name
#SBATCH --partition=batch             # Partition (queue) name
#SBATCH --ntasks=1                  # Run a single task
#SBATCH --mem=1gb                   # Job Memory
#SBATCH --time=10:00:00             # Time limit hrs:min:sec
#SBATCH --output=array_%A-%a.out    # Standard output log
#SBATCH --error=array_%A-%a.err    # Standard error log
#SBATCH --array=0-9                 # Array range

cd $SLURM_SUBMIT_DIR

module load foss/2022a # load any needed module files, e.g. foss/2022a

time ./a.out < input_$SLURM_ARRAY_TASK_ID


GPU/CUDA

Sample script to run Amber on a GPU node using one node, 2 CPU cores, and 1 GPU card:

#!/bin/bash
#SBATCH --job-name=amber              # Job name
#SBATCH --partition=gpu             # Partition (queue) name
#SBATCH --gres=gpu:1                  # Requests one GPU device 
#SBATCH --ntasks=1                    # Run a single task	
#SBATCH --cpus-per-task=2             # Number of CPU cores per task
#SBATCH --mem=40gb                    # Job memory request
#SBATCH --time=10:00:00               # Time limit hrs:min:sec
#SBATCH --output=amber.%j.out         # Standard output log
#SBATCH --error=amber.%j.err          # Standard error log

#SBATCH --mail-type=END,FAIL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=yourMYID@uga.edu      # Where to send mail 

cd $SLURM_SUBMIT_DIR

ml Amber/22.0-foss-2021b-AmberTools-22.3-CUDA-11.4.1

srun $AMBERHOME/bin/pmemd.cuda -O -i ./prod.in -o prod.out  -p ./dimerFBP_GOL.prmtop -c ./restart.rst -r prod.rst -x prod.mdcrd

Back to Top

How to submit a job to the batch queue

With the resource requirements specified in the job submission script (sub.sh), submit your job with

sbatch <scriptname>

For example

sbatch sub.sh

Once the job is submitted, the Job ID of the job (e.g. 12345) will be printed on the screen.
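sbatch normally prints a line of the form "Submitted batch job <jobid>". If you need just the numeric ID in a script, you can extract it from that line (simulated below), or use sbatch --parsable, which prints only the ID:

```shell
# Simulated sbatch output; on the cluster this would come from
# msg=$(sbatch sub.sh)
msg="Submitted batch job 12345"

# The job ID is the fourth field of the message
job_id=$(echo "$msg" | awk '{print $4}')
echo "$job_id"   # prints 12345
```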


Back to Top

Discovering if a partition (queue) is busy

The nodes allocated to each partition (queue) and their state can be viewed with the command

sinfo

Sample output of the sinfo command:

PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
highmem        up 7-00:00:00      2   idle rb1-[1-2]
interactive    up 7-00:00:00      2   idle rb1-[11-12]
fsr4601        up       1:00      8   idle rb1-[3-10]
fsr8602        up      10:00      8   idle rb1-[3-10]
batch          up 2-00:00:00      4    mix rb1-3,rb1-[6-8]
batch          up 2-00:00:00      1  alloc rb1-4
batch          up 2-00:00:00      3   idle rb1-[5,9-10]

where some common values of STATE are:

  • STATE=idle indicates that those nodes are completely free.
  • STATE=mix indicates that some cores on those nodes are in use (and some are free).
  • STATE=alloc indicates that all cores on those nodes are in use.

Back to Top

How to open an interactive session

An interactive session on a compute node can be started with the command

interact

This command will start an interactive session with one core on one of the interactive nodes, and allocate 2GB of memory for a maximum walltime of 12h.

The interact command is an alias for

srun --pty  --cpus-per-task=1 --job-name=interact --ntasks=1 --nodes=1 --partition=interactive --time=12:00:00 --mem=2GB /bin/bash -l

The options that can be used with interact are displayed when this command is run with the -h or --help option:

[shtsai@teach1 ~]$ interact -h

Usage: interact [OPTIONS]

Description: Start an interactive job

    -c, --cpus-per-task         CPU cores per task (default: 1)
    -J, --job-name              Job name (default: interact)
    -n, --ntasks                Number of tasks (default: 1)
    -N, --nodes             	Number of nodes (default: 1)
    -p, --partition             Partition for interactive job (default: inter_p)
    -q, --qos               	Request a quality of service for the job.
    -t, --time              	Maximum run time for interactive job (default: 12:00:00)
    -w, --nodelist              List of node name(s) on which your job should run
    --constraint                Job constraints
    --gres                  	Generic consumable resources
    --mem                  	Memory per node (default 2GB)
    --shell                 	Absolute path to the shell to be used in your interactive job (default: /bin/bash)
    --wckey                 	Wckey to be used with job
    --x11                   	Start an interactive job with X Forwarding
    -h, --help              	Display this help output

Examples:

To start an interactive session with 4 cores and 10GB of memory:

interact -c 4 --mem=10G



Back to Top

How to run an interactive job with Graphical User Interface capabilities

If you want to run an application as an interactive job and have its graphical user interface displayed on the terminal of your local machine, you need to enable X-forwarding when you ssh into the login node. For information on how to do this, please see questions 5.4 and 5.5 in the Frequently Asked Questions page.

On the teaching cluster, X-forwarding does not work from any of the compute nodes, including the interactive nodes. Please feel free to run X windows applications directly on the teaching cluster login node.


Back to Top


How to check on running or pending jobs

To list all running and pending jobs (by all users), use the command

squeue

or

squeue -l


For detailed information on how to monitor your jobs, please see Monitoring Jobs on the teaching cluster.


Back to Top

How to delete a running or pending job

To delete one of your running or pending jobs, use the command

scancel <jobid>

For example, to delete a job with Job ID 12345 use

scancel 12345

Back to Top

How to check resource utilization of a running or finished job

The following command can be used to show resource utilization by a running job or a job that has already completed:

sacct

This command can be used with many options. We have also configured a command, sacct-gacrc, that shows some quantities that are commonly of interest, including the amount of memory and CPU time used by jobs:

sacct-gacrc

For detailed information on how to monitor your jobs, please see Monitoring Jobs on the teaching cluster.


Back to Top