Running Jobs on pcluster

From Research Computing Center Wiki


Using the Batch Queues

Jobs of over ten (10) minutes duration must be submitted to the queues rather than run in the background or interactively on the login node pcluster.rcc.uga.edu. Background jobs and interactive commands, including cron jobs, at jobs, and nohup processes, as well as commands entered at the keyboard, will be terminated after 10 minutes of CPU time. Graphical front ends to programs, programming tools, etc. will not be terminated.

The queueing system used on the pcluster is LoadLeveler. Currently, 31 "compute" nodes (named node01 to node32, with the exception of node19) are used to run jobs submitted through this queueing system.

To share the pcluster fairly among multiple users, each user may have no more than six (6) jobs (running or waiting in the idle (I) status) in the queues discussed in the next section. These jobs can be either serial or parallel. Users may submit additional jobs to the queues, but these will remain in Not Queued (NQ) status. Once a running job finishes, one of the Not Queued jobs enters the active queue and acquires the idle (I) status.

This queueing system uses a backfill scheduling mechanism, so jobs that require fewer processors and less CPU time may run before jobs that require many processors and longer CPU time.



Batch Queues on the IBM pcluster

The batch queues can be used for serial jobs (that is, jobs that require only one processor) and for parallel jobs. Currently the maximum number of processors a parallel job can use is 32, and at most two such jobs will run at any given time.

Valid queue names have the following format: TN-iM-tQ-Hh

where

  • N = total number of processors required by the job (limited to a maximum of 32 processors)
  • M = number of initial processors
  • Q = number of threads generated by each initial process
  • H = maximum wallclock time (in hours) per processor

Note that the variables N, M, Q, and H must be integers, with the restriction N = M x Q, where N <= 32. The maximum wallclock time per processor for a parallel job is currently limited to 24 hours (that is, H <= 24) when using more than 8 processors (that is, when 9 <= N <= 32). For jobs that use from 2 to 8 processors, the maximum wallclock time per processor is 96 hours (that is, H <= 96). Once any processor reaches the maximum wallclock time H, LoadLeveler terminates the job. Serial jobs may request up to 10 days of wallclock time (that is, H <= 240), but only 32 long jobs (with 24 < H <= 240) in total, counted across all users, will run at a time. There is no restriction on the number of serial jobs requesting H <= 24 that can run at a time (up to the total number of processors). For faster throughput, it is in users' interest to checkpoint their code and run shorter jobs whenever possible: a long job that can be checkpointed can be run as a sequence of shorter jobs, submitted to the queue automatically as described below in the Runchaining Jobs section. If your job cannot fit within the established processor and wallclock limits, please let us know.

Examples

1. An MPI job that uses a total of 16 processors for a maximum of 12 hours per processor should be submitted to the queue T16-i16-t1-12h

2. An OpenMP job that uses a total of 8 processors for a maximum of 6 hours per processor should be submitted to the queue T8-i1-t8-6h

3. A hybrid MPI/OpenMP job that uses MPI for communication between two nodes (each with 8 processors) and uses OpenMP within each node, for a maximum of 24 hours per processor, should be submitted to the queue T16-i2-t8-24h

4. A serial job should be submitted to the queue T1-i1-t1-Hh, where H is the maximum wallclock time required by the job. For example, submit a serial job that runs for 24 hours to the queue T1-i1-t1-24h
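The queue-name pattern above is easy to compose in a script. A minimal sketch (the variables N, M, Q, and H are just the symbols defined above; the sanity check mirrors the N = M x Q restriction):

```shell
#!/bin/sh
# Compose a queue name from the parameters defined above.
# The example values reproduce the hybrid MPI/OpenMP case: T16-i2-t8-24h.
N=16   # total processors
M=2    # initial processes
Q=8    # threads per initial process
H=24   # max wallclock hours per processor

# Sanity check: the scheduler requires N = M x Q and N <= 32.
if [ $((M * Q)) -ne $N ] || [ $N -gt 32 ]; then
  echo "invalid combination: N must equal M x Q and be at most 32" >&2
  exit 1
fi

echo "T${N}-i${M}-t${Q}-${H}h"   # prints T16-i2-t8-24h
```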

To submit a job, first determine your processor count and time requirements. These determine which queue to use.



Submitting a Batch Job to the Queue

The preferred way to submit a job is to use the ugsub command. The syntax for the ugsub command is:

ugsub queuename shellscriptname any_shellscript_parameters

Example of a shell script (myprog.csh)

To run a serial job:

#!/bin/csh
cd working_directory
/usr/bin/time ./myprog < $1 > $2

To run a parallel job linked with IBM MPI libraries:

#!/bin/csh
cd working_directory
poe ./myprog < $1 > $2

To run a parallel MPICH job using e.g. 4 processors (csh shell):

#!/bin/csh
cd working_directory
echo $LOADL_PROCESSOR_LIST
cat /dev/null > mlist.$$
foreach variable ($LOADL_PROCESSOR_LIST)
  echo $variable >> mlist.$$
end
/usr/local/mpich/bin/mpirun -np 4 -machinefile mlist.$$ ./myprog
rm -f mlist.$$

NOTE: Do NOT put the job into the background with an '&' in the shell script. This will confuse the queueing system.
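For users whose scripts use a Bourne-style shell (sh/ksh), the same machine-list construction can be sketched as follows. This is a hedged sketch: in a real job LoadLeveler sets LOADL_PROCESSOR_LIST, and the mpirun line (commented out here so the sketch is self-contained) would run the actual program.

```shell
#!/bin/sh
# sh/ksh sketch of the machine-list construction from the csh example above.
# A sample host list is supplied so the sketch runs outside the cluster.
: ${LOADL_PROCESSOR_LIST:="node01 node01 node02 node02"}

# Write one host per line into a machine file named after this process ID.
for host in $LOADL_PROCESSOR_LIST
do
  echo $host
done > mlist.$$

wc -l mlist.$$   # four hosts, one per line

# In the real script, run MPICH against the machine file, then clean up:
#   /usr/local/mpich/bin/mpirun -np 4 -machinefile mlist.$$ ./myprog
rm -f mlist.$$
```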

You could submit this script to the T8-i8-t1-12h queue with the following command:

ugsub T8-i8-t1-12h myprog.csh myfile1.in myfile2.out

where the arguments myfile1.in and myfile2.out (and therefore the parameters "< $1" and "> $2" following ./myprog in the script files) are optional. Note that for simplicity these parameters were omitted in the sample script above for running an MPICH job.
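To see concretely how the trailing ugsub arguments become $1 and $2 inside the script, here is a minimal self-contained sketch; the file names are the ones from the example above, and a stub "program" (cat) stands in for ./myprog:

```shell
#!/bin/sh
# Sketch of positional-parameter passing. In a real submission,
#   ugsub T8-i8-t1-12h myprog.csh myfile1.in myfile2.out
# makes $1 = myfile1.in and $2 = myfile2.out inside myprog.csh.
set -- myfile1.in myfile2.out   # simulate the two trailing arguments

echo "sample input" > "$1"      # stand-in input file
cat < "$1" > "$2"               # the script's "./myprog < $1 > $2" line

cat "$2"                        # prints: sample input
rm -f "$1" "$2"
```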



Running an Interactive Job

Interactive Serial Jobs

We have set aside one node (node19) for interactive jobs. This node is not part of the queueing system. To access this node, first login to pcluster.rcc.uga.edu and from there use ssh to connect to node19.

pcluster> ssh node19

The single processor executable (a.out) can be run as follows:

node19> ./a.out

This node should only be used for short jobs and for those that cannot be run on the batch queueing system (for example, if the job requires an X windows front-end).

Interactive Parallel Jobs

There are two ways to run interactive parallel jobs on the pcluster.

1. Using the queueing system

Interactive parallel jobs can be submitted to the LoadLeveler queueing system as well. From the login node (pcluster.rcc.uga.edu) use the command:

poe ./a.out < inputfile -procs p -nodes n -rmpool 1

where a.out is the name of the executable, inputfile is an optional file containing input parameters, p denotes the total number of processors and n denotes the number of nodes required by the job. For example, an interactive job that uses 4 processors can be run with the following command:

poe ./a.out < inputfile -procs 4 -nodes 1 -rmpool 1

The job will not run if all processors assigned to the interactive classes are busy at submission time. When the machine is busy, a good way to attempt an interactive parallel job is to use the flags -retry N -retrycount M. These options specify that up to M attempts should be made to launch your parallel interactive job, with a wait of N seconds between attempts. For example, poe ./a.out -procs 4 -nodes 1 -rmpool 1 -retry 60 -retrycount 20 retries every 60 seconds, up to 20 times.

Running interactive parallel jobs using LoadLeveler allows you to use more than one node (that is, more than 8 processors) for each job.

2. Running on node19:

Interactive parallel jobs that use up to 8 processors can also be run on node19. To do this, you need a file named .rhosts at the top level of your home directory (that is, in /home/groupname/username). This file should contain the word 'node19' (without the quotes). You should also have a file named host.list in your working directory, containing the word 'node19' eight times, one per line. Then a p-processor interactive parallel job can be run as follows:

poe ./a.out < inputfile -procs p

where p <= 8. This procedure is intended for short jobs, such as those used for debugging parallel codes.
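The host.list file described above can be generated once from the command line. A sketch (the .rhosts step is shown only as a comment to avoid touching a real home directory):

```shell
#!/bin/sh
# One-time setup sketch for interactive parallel runs on node19.
# 1) .rhosts at the top level of your home directory must contain 'node19':
#      echo node19 >> ~/.rhosts
# 2) host.list in the working directory: 'node19' eight times, one per line.
for i in 1 2 3 4 5 6 7 8
do
  echo node19
done > host.list

wc -l host.list     # 8 lines
rm -f host.list     # remove this line when using the sketch for real
```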



LoadLeveler Usage Information

These are the common LoadLeveler commands:

  • llcancel Cancel a queued or running job
  • llhold Place a queued job on hold
  • llq Check the status of queued and running jobs
  • llstatus Check the status of the pcluster
  • llqueue List examples of valid queue names


Checking the Status of Jobs

Use the llq command to check the status of jobs:

llq [-u username] [-l] [jobid]

where username is the user whose jobs you want to check (do not include if you want to see all jobs) and jobid is the JOBID of a specific job. The -l option gives long output, with detailed information about the job(s).

Example

  • llq shows all the jobs in the pool
  • llq -u johndoe shows all jobs for user johndoe
  • llq -l cws.10407.0 gives detailed information about the job with JOBID cws.10407.0


Files Created at Job Start

When your job starts, files will be created in the directory from which the job was submitted, named as follows:

shellscriptname.error.host.jobid.processno contains the messages normally written to stderr

shellscriptname.out.host.jobid.processno contains the output written to stdout

The processno number will nearly always be 0.



Canceling/Removing a Job

Use the llcancel command to cancel/remove a job from the job pool:

llcancel [-u username] jobid [jobid]

Example

  • llcancel cws.10408.0 cancels your job with JOBID cws.10408.0
  • llcancel cws.10408.0 cws.10409.0 cancels your jobs with the listed JOBIDs
  • llcancel -u your_user_id cancels all jobs you have in the queue


Runchaining Jobs

We have found that a common need is to run the same job over and over. For instance, when you need to do a large number of iterations, you run some of them and write to a data set the information needed to restart the job where it left off. When the job is restarted, it reads the restart information and continues where the previous execution left off.

To have one job automatically submit the next one once it finishes, you can add the following lines at the end of your job submission script:

echo "ugsub queuename next_script_name" | at now
exit

You will receive an email notification when the next job is submitted. This notification can be suppressed by redirecting ugsub's output into a file in your working directory, as shown in the examples below.

Example: sub1.sh

If you are using csh:

#!/bin/csh
cd /home/labname/username/subdirectory
poe ./myprogram
echo "ugsub T8-i8-t1-12h sub2.sh > messagefile" | at now
exit

If you are using ksh:

#!/bin/ksh
cd /home/labname/username/subdirectory
poe ./myprogram
echo "ugsub T8-i8-t1-12h sub2.sh > messagefile 2>&1" | at now
exit


First the script sub1.sh is submitted to the queue. Once it finishes running, it automatically submits the script sub2.sh to the queue. That script can in turn submit sub3.sh when it completes, and so on. With this procedure, the user can prepare a sequence of scripts, which are then submitted one at a time to the queue and run in sequence. Alternatively, the script sub1.sh can resubmit itself once it finishes running. This creates an "infinite loop", a situation that is not recommended. To break the infinite loop, the user can set a termination rule for the resubmission process. Example of a termination rule:

One way to break out of an infinite resubmission loop is to have the code generate a file when the program finally "converges" (or when it completes a predetermined number of steps, for example). Let us call this file finalresults.txt. The job submission script sub.sh checks whether finalresults.txt exists. If it does not, sub.sh is submitted to the queue again; otherwise the script simply exits and the resubmission chain is terminated. A simple script sub.sh that accomplishes this is the following:

In ksh:

#!/bin/ksh
cd /home/labname/username/subdirectory
poe ./myprogram
if [ ! -f finalresults.txt ]
then
  echo "ugsub T8-i8-t1-12h sub.sh > messagefile 2>&1" | at now
fi
exit
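If the code itself cannot signal convergence, another possible termination rule is to cap the chain length with a small counter file. This is only a sketch: chain.count and MAXRUNS are illustrative names, and the work and resubmission lines are shown as comments so the sketch is self-contained.

```shell
#!/bin/sh
# Sketch: stop a resubmission chain after a fixed number of runs by
# keeping a run counter in a small file (chain.count, an illustrative name).
MAXRUNS=10

# Read the previous count (0 if the file does not exist yet) and bump it.
count=$(cat chain.count 2>/dev/null || echo 0)
count=$((count + 1))
echo $count > chain.count

# ... do the real work here, e.g.  poe ./myprogram ...

if [ $count -lt $MAXRUNS ]
then
  : # echo "ugsub T8-i8-t1-12h sub.sh > messagefile 2>&1" | at now
fi
rm -f chain.count   # remove this line when using the sketch for real
```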