Running Jobs on pcluster
Using the Batch Queues
Jobs of over ten (10) minutes duration must be submitted to the queues rather than run in the background or interactively on the login node pcluster.rcc.uga.edu. Background jobs and interactive commands, including cron jobs, at jobs, and nohup processes, as well as commands entered at the keyboard, will be terminated after 10 minutes of CPU time. Graphical front ends to programs, programming tools, etc. will not be terminated.
The queueing system used on the pcluster is LoadLeveler. Currently, 30 "compute" nodes (named node01 to node32, with the exception of node19) are used to run jobs submitted through this queueing system.
To allow fair access for multiple users on the pcluster, each user may have no more than six (6) jobs, either serial or parallel, in the queues discussed in the next section (running or waiting in the Idle (I) status). Users may submit additional jobs to the queues, but these will remain in the Not Queued (NQ) status. Once a running job finishes, one of the Not Queued jobs enters the active queue and acquires the Idle (I) status.
This queueing system uses a backfill scheduling mechanism, so a job that requires fewer processors and less CPU time may run before jobs that require more processors and longer CPU time.
Batch Queues on the IBM pcluster
The batch queue can be used for serial jobs (that is, jobs that require only one processor) and for parallel jobs. Currently the maximum number of processors a parallel job can use is 32 and only up to two such jobs will run at a given time.
Valid queue names have the following format: TN-iM-tQ-Hh
where
- N = total number of processors required by the job (limited to a maximum of 32 processors)
- M = number of initial processors
- Q = number of threads generated by each initial process
- H = maximum wallclock time (in hours) per processor
Note that the variables N, M, Q, and H must be integers, with the restriction N = M x Q, where N <= 32.

The maximum wallclock time per processor for a parallel job is currently limited to 24 hours (that is, H <= 24) if the job uses more than 8 processors (that is, if 9 <= N <= 32). For jobs that use from 2 to 8 processors, the maximum wallclock time per processor is 96 hours (that is, H <= 96). Once one of the processors hits the maximum wallclock time H, the job will be terminated by LoadLeveler.

Serial jobs may request up to 10 days of wallclock time (that is, H <= 240), but only a total (not per user) of 32 long jobs (with 24 < H <= 240) will run at a time. There is no restriction on the number of serial jobs requesting H <= 24 that can run at a time (up to the total number of processors).

For faster throughput, it is in users' interest to checkpoint their code and run shorter jobs whenever possible. A long job that can be checkpointed can be run as a sequence of shorter jobs, which can be submitted to the queue automatically as described below in the Runchaining Jobs section. If you cannot fit your job within the established processor and wallclock limits, please let us know.
Examples
1. An MPI job that uses a total of 16 processors for a maximum of 12 hours per processor should be submitted to the queue T16-i16-t1-12h
2. An OpenMP job that uses a total of 8 processors for a maximum of 6 hours per processor should be submitted to the queue T8-i1-t8-6h
3. A hybrid MPI/OpenMP job that uses MPI for communication between two nodes (each with 8 processors) and OpenMP within a node, for a maximum of 24 hours, should be submitted to the queue T16-i2-t8-24h
4. A serial job should be submitted to the queue T1-i1-t1-Hh, where H is the maximum wallclock time required by the job. For example, submit a serial job that runs for 24 hours to the queue T1-i1-t1-24h
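The queue-name convention above can be sketched in a few lines of shell. This is an illustration only; the variable values below are examples, not requirements of the system.

```shell
# Sketch: build a queue name from N (total processors), M (initial processes),
# Q (threads per initial process), and H (hours), checking the N = M x Q rule.
# The values below are illustrative (the hybrid job from example 3).
N=16 M=2 Q=8 H=24
if [ "$N" -eq $((M * Q)) ]; then
    queue="T${N}-i${M}-t${Q}-${H}h"
    echo "$queue"
else
    echo "invalid: N must equal M x Q" >&2
fi
```

Running this prints T16-i2-t8-24h, matching example 3 above.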
To submit a job, first determine how many processors and how much wallclock time it requires; these requirements determine which queue to use.
Submitting a Batch Job to the Queue
The preferred way to submit a job is to use the ugsub command. The syntax for the ugsub command is:
ugsub queuename shellscriptname any_shellscript_parameters
Example of a shell script (myprog.csh)
To run a serial job:
#!/bin/csh
cd working_directory
/usr/bin/time ./myprog < $1 > $2
To run a parallel job linked with IBM MPI libraries:
#!/bin/csh
cd working_directory
poe ./myprog < $1 > $2
To run a parallel MPICH job using e.g. 4 processors (csh shell):
#!/bin/csh
cd working_directory
echo $LOADL_PROCESSOR_LIST
cat /dev/null > mlist.$$
foreach variable ($LOADL_PROCESSOR_LIST)
echo $variable >> mlist.$$
end
/usr/local/mpich/bin/mpirun -np 4 -machinefile mlist.$$ ./myprog
rm -f mlist.$$
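The machine-list step of the csh script above can also be written in a POSIX shell. This is a sketch: LOADL_PROCESSOR_LIST is set by LoadLeveler when the job actually starts, so the value below is a stand-in for illustration.

```shell
# POSIX sh sketch of the machinefile step. LOADL_PROCESSOR_LIST is normally
# set by LoadLeveler at job start; the value here is a stand-in for testing.
LOADL_PROCESSOR_LIST="node01 node01 node02 node02"
: > mlist.$$                      # create/truncate the machine list
for host in $LOADL_PROCESSOR_LIST; do
    echo "$host" >> mlist.$$      # one hostname per line, as mpirun expects
done
```

The resulting mlist.$$ file has one hostname per line and can be passed to mpirun with -machinefile, as in the csh version.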
NOTE Do NOT put the job into the background with a '&' in the shell script. This will confuse the queueing system.
You could submit this script to the T8-i8-t1-12h queue with the following command:
ugsub T8-i8-t1-12h myprog.csh myfile1.in myfile2.out
where the arguments myfile1.in and myfile2.out (and therefore the parameters "< $1" and "> $2" following ./myprog in the script files) are optional. Note that for simplicity these parameters were omitted in the sample script above for running an MPICH job.
Running an Interactive Job
Interactive Serial Jobs
We have set aside one node (node19) for interactive jobs. This node is not part of the queueing system. To access this node, first login to pcluster.rcc.uga.edu and from there use ssh to connect to node19.
pcluster> ssh node19
The single processor executable (a.out) can be run as follows:
node19> ./a.out
This node should only be used for short jobs and for those that cannot be run on the batch queueing system (for example, if the job requires an X windows front-end).
Interactive Parallel Jobs
There are two ways to run interactive parallel jobs on the pcluster.
1. Using the queueing system
Interactive parallel jobs can be submitted to the LoadLeveler queueing system as well. From the login node (pcluster.rcc.uga.edu) use the command:
poe ./a.out < inputfile -procs p -nodes n -rmpool 1
where a.out is the name of the executable, inputfile is an optional file containing input parameters, p denotes the total number of processors and n denotes the number of nodes required by the job. For example, an interactive job that uses 4 processors can be run with the following command:
poe ./a.out < inputfile -procs 4 -nodes 1 -rmpool 1
The job will not run if all processors that serve interactive classes are busy at submission time. A good way to attempt to run an interactive parallel job when the machine is busy is to use the flags -retry N -retrycount M. These options specify that an attempt to launch your parallel interactive job should be made M times, with a wait of N seconds between launch attempts.
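As a dry-run sketch, the snippet below only assembles and prints the poe command line with the retry flags; the retry and count values are illustrative, not recommendations.

```shell
# Dry-run sketch: print the poe command for an interactive job that retries
# the launch every 60 seconds, up to 20 times (values are illustrative).
PROCS=4 NODES=1 RETRY=60 RETRYCOUNT=20
cmd="poe ./a.out -procs $PROCS -nodes $NODES -rmpool 1 -retry $RETRY -retrycount $RETRYCOUNT"
echo "$cmd"
```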
Running interactive parallel jobs using LoadLeveler allows you to use more than one node (that is, more than 8 processors) for each job.
2. Running on node19:
Interactive parallel jobs that use up to 8 processors can also be run on node19. To do that, you need a file named .rhosts at the top level of your home directory (that is, in /home/groupname/username). This file should contain the word 'node19' (without the quotes). You should also have a file named host.list in your working directory, containing the word 'node19' eight times, in a single column. Then a p-processor interactive parallel job can be run as follows:
poe ./a.out < inputfile -procs p
where p <=8. This procedure is intended for short jobs, such as those used for debugging parallel codes.
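The host.list file described above can be generated with a short loop rather than typed by hand. A minimal sketch:

```shell
# Sketch: create host.list for running parallel jobs on node19.
# It must contain 'node19' eight times, one per line.
: > host.list
i=0
while [ $i -lt 8 ]; do
    echo node19 >> host.list
    i=$((i + 1))
done
```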
LoadLeveler Usage Information
These are the common LoadLeveler commands:
- llcancel Cancel a queued or running job
- llhold Place a queued job on hold
- llq Check the status of queued and running jobs
- llstatus Check the status of the pcluster
- llqueue List examples of valid queue names
Checking the Status of Jobs
Use the llq command to check the status of jobs:
llq [-u username] [-l] [jobid]
where username is the user whose jobs you want to check (do not include if you want to see all jobs) and jobid is the JOBID of a specific job. The -l option gives long output, with detailed information about the job(s).
Example
- llq shows all the jobs in the pool
- llq -u johndoe shows all jobs for user johndoe
- llq -l cws.10407.0 gives detailed information about the job with JOBID cws.10407.0
Files Created at Job Start
When your job starts, files named as follows will be created in the directory from which the job was submitted:
- shellscriptname.error.host.jobid.processno contains the messages normally written to stderr
- shellscriptname.out.host.jobid.processno contains the output written to stdout
The processno number will nearly always be 0.
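Given the naming convention above, the jobid can be pulled out of an output file name with a one-line awk call. The file name below is made up for illustration.

```shell
# Sketch: extract the jobid from an output file named
# shellscriptname.out.host.jobid.processno (this example name is hypothetical).
f="myprog.csh.out.pcluster.10407.0"
jobid=$(echo "$f" | awk -F. '{print $(NF-1)}')   # next-to-last dot field
echo "$jobid"
```

This prints 10407, which can then be passed to llq or llcancel.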
Canceling/Removing a Job
Use the llcancel command to cancel/remove a job from the job pool:
llcancel [-u username] jobid [jobid]
Example
- llcancel cws.10408.0 cancels your job with JOBID cws.10408.0
- llcancel cws.10408.0 cws.10409.0 cancels your jobs with the listed JOBIDs
- llcancel -u your_user_id cancels all jobs you have in the queue
Runchaining Jobs
We have found that a common need is to run the same job repeatedly. For instance, when a computation requires a large number of iterations, you run a portion of them and write to a restart file the information needed to continue the job where it left off. When the job is restarted, it reads the restart information and continues where the previous execution stopped.
To have one job automatically submit the next one once it finishes, you can add the following lines at the end of your job submission script:
echo "ugsub queuename next_script_name" | at now
exit
You will receive an email notification when the next job is submitted. This notification can be suppressed by redirecting the notification message into a file in your working directory.
Example: sub1.sh
If you are using csh:
#!/bin/csh
cd /home/labname/username/subdirectory
poe ./myprogram
echo "ugsub T8-i8-t1-12h sub2.sh > messagefile" | at now
exit
If you are using ksh:
#!/bin/ksh
cd /home/labname/username/subdirectory
poe ./myprogram
echo "ugsub T8-i8-t1-12h sub2.sh > messagefile 2>&1" | at now
exit
First the script sub1.sh is submitted to the queue. Once it finishes running, it automatically submits script sub2.sh to the queue. This script can in turn submit sub3.sh to the queue when it completes, and so on. For this procedure, the user can prepare a sequence of scripts, which will then be submitted one at a time to the queue and run in sequence. Alternatively, the script sub1.sh can resubmit itself back to the queue once it finishes running. This would create an "infinite loop", a situation that is not recommended. To break the infinite loop, the user can set some termination rules for the job resubmission process.
Example of a termination rule:
One way to break out of an infinite job resubmission loop is to have the code create a file when the program finally "converges" (or when it completes a predetermined number of steps, for example). Let us call this file finalresults.txt. The job submission script sub.sh checks whether the file finalresults.txt exists. If it does not, the script sub.sh is submitted to the queue again; otherwise the script simply exits and the resubmission chain is terminated. A simple script sub.sh that accomplishes this is the following:
In ksh:
#!/bin/ksh
cd /home/labname/username/subdirectory
poe ./myprogram
if [ ! -f finalresults.txt ]
then
  echo "ugsub T8-i8-t1-12h sub.sh > messagefile" | at now
fi
exit