Revision as of 14:43, 16 September 2021
Pending or Running Jobs
squeue and sq
The easiest way to monitor pending or running jobs is with the Slurm squeue command. Like most Slurm commands, it lets you control which columns are displayed in its output (see man squeue for more information). To save you that trouble, we've created the sq command, which runs squeue with a pre-formatted set of columns and some additional convenience options.
The key thing to remember about squeue/sq is that, without any options, it shows ALL currently running and pending jobs on the cluster. To show only your own running and pending jobs, use the --me option.
The default squeue columns are as follows:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
Running sq calls squeue but displays the following columns:
JOBID TIME TIME_LIMIT NAME PARTITION USER NODES CPUS MIN_MEMORY PRIORITY STATE NODELIST(REASON)
As you can see, sq provides much more useful information than the default squeue formatting.
Output Columns Explained
- JOBID: The unique ID of the job.
- TIME: How much wall time has elapsed since the job started, in the format DAYS-HOURS:MINUTES:SECONDS (leading fields are omitted when they are zero, e.g. 2:10:56).
- TIME_LIMIT: The maximum time allowed for the job to run, in the same format (e.g. 1-2:00:00 is 1 day and 2 hours).
- NAME: The name of the job. If not specified in one's submission script, it will default to the name of the submission script (e.g. "sub.sh").
- PARTITION: The partition to which the job was submitted (e.g. batch, highmem_p, gpu_p).
- USER: The user who submitted the job.
- NODES: The number of nodes allocated to the job.
- CPUS: The number of CPU cores allocated to the job.
- MIN_MEMORY: The minimum amount of memory requested by the job.
- PRIORITY: The job's priority as calculated by Slurm's Multifactor Priority Plugin.
- STATE: The job's state (e.g. RUNNING, PENDING).
- NODELIST(REASON): The name of the node(s) on which the job is running or the reason the job has not started yet, if it is pending.
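TIME and TIME_LIMIT values are easier to compare once converted to seconds. The helper below is a minimal sketch (the name slurm_time_to_seconds is hypothetical) that accepts the displayed forms DAYS-HOURS:MINUTES:SECONDS, HOURS:MINUTES:SECONDS and MINUTES:SECONDS; it does not handle special values such as UNLIMITED.

```shell
# Convert a Slurm time as displayed by squeue/sacct ([D-]H:MM:SS or M:SS)
# into a number of seconds. Hypothetical helper; not part of Slurm.
slurm_time_to_seconds() {
  printf '%s\n' "$1" | awk '{
    days = 0; h = 0; m = 0; s = 0
    t = $0
    # Split off an optional "DAYS-" prefix.
    if (index(t, "-") > 0) {
      days = substr(t, 1, index(t, "-") - 1)
      t = substr(t, index(t, "-") + 1)
    }
    # The rest is H:MM:SS or M:SS; awk reads "08" as 8, so leading zeros are fine.
    n = split(t, p, ":")
    if (n == 3)      { h = p[1]; m = p[2]; s = p[3] }
    else if (n == 2) { m = p[1]; s = p[2] }
    else             { s = p[1] }
    print days * 86400 + h * 3600 + m * 60 + s
  }'
}
```

For instance, job 4581410 in the example sq output below has used 2:10:56 (7856 seconds) of its 10:00:00 (36000 seconds) limit.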
sq also has a -h/--help option:
bc06026@ss-sub3 ~$ sq --help
Usage: sq [OPTIONS]
Descriptions: sq - preformatted wrapper for squeue. See man squeue for more information.
-j          Displays squeue output for a given job
--me        Displays squeue output for the user executing this command
-p          Displays squeue output for a given partition
-u          Displays squeue output for a given user
-T          Displays submit and start time columns
-h, --help  Displays this help output
Examples
- See all pending and running jobs:
sq
- See all of your pending and running jobs:
sq --me
- See all pending and running jobs in the highmem_p partition:
sq -p highmem_p
- See all of your pending and running jobs in the batch partition:
sq --me -p batch
- See all of your pending and running jobs including submit time and start time columns:
sq --me -T
(Note: this requires a wide terminal window or a small font to display without the columns wrapping.)
Example sq output:
bc06026@ss-sub3 ~$ sq
JOBID   TIME    TIME_LIMIT NAME         PARTITION USER    NODES CPUS MIN_MEMORY PRIORITY STATE   NODELIST(REASON)
4581410 2:10:56 10:00:00   Bowtie2-test batch     zp21982 1     1    12G        6003     RUNNING c5-4
4584815 1:51:03 2:00:00    test-job     highmem_p rt12352 1     12   300G       5473     RUNNING d3-9
4578428 4:57:15 1-2:00:00  PR6_Cd3      batch     un12354 1     1    40G        5449     RUNNING c4-16
4583491 1:57:38 12:00:00   interact     inter_p   ai38821 1     4    2G         5428     RUNNING d5-21
4580374 2:54:41 12:00:00   BLAST        batch     gh98762 1     1    10G        5397     RUNNING b1-9
...
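To get a quick count of your jobs per state instead of scanning the full listing, plain squeue can print just the state column. A minimal sketch, assuming the input comes from squeue --me -h -o %T (for plain squeue, -h means --noheader, unlike sq's -h/--help, and %T prints the state in its long form such as RUNNING or PENDING); summarize_states is a hypothetical helper name.

```shell
# Tally job states, one state name per input line, e.g. from:
#   squeue --me -h -o %T
# Output: one line per state with its count, sorted by state name.
summarize_states() {
  sort | uniq -c | awk '{ printf "%-12s %s\n", $2, $1 }'
}
```

Usage: squeue --me -h -o %T | summarize_states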
scontrol show job
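The scontrol show job JOBID command (replacing JOBID with a job ID) displays detailed information about a single pending or running job as Key=Value pairs. Shortly after a job finishes, it is no longer available to scontrol; use sacct or sacct-gacrc (described below) instead.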
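Because scontrol show job prints whitespace-separated Key=Value pairs spread over several lines, a single field such as JobState, Reason or NodeList can be pulled out with standard text tools. A minimal sketch, assuming field values contain no spaces (values that do, such as some job names, will be cut short); job_field is a hypothetical helper name.

```shell
# Print the value of one Key=Value field from `scontrol show job` output.
# Hypothetical helper; assumes values contain no whitespace.
job_field() {
  # Put every Key=Value token on its own line, then match the key exactly.
  tr -s ' \t' '\n\n' |
    awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }'
}
```

Usage, replacing the job ID with one of your own: scontrol show job 4581410 | job_field JobState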
Previously Ran Jobs
The easiest way to monitor previously run jobs is with the Slurm sacct command. Like most Slurm commands, it lets you control which columns are displayed in its output (see man sacct for more information). To save you that trouble, we've created the sacct-gacrc command, which runs sacct with a pre-formatted set of columns and some additional convenience options.
A key difference between squeue/sq and sacct/sacct-gacrc is that sacct/sacct-gacrc without any options shows only YOUR jobs. Also note that, by default, sacct/sacct-gacrc displays Slurm job steps as separate lines. Unless you're dividing your job into steps with srun, you will probably want sacct/sacct-gacrc to display one line per job (hiding job steps and showing only the job allocation). To do this, use the -X option. For more information on Slurm job allocations and steps, please see the Slurm documentation.
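To see what -X removes: in sacct output, a job step has a JobID made of the job ID plus a suffix after a dot (typically JOBID.batch for the batch script, or JOBID.0, JOBID.1, ... for srun steps), while the job allocation line uses the bare job ID. The filter below only illustrates that naming (drop_steps is a hypothetical name); in practice -X is the simpler choice. It assumes pipe-delimited input from sacct's -P (--parsable2) and -n (--noheader) options.

```shell
# Keep only job-allocation lines from pipe-delimited sacct output, e.g.
#   sacct -P -n -o JobID,JobName,State
# Step lines have a dot in the JobID field (4580382.batch, 4580382.0, ...).
drop_steps() {
  awk -F'|' 'index($1, ".") == 0'
}
```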
The default sacct columns are as follows:
JobID JobName Partition Account AllocCPUS State ExitCode
Running sacct-gacrc calls sacct but displays the following columns:
JobID JobName User Partition NNode NCPUS ReqMem CPUTime Elapsed Timelimit State ExitCode NodeList
As you can see, sacct-gacrc provides much more useful information than the default sacct formatting.
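A similar column set can be requested from plain sacct with its --format (-o) option. The sketch below is an approximation only: my_sacct and SACCT_FIELDS are hypothetical names, and the field list mirrors the columns listed above using sacct's own field names (for example NNodes for the node count). To see the exact sacct command that sacct-gacrc runs, use its --debug option.

```shell
# Hypothetical approximation of the sacct-gacrc columns (NOT the real script).
SACCT_FIELDS='JobID,JobName,User,Partition,NNodes,NCPUS,ReqMem,CPUTime,Elapsed,Timelimit,State,ExitCode,NodeList'

my_sacct() {
  # -X: one line per job allocation (no job steps); extra options pass through,
  # e.g. my_sacct -S 2021-09-01 -r highmem_p
  sacct -X --format="$SACCT_FIELDS" "$@"
}
```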
Output Columns Explained
- JobID: The unique ID of the job.
- JobName: The name of the job. If not specified in one's submission script, it will default to the name of the submission script (e.g. "sub.sh").
- User: The user who submitted the job.
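- Partition: The partition to which the job was submitted (e.g. batch, highmem_p, gpu_p).
- NNode: The number of nodes allocated to the job.
- NCPUS: The number of CPU cores allocated to the job.
- ReqMem: The amount of memory requested by the job. A trailing n means per node and c means per CPU core (e.g. 200Gn is 200G per node).
- CPUTime: The elapsed time multiplied by the number of allocated CPU cores (Elapsed times NCPUS), in the format DAYS-HOURS:MINUTES:SECONDS.
- Elapsed: How much wall time elapsed between the job's start and its end (or now, if it is still running), in the format DAYS-HOURS:MINUTES:SECONDS.
- Timelimit: The maximum time allowed for the job to run, in the format DAYS-HOURS:MINUTES:SECONDS.
- State: The job's state (e.g. COMPLETED, FAILED, RUNNING).
- ExitCode: The job's exit code, in the form EXIT_STATUS:SIGNAL (e.g. 1:0 means exit status 1, no signal).
- NodeList: The name of the node(s) on which the job is running or ran.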
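sacct's CPUTime column is the elapsed wall time multiplied by the number of allocated CPU cores, which is why it can be much larger than Elapsed for multi-core jobs. The sketch below (cputime_from_elapsed is a hypothetical name) recomputes it from an Elapsed value in [DAYS-]HH:MM:SS form and an NCPUS count; for job 4580382 in the example output below, 00:00:07 elapsed on 28 cores gives 00:03:16.

```shell
# Recompute sacct's CPUTime as Elapsed x NCPUS.
# $1: Elapsed as [DAYS-]HH:MM:SS, $2: NCPUS. Hypothetical helper.
cputime_from_elapsed() {
  printf '%s %s\n' "$1" "$2" | awk '{
    t = $1; days = 0
    # Split off an optional "DAYS-" prefix.
    if (index(t, "-") > 0) {
      days = substr(t, 1, index(t, "-") - 1)
      t = substr(t, index(t, "-") + 1)
    }
    # The remaining part is HH:MM:SS, as sacct prints Elapsed.
    split(t, p, ":")
    secs = (days * 86400 + p[1] * 3600 + p[2] * 60 + p[3]) * $2
    # Format back to [DAYS-]HH:MM:SS, matching sacct style.
    d = int(secs / 86400)
    secs -= d * 86400
    hms = sprintf("%02d:%02d:%02d", int(secs / 3600), int((secs % 3600) / 60), secs % 60)
    print (d > 0 ? d "-" hms : hms)
  }'
}
```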
sacct-gacrc also has a -h/--help option:
bc06026@ss-sub3 ~$ sacct-gacrc --help
Usage: sacct-gacrc [OPTIONS]
Description: preformatted wrapper for sacct. See man sacct for more information.
-E, --endtime      Display information about jobs up to a date, in the format of yyyy-mm-dd (default: now)
-j, --jobs         Display information about a particular job or jobs (comma-separated list if more than one job)
-r, --partition    Display information about jobs from a particular partition
-S, --starttime    Display information about jobs starting from a date in the format of yyyy-mm-dd (default: Midnight of today)
-u, --user         Display information about a particular user's job(s) (default: current user)
-X, --allocations  Only show one line per job (do not display job steps)
--debug            Display the sacct command being executed
-h, --help         Display this help output
Examples
- See information about all of your jobs that started from midnight up to now:
sacct-gacrc
- See information about a particular job:
sacct-gacrc -j JOBID
(replacing JOBID with a particular job ID)
- See information about all of your jobs that started from midnight up to now in the highmem_p partition:
sacct-gacrc -r highmem_p
- See information about your jobs that started from a particular date up to now:
sacct-gacrc -S YYYY-MM-DD
(replacing YYYY-MM-DD with a date, e.g. 2021-09-01)
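For a rolling window, such as the last week, the start date for -S can be computed rather than typed. A minimal sketch, assuming GNU date (standard on Linux systems), whose -d option understands expressions like "7 days ago"; days_ago is a hypothetical helper name.

```shell
# Print the date N days before today as YYYY-MM-DD (assumes GNU date).
days_ago() {
  date -d "$1 days ago" +%F
}
```

Usage: sacct-gacrc -X -S "$(days_ago 7)"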
Example sacct-gacrc output:
bc06026@b1-24 ~$ sacct-gacrc -X -S 2021-09-14
       JobID    JobName      User  Partition   NodeList AllocNodes NTask NCPUS  ReqMem  MaxVMSize      State    CPUTime    Elapsed  Timelimit ExitCode
------------ ---------- --------- ---------- ---------- ---------- ----- ----- ------- ---------- ---------- ---------- ---------- ---------- --------
     4580375   interact   bc06026  highmem_p     ra4-22          1           1   200Gn                FAILED   00:00:07   00:00:07   12:00:00      1:0
     4580382   interact   bc06026  highmem_p      d1-22          1          28   200Gn             COMPLETED   00:03:16   00:00:07   12:00:00      0:0
     4584992   interact   bc06026    inter_p      c4-16          1           1     2Gn             COMPLETED   00:00:18   00:00:18   12:00:00      0:0
...
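Since ExitCode has the form EXIT_STATUS:SIGNAL, job 4580375 above ended with exit status 1 and no signal. To pick out jobs that did not complete successfully, pipe-delimited sacct output can be filtered. This is a minimal sketch (not_completed is a hypothetical name) assuming the field order JobID,State,ExitCode.

```shell
# Print jobs whose State is not COMPLETED, splitting ExitCode into its
# exit status and signal. Expected input (assumption):
#   sacct -X -n -P -o JobID,State,ExitCode -S 2021-09-14
not_completed() {
  awk -F'|' '$2 != "COMPLETED" {
    split($3, e, ":")
    print $1, $2, "status=" e[1], "signal=" e[2]
  }'
}
```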