Tmp: Difference between revisions

From Research Computing Center Wiki
__TOC__


= Pending or Running Jobs =
The easiest way to monitor pending or running jobs is with the Slurm <code>squeue</code> command.  Like most Slurm commands, it lets you control which columns appear in its output (see <code>man squeue</code> for more information).  To save you that trouble, we've created the <code>sq</code> command, a wrapper around <code>squeue</code> that preformats the output and adds a few convenience options.
The default <code>squeue</code> columns are as follows:
<pre class="gcomment">
JOBID PARTITION    NAME    USER ST      TIME  NODES NODELIST(REASON)
</pre>
Running <code>sq</code> calls <code>squeue</code> but displays the following columns:
<pre class="gcomment">
JOBID      TIME            TIME_LIMIT      NAME            PARTITION        USER      NODES  CPUS  MIN_MEMORY  PRIORITY  STATE      NODELIST(REASON)
</pre>
As you can see, <code>sq</code> shows considerably more information than the default <code>squeue</code> output.
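For comparison, a similar layout can be requested from plain <code>squeue</code> through its <code>-o</code>/<code>--format</code> option. This is only a sketch: the specifiers below are standard <code>squeue</code> ones whose default headers match the <code>sq</code> columns listed above, but the field widths and the names <code>SQ_LIKE_FORMAT</code> and <code>sq_like</code> are hypothetical and are not taken from the actual <code>sq</code> script.

```shell
#!/usr/bin/env bash
# Hypothetical approximation of the sq column layout using plain squeue.
# Specifiers (see `man squeue`): %i job ID, %M time used, %l time limit,
# %j job name, %P partition, %u user, %D nodes, %C CPUs, %m minimum memory,
# %Q priority, %T state, %R node list or pending reason.
# A number before the letter sets a minimum field width (left-justified by default).
SQ_LIKE_FORMAT="%10i %15M %15l %15j %16P %9u %6D %5C %11m %9Q %10T %R"

# Any extra squeue options (e.g. --me, -p highmem_p, -u <user>) are passed through.
sq_like() {
    squeue -o "$SQ_LIKE_FORMAT" "$@"
}
```

For example, <code>sq_like --me -p batch</code> should list your own jobs in the batch partition with roughly the same columns as <code>sq</code>.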
'''Output Columns Explained'''
* '''JOBID''': The unique ID of the job.
* '''TIME''': How much wall time has elapsed since the job started, in the format DAYS-HOURS:MINUTES:SECONDS (days and hours are shown only when needed, e.g. 2:10:56).
* '''TIME_LIMIT''': The job's wall-time limit, in the same DAYS-HOURS:MINUTES:SECONDS format.
* '''NAME''': The name of the job.  If no name is specified in the submission script, it defaults to the name of the script (e.g. "sub.sh").
* '''PARTITION''': The partition to which the job was submitted (e.g. batch, highmem_p, gpu_p).
* '''USER''': The user who submitted the job.
* '''NODES''': The number of nodes allocated to the job.
* '''CPUS''': The number of CPU cores allocated to the job.
* '''MIN_MEMORY''': The minimum amount of memory requested by the job (e.g. 12G).
* '''PRIORITY''': The job's priority as calculated by Slurm's [https://slurm.schedmd.com/priority_multifactor.html Multifactor Priority Plugin].
* '''STATE''': The job's state (e.g. RUNNING or PENDING).
* '''NODELIST(REASON)''': For a running job, the node(s) on which it is running; for a pending job, the reason it has not started yet, shown in parentheses (e.g. (Priority) or (Resources)).
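Because <code>TIME</code> and <code>TIME_LIMIT</code> use the same format, both can be converted to seconds and compared, for example to see how much of a job's time limit is left. The function below is a hypothetical helper, not part of <code>sq</code>; it is a sketch that assumes the fields hold ordinary time values rather than special values such as <code>UNLIMITED</code>.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a Slurm time string such as "2:10:56",
# "1-2:00:00" (days-hours:minutes:seconds) or "05:07" (minutes:seconds)
# into a number of seconds. Special values like UNLIMITED are not handled.
slurm_time_to_seconds() {
    local t="$1" days=0 h=0 m=0 s=0
    local -a parts
    # Strip an optional "DAYS-" prefix.
    if [[ "$t" == *-* ]]; then
        days="${t%%-*}"
        t="${t#*-}"
    fi
    # Split the remainder on colons into hours/minutes/seconds.
    IFS=':' read -r -a parts <<< "$t"
    case "${#parts[@]}" in
        3) h="${parts[0]}"; m="${parts[1]}"; s="${parts[2]}" ;;
        2) m="${parts[0]}"; s="${parts[1]}" ;;
        1) s="${parts[0]}" ;;
    esac
    # 10# forces base 10, so fields such as "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}
```

For instance, a job that has run for 2:10:56 against a 10:00:00 limit has <code>$(( $(slurm_time_to_seconds 10:00:00) - $(slurm_time_to_seconds 2:10:56) ))</code> = 28144 seconds (about 7.8 hours) remaining.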
<code>sq</code> also has a -h/--help option:
<pre class="gcomment">
bc06026@ss-sub3 ~$ sq --help
Usage: sq [OPTIONS]
Descriptions: sq - preformatted wrapper for squeue.  See man squeue for more information.
    --me                        Displays squeue output for the user executing this command
    -p                          Displays squeue output for a given partition
    -u                          Displays squeue output for a given user
    -T                          Displays submit and start time columns
    -h, --help                  Displays this help output
</pre>
<big><big>'''Examples'''</big></big>
* See all pending and running jobs: <code>sq</code>
* See all of your pending and running jobs: <code>sq --me</code>
* See all pending and running jobs in the highmem_p partition: <code>sq -p highmem_p</code>
* See all of your pending and running jobs in the batch partition: <code>sq --me -p batch</code>
* See all of your pending and running jobs, including submit time and start time columns: <code>sq --me -T</code> (Note: this output is wide, so you may need a wide terminal window or a small font to keep the columns from wrapping.)
<big>'''Example <code>sq</code> output:'''</big>
<pre class="gcomment">
bc06026@ss-sub3 ~$ sq
JOBID      TIME            TIME_LIMIT      NAME            PARTITION        USER      NODES  CPUS  MIN_MEMORY  PRIORITY  STATE      NODELIST(REASON)
4581410    2:10:56         10:00:00        Bowtie2-test    batch            zp21982   1      1     12G         6003      RUNNING    c5-4
4584815    1:51:03         2:00:00         test-job        highmem_p        rt12352   1      12    300G        5473      RUNNING    d3-9
4578428    4:57:15         1-2:00:00       PR6_Cd3         batch            un12354   1      1     40G         5449      RUNNING    c4-16
4583491    1:57:38         12:00:00        interact        inter_p          ai38821   1      4     2G          5428      RUNNING    d5-21
4580374    2:54:41         12:00:00        BLAST           batch            gh98762   1      1     10G         5397      RUNNING    b1-9
...
</pre>
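Since <code>sq</code> prints whitespace-separated columns, its output can be post-processed with standard tools. The sketch below totals the <code>CPUS</code> column per partition; it assumes the column order shown above (PARTITION is field 5 and CPUS is field 8) and that job names contain no spaces. The function name <code>cpus_by_partition</code> is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read sq output on stdin and print the total CPUs
# per partition. Assumes the sq column order shown above (field 5 is
# PARTITION, field 8 is CPUS) and that job names contain no spaces.
cpus_by_partition() {
    # Only count lines whose first field looks like a job ID; this skips the header.
    awk '$1 ~ /^[0-9]/ { cpus[$5] += $8 }
         END { for (p in cpus) print p, cpus[p] }' | sort
}
```

Usage: <code>sq --me | cpus_by_partition</code>. For the five jobs in the example output above, it would print <code>batch 3</code>, <code>highmem_p 12</code> and <code>inter_p 4</code>.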
----
[[#top|Back to Top]]
= Previously Ran Jobs =
The easiest way to monitor previously run jobs is with the Slurm <code>sacct</code> command.  Like most Slurm commands, it lets you control which columns appear in its output (see <code>man sacct</code> for more information).  To save you that trouble, we've created the <code>sacct-gacrc</code> command, a wrapper around <code>sacct</code> that preformats the output and adds a few convenience options.
The default <code>sacct</code> columns are as follows:
<pre class="gcomment">
JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
</pre>
Running <code>sacct-gacrc</code> calls <code>sacct</code> but displays the following columns:
<pre class="gcomment">
JobID        JobName      User  Partition NNode NCPUS  ReqMem    CPUTime    Elapsed  Timelimit      State ExitCode  NodeList
</pre>
As you can see, <code>sacct-gacrc</code> shows considerably more information than the default <code>sacct</code> output.
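As with <code>sq</code>, a comparable layout can be requested from plain <code>sacct</code> with its <code>--format</code> option, which takes a comma-separated list of field names. The field list below is an assumption based on the column headers above (the node-count field is named <code>NNodes</code> in <code>sacct</code>); the exact command that <code>sacct-gacrc</code> runs can be printed with its <code>--debug</code> option. The names <code>SACCT_LIKE_FORMAT</code> and <code>sacct_like</code> are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical approximation of the sacct-gacrc columns using plain sacct.
# These are standard sacct --format fields (list them all with `sacct --helpformat`);
# append %<width> to a field to set its width, e.g. JobName%20.
SACCT_LIKE_FORMAT="JobID,JobName,User,Partition,NNodes,NCPUS,ReqMem,CPUTime,Elapsed,Timelimit,State,ExitCode,NodeList"

# Any extra sacct options (e.g. -X, -S 2021-09-01, -j <jobid>) are passed through.
sacct_like() {
    sacct --format="$SACCT_LIKE_FORMAT" "$@"
}
```

For example, <code>sacct_like -X -S 2021-09-01</code> should list your jobs since 1 September 2021, one line per job.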
'''Output Columns Explained'''
* '''JobID''': The unique ID of the job.
* '''JobName''': The name of the job.  If no name is specified in the submission script, it defaults to the name of the script (e.g. "sub.sh").
* '''User''': The user who submitted the job.
* '''Partition''': The partition to which the job was submitted (e.g. batch, highmem_p, gpu_p).
* '''NNode''': The number of nodes allocated to the job.
* '''NCPUS''': The number of CPU cores allocated to the job.
* '''ReqMem''': The amount of memory requested by the job.
* '''CPUTime''': The CPU time used by the job, calculated as its elapsed time multiplied by the number of allocated CPUs (NCPUS).
* '''Elapsed''': The wall time the job has run for (its total run time once it has finished), in the format DAYS-HOURS:MINUTES:SECONDS.
* '''Timelimit''': The job's wall-time limit, in the same DAYS-HOURS:MINUTES:SECONDS format.
* '''State''': The job's state (e.g. COMPLETED, FAILED, TIMEOUT, CANCELLED, or RUNNING).
* '''ExitCode''': The job's exit code, shown as <code>exit_status:signal</code>; the number after the colon is the signal that terminated the job, if any (e.g. <code>0:0</code> for a clean exit).
* '''NodeList''': The name of the node(s) on which the job is running or ran.
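The <code>State</code> and <code>ExitCode</code> columns are the quickest way to spot jobs that did not finish cleanly. The sketch below filters <code>sacct</code>'s pipe-delimited output (<code>-P</code>, with the header suppressed by <code>-n</code> and one line per job via <code>-X</code>) and prints every job whose state is anything other than <code>COMPLETED</code>. The function name <code>not_completed_jobs</code> is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "JobID|State|ExitCode" lines on stdin, as produced by
#   sacct -X -n -P --format=JobID,State,ExitCode
# and print every job whose state is not COMPLETED (failed, timed out, cancelled, still running...).
not_completed_jobs() {
    awk -F'|' '$2 != "COMPLETED" { printf "%s %s (exit code %s)\n", $1, $2, $3 }'
}
```

Usage: <code>sacct -X -n -P -S 2021-09-01 --format=JobID,State,ExitCode | not_completed_jobs</code>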
<code>sacct-gacrc</code> also has a -h/--help option:
<pre class="gcomment">
bc06026@ss-sub3 ~$ sacct-gacrc --help
Usage: sacct-gacrc [OPTIONS]
Description: preformatted wrapper for sacct.  See man sacct for more information.
    -E, --endtime              Display information about jobs up to a date, in the format of yyyy-mm-dd (default: now)
    -j, --jobs                  Display information about a particular job or jobs
    -r, --partition            Display information about jobs from a particular partition
    -S, --starttime            Display information about jobs starting from a date in the format of yyyy-mm-dd (default: Midnight of today)
    -u, --user                  Display information about a particular user's job(s) (default: current user)
    -X, --allocations          Only show one line per job (do not display job steps)
    --debug                    Display the sacct command being executed
    -h, --help                  Display this help output
</pre>
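The <code>-S</code> and <code>-E</code> options expect dates in yyyy-mm-dd form. To look back a fixed number of days, the date can be computed on the fly; the sketch below assumes GNU <code>date</code> (standard on Linux), and the function name <code>days_ago</code> is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the date N days before today as yyyy-mm-dd,
# the format expected by sacct-gacrc's -S/--starttime and -E/--endtime.
# Relies on GNU date's -d option for relative dates.
days_ago() {
    date -d "$1 days ago" +%F
}
```

For example, <code>sacct-gacrc -X -S "$(days_ago 7)"</code> lists your jobs from roughly the past week, one line per job.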

Latest revision as of 12:27, 17 September 2021