Tmp: Difference between revisions

From Research Computing Center Wiki
__TOC__


= Pending or Running Jobs =
The easiest way to monitor pending or running jobs is with the Slurm <code>squeue</code> command. Like most Slurm commands, it lets you control which columns appear in its output (see <code>man squeue</code> for more information). To save you that trouble, we've created the <code>sq</code> command, which runs <code>squeue</code> with pre-formatted output and a few additional convenience options.
The key thing to remember about <code>squeue</code>/<code>sq</code> is that without any options, it shows ALL currently running and pending jobs on the cluster. To show only your own running and pending jobs, use the <code>--me</code> option.
The default <code>squeue</code> columns are as follows:
<pre class="gcomment">
JOBID PARTITION    NAME    USER ST      TIME  NODES NODELIST(REASON)
</pre>
Using <code>sq</code> runs the <code>squeue</code> command but provides the following columns:
<pre class="gcomment">
JOBID      TIME            TIME_LIMIT      NAME            PARTITION        USER      NODES  CPUS  MIN_MEMORY  PRIORITY  STATE      NODELIST(REASON)
</pre>
As you can see, you're able to get much more useful information with <code>sq</code> than with just the default <code>squeue</code> formatting. 
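As a rough illustration of how a wrapper like <code>sq</code> could produce these columns, the sketch below passes a custom format string to <code>squeue</code>. The function name <code>sq_like</code>, the variable <code>SQ_FORMAT</code>, and the column widths are assumptions made for this example; the actual <code>sq</code> implementation may differ.

```shell
#!/bin/bash
# Hypothetical approximation of the sq column layout (not the real sq script).
# Format codes: %i job ID, %M time used, %l time limit, %j job name,
# %P partition, %u user, %D nodes, %C CPUs, %m minimum memory,
# %Q priority, %T state, %R node list or pending reason.
SQ_FORMAT='%.10i %.12M %.12l %.16j %.12P %.10u %.6D %.5C %.10m %.9Q %.10T %R'

sq_like() {
    # Extra arguments (e.g. --me or -p batch) are passed through to squeue.
    squeue --format="$SQ_FORMAT" "$@"
}

# Only call squeue where Slurm is installed.
if command -v squeue >/dev/null 2>&1; then
    sq_like --me
fi
```

Keeping the format string in a variable makes it easy to add or drop a column without touching the rest of the function.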
'''Output Columns Explained'''
* '''JOBID''': The unique ID of the job.
* '''TIME''': How much (wall) time has elapsed since the job started, in the format DAYS-HOURS:MINUTES:SECONDS.
* '''TIME_LIMIT''': The maximum time given for the job to run, in the format DAYS-HOURS:MINUTES:SECONDS.
* '''NAME''': The name of the job.  If not specified in one's submission script, it will default to the name of the submission script (e.g. "sub.sh").
* '''PARTITION''': The partition to which the job was sent (e.g. batch, highmem_p, gpu_p, etc.).
* '''USER''': The user who submitted the job.
* '''NODES''': The number of nodes allocated to the job.
* '''CPUS''': The number of CPU cores allocated to the job.
* '''MIN_MEMORY''': The minimum amount of memory requested for the job.
* '''PRIORITY''': The job's priority per Slurm's [https://slurm.schedmd.com/priority_multifactor.html Multifactor Priority Plugin].
* '''STATE''': The job's state (e.g. Running, Pending, etc.).
* '''NODELIST(REASON)''': The name of the node(s) on which the job is running or the reason the job has not started yet, if it is pending.
<code>sq</code> also has a -h/--help option:
<pre class="gcomment">
bc06026@ss-sub3 ~$ sq --help
Usage: sq [OPTIONS]
Descriptions: sq - preformatted wrapper for squeue.  See man squeue for more information.
    --me                        Displays squeue output for the user executing this command
    -p                          Displays squeue output for a given partition
    -u                          Displays squeue output for a given user
    -T                          Displays submit and start time columns
    -h, --help                  Displays this help output
</pre>
<big><big>'''Examples'''</big></big>
* See all pending and running jobs: <code>sq</code>
* See all of your pending and running jobs: <code>sq --me</code>
* See all pending and running jobs in the highmem_p partition: <code>sq -p highmem_p</code>
* See all of your pending and running jobs in the batch partition: <code>sq --me -p batch</code>
* See all of your pending and running jobs, including submit time and start time columns: <code>sq --me -T</code> (Note: this requires a wide monitor or a small font to display without the columns wrapping around.)
<big>'''Example <code>sq</code> output:'''</big>
<pre class="gcomment">
bc06026@ss-sub3 ~$ sq
JOBID      TIME            TIME_LIMIT      NAME            PARTITION        USER      NODES  CPUS  MIN_MEMORY  PRIORITY  STATE      NODELIST(REASON)
4581410    2:10:56         10:00:00        Bowtie2-test    batch            zp21982   1      1     12G         6003      RUNNING    c5-4
4584815    1:51:03         2:00:00         test-job        highmem_p        rt12352   1      12    300G        5473      RUNNING    d3-9
4578428    4:57:15         1-2:00:00       PR6_Cd3         batch            un12354   1      1     40G         5449      RUNNING    c4-16
4583491    1:57:38         12:00:00        interact        inter_p          ai38821   1      4     2G          5428      RUNNING    d5-21
4580374    2:54:41         12:00:00        BLAST           batch            gh98762   1      1     10G         5397      RUNNING    b1-9
...
</pre>
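Output in this layout is easy to summarize with standard tools. The sketch below counts jobs per state by locating the STATE column from the header row. The function name <code>count_by_state</code> is hypothetical, and the approach assumes that job names contain no spaces, since a space in NAME would shift the later columns.

```shell
#!/bin/bash
# Count jobs per state from sq-style output read on stdin.
# The STATE column is found by name in the header row, so the
# function does not depend on a fixed column position.
count_by_state() {
    awk '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "STATE") col = i
            next
        }
        # Skip short lines (such as a trailing "...") that lack a STATE field.
        col && NF >= col { count[$col]++ }
        END { for (s in count) print s, count[s] }
    '
}

# Usage with two rows taken from the example output above.
count_by_state <<'EOF'
JOBID      TIME     TIME_LIMIT  NAME          PARTITION  USER     NODES  CPUS  MIN_MEMORY  PRIORITY  STATE    NODELIST(REASON)
4581410    2:10:56  10:00:00    Bowtie2-test  batch      zp21982  1      1     12G         6003      RUNNING  c5-4
4584815    1:51:03  2:00:00     test-job      highmem_p  rt12352  1      12    300G        5473      RUNNING  d3-9
EOF
```

On the cluster, the same function could be fed live data with <code>sq --me | count_by_state</code>.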
----
[[#top|Back to Top]]
= Previously Run Jobs =
The easiest way to review previously run jobs is with the Slurm <code>sacct</code> command. Like most Slurm commands, it lets you control which columns appear in its output (see <code>man sacct</code> for more information). To save you that trouble, we've created the <code>sacct-gacrc</code> command, which runs <code>sacct</code> with pre-formatted output and a few additional convenience options.
A big difference between <code>squeue</code>/<code>sq</code> and <code>sacct</code>/<code>sacct-gacrc</code> is that <code>sacct</code>/<code>sacct-gacrc</code> without any options shows only YOUR jobs. Another important difference is that <code>sacct</code>/<code>sacct-gacrc</code> displays Slurm job ''steps'' by default. Unless you're dividing your job into steps with <code>srun</code>, you will probably want one line per job (hiding the job steps and showing only the job allocation). To do this, use the <code>-X</code> option. For more information on Slurm job allocation, please see the [https://slurm.schedmd.com/job_launch.html documentation].
The default <code>sacct</code> columns are as follows:
<pre class="gcomment">
JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
</pre>
Using <code>sacct-gacrc</code> runs the <code>sacct</code> command but provides the following columns:
<pre class="gcomment">
JobID        JobName      User  Partition NNode NCPUS  ReqMem    CPUTime    Elapsed  Timelimit      State ExitCode  NodeList
</pre>
As you can see, you're able to get much more useful information with <code>sacct-gacrc</code> than with just the default <code>sacct</code> formatting. 
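For illustration, a similar column set can be requested from plain <code>sacct</code> with its <code>--format</code> option. In the sketch below, the function name <code>sacct_like</code>, the variable <code>SACCT_FIELDS</code>, and the <code>%20</code> width on JobName are assumptions made for this example, not the actual <code>sacct-gacrc</code> implementation.

```shell
#!/bin/bash
# Hypothetical approximation of the sacct-gacrc column layout.
# A "%N" suffix on a field sets its display width (here 20 for JobName).
SACCT_FIELDS='JobID,JobName%20,User,Partition,NNodes,NCPUS,ReqMem,CPUTime,Elapsed,Timelimit,State,ExitCode,NodeList'

sacct_like() {
    # -X shows one line per job allocation and hides the job steps.
    sacct -X --format="$SACCT_FIELDS" "$@"
}

# Only call sacct where Slurm is installed.
if command -v sacct >/dev/null 2>&1; then
    sacct_like
fi
```

Dropping <code>-X</code> from the function would bring the per-step lines back for jobs that use <code>srun</code>.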
'''Output Columns Explained'''
* '''JobID''': The unique ID of the job.
* '''JobName''': The name of the job.  If not specified in one's submission script, it will default to the name of the submission script (e.g. "sub.sh").
* '''User''': The user who submitted the job.
* '''Partition''': The partition to which the job was sent (e.g. batch, highmem_p, gpu_p, etc.).
* '''NNode''': The number of nodes allocated to the job.
* '''NCPUS''': The number of CPU cores allocated to the job.
* '''ReqMem''': The amount of memory requested for the job.
* '''CPUTime''': The elapsed time multiplied by the number of CPU cores allocated to the job.
* '''Elapsed''': How much (wall) time elapsed between the start of the job and its end (or now, if it is still running), in the format DAYS-HOURS:MINUTES:SECONDS.
* '''Timelimit''': The maximum time given for the job to run, in the format DAYS-HOURS:MINUTES:SECONDS.
* '''State''': The job's state (e.g. Running, Pending, Completed, Failed, etc.).
* '''ExitCode''': The job's exit code.
* '''NodeList''': The name of the node(s) on which the job is running or ran.
<code>sacct-gacrc</code> also has a -h/--help option:
<pre class="gcomment">
bc06026@ss-sub3 ~$ sacct-gacrc --help
Usage: sacct-gacrc [OPTIONS]
Description: preformatted wrapper for sacct.  See man sacct for more information.
    -E, --endtime              Display information about jobs up to a date, in the format of yyyy-mm-dd (default: now)
    -j, --jobs                  Display information about a particular job or jobs
    -r, --partition            Display information about jobs from a particular partition
    -S, --starttime            Display information about jobs starting from a date in the format of yyyy-mm-dd (default: Midnight of today)
    -u, --user                  Display information about a particular user's job(s) (default: current user)
    -X, --allocations          Only show one line per job (do not display job steps)
    --debug                    Display the sacct command being executed
    -h, --help                  Display this help output
</pre>
<big><big>'''Examples'''</big></big>
* See information about all of your jobs that started from midnight up to now: <code>sacct-gacrc</code>
* See information about a particular job: <code>sacct-gacrc -j JOBID</code> (replacing JOBID with a particular job ID)
* See information about all of your jobs in the highmem_p partition that started from midnight up to now: <code>sacct-gacrc -r highmem_p</code>
* See information about your jobs that started from a particular date up to now: <code>sacct-gacrc -S YYYY-MM-DD</code> (replacing YYYY-MM-DD with a date, e.g. 2021-09-01)
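Because <code>-S</code> takes a date in yyyy-mm-dd format, a relative start date can be computed in the shell first. The sketch below looks back seven days; the variable name <code>start_date</code> and the seven-day window are arbitrary choices for this example.

```shell
#!/bin/bash
# Compute the date seven days ago in yyyy-mm-dd format.
# GNU date uses -d; the fallback uses the BSD/macOS -v syntax.
start_date=$(date -d '7 days ago' +%Y-%m-%d 2>/dev/null || date -v-7d +%Y-%m-%d)
echo "Listing jobs since $start_date"

# Only call the wrapper where it is installed.
if command -v sacct-gacrc >/dev/null 2>&1; then
    # -X keeps the listing to one line per job.
    sacct-gacrc -X -S "$start_date"
fi
```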
----
[[#top|Back to Top]]

Latest revision as of 12:27, 17 September 2021