Job Submission partitions on Sapelo2
Latest revision as of 09:31, 12 September 2024
Batch partitions (queues) defined on Sapelo2
There are several partitions defined on Sapelo2. The Slurm queueing system refers to queues as partitions. In the job submission script, or as job submission command-line arguments, users are required to specify the partition and the resources the job needs (such as number of cores, amount of memory, and GPU cards), so that the job can be assigned to compute node(s) with enough available resources. Please note that Slurm will not allow a job to be submitted if no resources match your request. Please refer to Migrating from Torque to Slurm for more information about the Slurm queueing system.
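Slurm reads the partition and the resource request from `#SBATCH` lines at the top of a job submission script (or from the equivalent `sbatch` command-line options). As a minimal sketch, the hypothetical Python helper `sbatch_header` below only assembles such a header and does not submit anything. The option names are standard `sbatch` options; the job name, the resource values and the generic `gpu:1` GRES string are illustrative assumptions, not recommended Sapelo2 settings.

```python
def sbatch_header(job_name, partition, ntasks=1, cpus_per_task=1,
                  mem="4G", time="1-00:00:00", gres=None):
    """Return the #SBATCH header of a job script as a string (hypothetical helper).

    The defaults are illustrative only; adjust them to your job.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",         # e.g. batch, highmem_p, gpu_p
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        f"#SBATCH --mem={mem}",                      # memory per node
        f"#SBATCH --time={time}",                    # D-HH:MM:SS walltime
    ]
    if gres is not None:
        # Generic resources such as GPUs; "gpu:1" asks for one GPU of any type.
        lines.append(f"#SBATCH --gres={gres}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(sbatch_header("testjob", "batch", cpus_per_task=4, mem="10G",
                        time="02:00:00"))
```

The printed header would be followed by the job's commands and submitted with `sbatch`.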
The following partitions are defined on the Sapelo2 cluster:
Partition Name | Time limit | Max jobs running | Max jobs able to be submitted | Notes |
---|---|---|---|---|
batch | 7 days | 250 | 10,000 | Regular nodes. |
batch_30d | 30 days | 1 | 2 | Regular nodes. A given user can have up to one job running at a time here, plus one pending, or two pending and none running. A user's attempt to submit a third job into this partition will be rejected. |
highmem_p | 7 days | 6 | 100 | For high memory jobs |
highmem_30d_p | 30 days | 1 | 2 | For high memory jobs. A given user can have up to one job running at a time here, plus one pending, or two pending and none running. A user's attempt to submit a third job into this partition will be rejected. |
hugemem_p | 7 days | 4 | 4 | For jobs needing up to 3TB of memory |
hugemem_30d_p | 30 days | 4 | 4 | For jobs needing up to 3TB of memory |
gpu_p | 7 days | 6 | 20 | For GPU-enabled jobs. |
gpu_30d_p | 30 days | 2 | 2 | For GPU-enabled jobs. A given user can have up to one job running at a time here, plus one pending, or two pending and none running. A user's attempt to submit a third job into this partition will be rejected. |
inter_p | 2 days | 3 | 20 | Regular nodes, for interactive jobs. |
name_p | variable | variable | variable | Partitions that target different groups' buy-in nodes. The name string is specific to each group. |
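The limits in the partition table combine three checks: the requested walltime must not exceed the partition's time limit, a user's running plus pending jobs must stay within the submission limit, and at most the running limit of those jobs can run at once. The sketch below encodes these checks in Python, assuming the counts are per user (as the notes for the 30-day partitions describe). `LIMITS`, `slurm_time_to_seconds`, `can_submit` and `can_start` are hypothetical names, and `name_p` is left out because its limits vary by group. The time formats handled are the ones `sbatch --time` accepts.

```python
# Hypothetical transcription of the partition table:
# partition -> (time limit in days, max jobs running, max jobs submitted).
# Assumes the counts apply per user; name_p is omitted (limits vary by group).
LIMITS = {
    "batch": (7, 250, 10_000),
    "batch_30d": (30, 1, 2),
    "highmem_p": (7, 6, 100),
    "highmem_30d_p": (30, 1, 2),
    "hugemem_p": (7, 4, 4),
    "hugemem_30d_p": (30, 4, 4),
    "gpu_p": (7, 6, 20),
    "gpu_30d_p": (30, 2, 2),
    "inter_p": (2, 3, 20),
}


def slurm_time_to_seconds(spec: str) -> int:
    """Convert a Slurm --time value to seconds.

    Handles "M", "M:S", "H:M:S", "D-H", "D-H:M" and "D-H:M:S".
    """
    days = 0
    if "-" in spec:
        day_part, spec = spec.split("-", 1)
        days = int(day_part)
        # After a day count the fields are hours[:minutes[:seconds]].
        fields = [int(x) for x in spec.split(":")]
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        fields = [int(x) for x in spec.split(":")]
        if len(fields) == 1:    # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:  # minutes:seconds
            hours, minutes, seconds = 0, fields[0], fields[1]
        else:                   # hours:minutes:seconds
            hours, minutes, seconds = fields
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def can_submit(partition: str, running: int, pending: int, time_spec: str) -> bool:
    """True if one more job with this walltime fits the partition's limits."""
    days, _max_running, max_submitted = LIMITS[partition]
    if slurm_time_to_seconds(time_spec) > days * 24 * 3600:
        # Over the time limit: depending on site configuration Slurm either
        # rejects the job at submission or leaves it pending; flag it here.
        return False
    # The new job counts toward running + pending until it finishes.
    return running + pending + 1 <= max_submitted


def can_start(partition: str, running: int) -> bool:
    """True if another of the user's pending jobs may start running."""
    _days, max_running, _max_submitted = LIMITS[partition]
    return running < max_running


if __name__ == "__main__":
    # batch_30d: one running plus one pending is allowed; a third job is not.
    print(can_submit("batch_30d", running=1, pending=0, time_spec="20-00:00:00"))  # True
    print(can_submit("batch_30d", running=1, pending=1, time_spec="1-00:00:00"))   # False
```

Keeping the submission check and the start check separate mirrors the table's two columns: the submission limit decides whether `sbatch` accepts the job, while the running limit only decides how many accepted jobs run at the same time.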
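When defining the resources for your job, make sure you stay within the bounds of the resources available in the partition you are using: the per-node memory and core request must fit on at least one type of node in that partition. The table below lists the resources available on each type of node. The maximum memory and core count per node are 740 GB and 128 cores for batch and batch_30d, 990 GB and 128 cores for highmem_p and highmem_30d_p, 3000 GB and 48 cores for hugemem_p and hugemem_30d_p, and 1000 GB and 128 cores for gpu_p and gpu_30d_p.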
Partition Name | # of Nodes | Max Mem (GB)/Node | Max Cores/Node | Processor Type | GPU Cards/Node |
---|---|---|---|---|---|
batch, batch_30d | 14 | 740 | 128 | AMD EPYC Genoa (4th gen) | N/A |
batch, batch_30d | 120 | 500 | 128 | AMD EPYC Milan (3rd gen) | N/A |
batch, batch_30d | 4 | 250 | 64 | AMD EPYC Milan (3rd gen) | N/A |
batch, batch_30d | 2 | 120 | 64 | AMD EPYC Milan (3rd gen) | N/A |
batch, batch_30d | 123 | 120 | 64 | AMD EPYC Rome (2nd gen) | N/A |
batch, batch_30d | 50 | 120 | 32 | AMD EPYC Naples (1st gen) | N/A |
batch, batch_30d | 42 | 180 | 32 | Intel Xeon Skylake | N/A |
highmem_p, highmem_30d_p | 18 | 500 | 32 | AMD EPYC Naples (1st gen) | N/A |
highmem_p, highmem_30d_p | 2 | 990 | 128 | AMD EPYC Milan (3rd gen) | N/A |
highmem_p, highmem_30d_p | 12 | 990 | 32 | AMD EPYC Milan (3rd gen) | N/A |
highmem_p, highmem_30d_p | 2 | 990 | 64 | AMD EPYC Naples (1st gen) | N/A |
highmem_p, highmem_30d_p | 1 | 990 | 28 | Intel Xeon Broadwell | N/A |
hugemem_p, hugemem_30d_p | 3 | 3000 | 48 | AMD EPYC Genoa (4th gen) | N/A |
hugemem_p, hugemem_30d_p | 2 | 2000 | 32 | AMD EPYC Rome (2nd gen) | N/A |
gpu_p, gpu_30d_p | 2 | 180 | 32 | Intel Xeon Skylake | 1 NVIDIA P100 |
gpu_p, gpu_30d_p | 14 | 1000 | 64 | AMD EPYC Milan (3rd gen) | 4 NVIDIA A100 |
gpu_p, gpu_30d_p | 12 | 1000 | 64 | Intel Xeon Sapphire Rapids | 4 NVIDIA H100 |
gpu_p, gpu_30d_p | 12 | 740 | 128 | AMD EPYC Genoa (4th gen) | 4 NVIDIA L4 |
name_p | variable | variable | variable | variable | variable |
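To check a request against the node table, the table can be transcribed into data and filtered. The sketch below is a hypothetical Python helper, not a GACRC tool: `fitting_nodes` lists the node types in a partition whose memory and core counts cover a per-node request, and `partition_maxima` recovers each partition's maximum memory and core count per node. It uses the table's figures as-is; the memory Slurm can actually allocate on a node may be somewhat lower than the amount listed.

```python
from typing import NamedTuple


class NodeType(NamedTuple):
    count: int   # number of nodes of this type
    mem_gb: int  # max memory per node, GB
    cores: int   # max cores per node
    cpu: str
    gpus: str    # "N/A" for nodes without GPUs


# Transcribed from the node table (name_p omitted: it varies by group).
BATCH = [
    NodeType(14, 740, 128, "AMD EPYC Genoa (4th gen)", "N/A"),
    NodeType(120, 500, 128, "AMD EPYC Milan (3rd gen)", "N/A"),
    NodeType(4, 250, 64, "AMD EPYC Milan (3rd gen)", "N/A"),
    NodeType(2, 120, 64, "AMD EPYC Milan (3rd gen)", "N/A"),
    NodeType(123, 120, 64, "AMD EPYC Rome (2nd gen)", "N/A"),
    NodeType(50, 120, 32, "AMD EPYC Naples (1st gen)", "N/A"),
    NodeType(42, 180, 32, "Intel Xeon Skylake", "N/A"),
]
HIGHMEM = [
    NodeType(18, 500, 32, "AMD EPYC Naples (1st gen)", "N/A"),
    NodeType(2, 990, 128, "AMD EPYC Milan (3rd gen)", "N/A"),
    NodeType(12, 990, 32, "AMD EPYC Milan (3rd gen)", "N/A"),
    NodeType(2, 990, 64, "AMD EPYC Naples (1st gen)", "N/A"),
    NodeType(1, 990, 28, "Intel Xeon Broadwell", "N/A"),
]
HUGEMEM = [
    NodeType(3, 3000, 48, "AMD EPYC Genoa (4th gen)", "N/A"),
    NodeType(2, 2000, 32, "AMD EPYC Rome (2nd gen)", "N/A"),
]
GPU = [
    NodeType(2, 180, 32, "Intel Xeon Skylake", "1 NVIDIA P100"),
    NodeType(14, 1000, 64, "AMD EPYC Milan (3rd gen)", "4 NVIDIA A100"),
    NodeType(12, 1000, 64, "Intel Xeon Sapphire Rapids", "4 NVIDIA H100"),
    NodeType(12, 740, 128, "AMD EPYC Genoa (4th gen)", "4 NVIDIA L4"),
]
# Each 7-day partition shares its nodes with the matching 30-day partition.
NODES = {
    "batch": BATCH, "batch_30d": BATCH,
    "highmem_p": HIGHMEM, "highmem_30d_p": HIGHMEM,
    "hugemem_p": HUGEMEM, "hugemem_30d_p": HUGEMEM,
    "gpu_p": GPU, "gpu_30d_p": GPU,
}


def fitting_nodes(partition, mem_gb, cores):
    """Node types in `partition` with at least mem_gb GB and `cores` cores per node."""
    return [n for n in NODES[partition] if n.mem_gb >= mem_gb and n.cores >= cores]


def partition_maxima(partition):
    """Largest per-node memory (GB) and core count found in the partition."""
    nodes = NODES[partition]
    return max(n.mem_gb for n in nodes), max(n.cores for n in nodes)


if __name__ == "__main__":
    print(partition_maxima("batch"))  # (740, 128)
    # A 900 GB, 64-core request on gpu_p fits only the A100 and H100 nodes.
    for n in fitting_nodes("gpu_p", 900, 64):
        print(n.count, n.cpu, n.gpus)
```

An empty result from `fitting_nodes` corresponds to a request that no node in the partition can satisfy, which Slurm will refuse to accept.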