Job submission partitions on Sapelo2

Overview

This page describes the Slurm partitions available on the Sapelo2 cluster, including job limits and the resources available in each partition.

In Slurm, queues are called partitions. When you submit a job, you must request both:

  • the partition to use, and
  • the resources your job needs, such as CPU cores, memory, or GPU devices.

Slurm will reject a job submission if no nodes match the resources you request. For background on Slurm, see Migrating from Torque to Slurm.
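For example, partition and resource requests can be given either as #SBATCH directives inside a job script (as in the examples later on this page) or directly on the sbatch command line. The resource values and the script name myjob.sh below are placeholders:

# Submit a hypothetical script myjob.sh to the batch partition,
# requesting 4 CPU cores, 16 GB of memory, and 12 hours of wall time
sbatch --partition=batch --cpus-per-task=4 --mem=16G --time=12:00:00 myjob.sh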

How to use this page

Use the first table to choose a partition based on job type and time limit.

Use the second table to confirm that your requested resources fit within the hardware available in that partition.

Partition limits

Sapelo2 partitions, time limits, and per-user job limits
Partition name | Time limit | Maximum running jobs per user | Maximum submitted jobs per user | Intended use and notes
batch | 7 days | 250 | 10,000 | Standard partition for regular compute jobs on general-purpose nodes.
batch_30d | 30 days | 1 | 2 | Standard partition for long-running jobs on regular nodes. A user may have one running job and one pending job, or two pending jobs and no running job. A third submission to this partition will be rejected.
highmem_p | 7 days | 6 | 100 | High-memory partition for jobs that require more memory than standard nodes provide.
highmem_30d_p | 30 days | 1 | 2 | High-memory partition for long-running jobs. A user may have one running job and one pending job, or two pending jobs and no running job. A third submission to this partition will be rejected.
hugemem_p | 7 days | 4 | 4 | Huge-memory partition for jobs needing up to 3 TB of memory.
hugemem_30d_p | 30 days | 4 | 4 | Huge-memory partition for long-running jobs needing up to 3 TB of memory.
gpu_p | 7 days | 6 | 20 | GPU-enabled partition for jobs that require one or more GPUs.
gpu_30d_p | 30 days | 2 | 2 | GPU-enabled partition for long-running jobs. A user may have one running job and one pending job, or two pending jobs and no running job. A third submission to this partition will be rejected.
inter_p | 2 days | 3 | 20 | Interactive partition for interactive jobs on regular nodes.
name_p | Variable | Variable | Variable | Partition for a specific group's buy-in nodes. Replace name with the group-specific partition prefix.
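Partition limits can change over time. One way to verify the current settings on the cluster itself is sinfo; the format string below is only a suggested selection of fields:

# Show each partition's name, time limit, node count, CPUs per node, and memory per node (in MB)
sinfo --format="%P %l %D %c %m"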

Resource limits by partition

Before submitting a job, make sure your requested memory, CPU cores, and GPU count fit within the limits of the partition you choose.

In the table below, the phrase "partition maximum" identifies the largest per-node resource values available within that partition. This wording replaces color-only emphasis so that the information is available to all users.

Node resources available in each Sapelo2 partition
Partition | Number of nodes | Memory per node (GB) | CPU cores per node | Processor type | GPU configuration | Notes
batch, batch_30d | 16 | 740 | 128 | AMD EPYC Genoa (4th gen) | None | Partition maximum for memory and cores is available on this node type.
batch, batch_30d | 120 | 500 | 128 | AMD EPYC Milan (3rd gen) | None | Partition maximum for cores is also available on this node type.
batch, batch_30d | 4 | 250 | 64 | AMD EPYC Milan (3rd gen) | None | Standard-capacity general-purpose nodes.
batch, batch_30d | 2 | 120 | 64 | AMD EPYC Milan (3rd gen) | None | Standard-capacity general-purpose nodes.
batch, batch_30d | 123 | 120 | 64 | AMD EPYC Rome (2nd gen) | None | Standard-capacity general-purpose nodes.
batch, batch_30d | 25 | 120 | 32 | AMD EPYC Naples (1st gen) | None | Lower-core-count general-purpose nodes.
batch, batch_30d | 40 | 180 | 32 | Intel Xeon Skylake | None | Lower-core-count general-purpose nodes.
highmem_p, highmem_30d_p | 10 | 500 | 32 | AMD EPYC Naples (1st gen) | None | High-memory nodes.
highmem_p, highmem_30d_p | 2 | 990 | 128 | AMD EPYC Milan (3rd gen) | None | Partition maximum for memory and cores is available on this node type.
highmem_p, highmem_30d_p | 12 | 990 | 32 | AMD EPYC Milan (3rd gen) | None | High-memory nodes with fewer available cores per node.
hugemem_p, hugemem_30d_p | 3 | 3000 | 48 | AMD EPYC Genoa (4th gen) | None | Partition maximum for memory and cores is available on this node type.
hugemem_p, hugemem_30d_p | 2 | 2000 | 32 | AMD EPYC Rome (2nd gen) | None | Huge-memory nodes with lower maximums than the partition peak.
gpu_p, gpu_30d_p | 2 | 180 | 32 | Intel Xeon Skylake | 1 NVIDIA P100 | Older GPU nodes.
gpu_p, gpu_30d_p | 2 | 120 | 64 | AMD EPYC Rome (2nd gen) | 1 NVIDIA V100S | Single-GPU nodes with 64 cores.
gpu_p, gpu_30d_p | 14 | 1000 | 64 | AMD EPYC Milan (3rd gen) | 4 NVIDIA A100 | Partition maximum for memory is available on this node type.
gpu_p, gpu_30d_p | 12 | 1000 | 64 | Intel Xeon Sapphire Rapids | 4 NVIDIA H100 | Partition maximum for memory is available on this node type.
gpu_p, gpu_30d_p | 12 | 740 | 128 | AMD EPYC Genoa (4th gen) | 4 NVIDIA L4 | Partition maximum for cores is available on this node type.
name_p | Variable | Variable | Variable | Variable | Variable | Resource limits depend on the group's buy-in nodes.
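To confirm what is actually configured on the nodes behind a partition, sinfo and scontrol can report per-node resources. The gpu_p name below is only an example; substitute any partition from the table:

# List node name, CPU count, memory (in MB), and generic resources (GPUs) for nodes in gpu_p
sinfo -p gpu_p -o "%n %c %m %G"

# Show the full partition definition, including its time limit and node list
scontrol show partition gpu_p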

Choosing a partition

A general rule of thumb is:

  • Use batch for most non-GPU jobs.
  • Use batch_30d only when your job genuinely needs a longer wall time.
  • Use highmem_p or highmem_30d_p when your memory requirements exceed what standard nodes provide.
  • Use hugemem_p or hugemem_30d_p for jobs that need very large memory allocations, including jobs approaching 3 TB of memory.
  • Use gpu_p or gpu_30d_p for GPU jobs.
  • Use inter_p for interactive work.
  • Use name_p only if your group has access to a buy-in partition with that name.

Example Slurm directives

The examples below show common ways to request a partition.

Regular compute job

#SBATCH --partition=batch
#SBATCH --time=2-00:00:00
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G

GPU job

#SBATCH --partition=gpu_p
#SBATCH --time=1-00:00:00
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G

High-memory job

#SBATCH --partition=highmem_p
#SBATCH --time=12:00:00
#SBATCH --cpus-per-task=16
#SBATCH --mem=700G
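
Interactive job

The batch examples above use #SBATCH directives in a job script. For inter_p, an interactive shell is usually requested directly; the sketch below uses srun with placeholder resource values (your group or the cluster documentation may recommend a site-specific wrapper instead):

# Request an interactive shell on inter_p with 2 cores and 8 GB of memory for up to 4 hours
srun --partition=inter_p --cpus-per-task=2 --mem=8G --time=04:00:00 --pty bash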

Terms used on this page

Partition
A Slurm queue that determines which nodes your job may run on.
Time limit
The maximum wall-clock runtime allowed for a job in that partition.
Running jobs
Jobs currently executing for a user in that partition.
Submitted jobs
Total jobs a user may have in the partition, including running and pending jobs.
Buy-in nodes
Nodes purchased by a specific group and made available through a group-specific partition.

Related documentation

  • Migrating from Torque to Slurm