GPU: Difference between revisions

Revision as of 21:18, 2 September 2024

GPU Computing on Sapelo2

Hardware

For the specifications of the Graphics Processing Unit (GPU) devices, please see GPU Hardware.

The following table summarizes the GPU devices available on Sapelo2:

Number of nodes	CPU cores per node	Host memory per node	CPU processor	GPU model	GPU devices per node	Device memory	GPU compute capability	Minimum CUDA version	Partition Name	Notes
10	128	1TB	Intel Sapphire Rapids	H100	4	80GB	9.0	11.8	gpu_p, gpu_30d_p	Need to request --gres=gpu:H100, e.g., #SBATCH --partition=gpu_p #SBATCH --gres=gpu:H100:1 #SBATCH --time=7-00:00:00
14	64	1TB	AMD Milan	A100	4	80GB	8.0	11.0	gpu_p, gpu_30d_p	Need to request --gres=gpu:A100, e.g., #SBATCH --partition=gpu_p #SBATCH --gres=gpu:A100:1 #SBATCH --time=7-00:00:00
11	128	745GB	AMD Genoa	L4	4	24GB	8.9	11.8	gpu_p, gpu_30d_p	Need to request --gres=gpu:L4, e.g., #SBATCH --partition=gpu_p #SBATCH --gres=gpu:L4:1 #SBATCH --time=7-00:00:00
2	32	192GB	Intel Skylake	P100	1	16GB	6.0	8.0	gpu_p, gpu_30d_p	Need to request --gres=gpu:P100, e.g., #SBATCH --partition=gpu_p #SBATCH --gres=gpu:P100:1 #SBATCH --time=7-00:00:00
1	64	1TB	AMD Milan	A100	4	80GB	8.0	11.0	buyin partition	Available on batch for all users up to 4 hours, e.g., #SBATCH --partition=batch #SBATCH --gres=gpu:A100:1 or #SBATCH --gres=gpu:L4:1 or #SBATCH --gres=gpu:V100:1 or #SBATCH --gres=gpu:V100S:1 #SBATCH --time=4:00:00
2	64	745GB	AMD Genoa	L4	4	24GB	8.9	11.8	buyin partition
2	28	192GB	Intel Skylake	V100	1	16GB	7.0	9.0	buyin partition
2	32	192GB	Intel Skylake	V100	1	16GB	7.0	9.0	buyin partition
2	32	384GB	Intel Skylake	V100	1	32GB	7.0	9.0	buyin partition
2	64	128GB	AMD Naples	V100	2	32GB	7.0	9.0	buyin partition
1	64	128GB	AMD Naples	V100	1	32GB	7.0	9.0	buyin partition
4	64	128GB	AMD Rome	V100S	1	32GB	7.0	9.0	buyin partition
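
The request lines shown in the Notes column can be placed in a complete job submission script. The following is a minimal sketch, not an official template: the job name, CPU count, memory, and time limit are assumed values chosen for illustration, and only the --partition and --gres lines are taken from the table above.

```shell
#!/bin/bash
#SBATCH --job-name=gpu_check       # hypothetical job name
#SBATCH --partition=gpu_p
#SBATCH --gres=gpu:H100:1          # one H100 device, as in the table above
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4          # assumed value; adjust to your application
#SBATCH --mem=16G                  # assumed value; adjust to your application
#SBATCH --time=1:00:00             # assumed short limit for a quick check

# Record where the job ran (SLURMD_NODENAME is set by Slurm inside a job).
node="${SLURMD_NODENAME:-$(uname -n)}"
echo "Running on node: $node"

# List the GPU device(s) visible to the job, if the NVIDIA tools are present.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader \
        || echo "nvidia-smi could not query a GPU"
else
    echo "nvidia-smi not found; submit this script to a GPU node with sbatch"
fi
```

Saved under a name of your choice (for example sub.sh, a hypothetical file name), the script would be submitted with sbatch sub.sh. Replacing H100 with A100, L4, or P100 requests one of the other device types in the gpu_p partition.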

Software

Sapelo2 has several tools for GPU programming and many CUDA-enabled applications. For example:

1. NVIDIA CUDA toolkit

Several versions of the CUDA toolkit are available. Please see our CUDA page.

2. cuDNN

The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks.

To see all modules of cuDNN installed on Sapelo2, please use the command

ml spider cuDNN

3. NCCL

The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective communication primitives that are performance optimized for NVIDIA GPUs.

To see all modules of NCCL installed on Sapelo2, please use the command

ml spider NCCL

4. OpenACC

Using the NVIDIA HPC SDK compiler suite, provided by the NVHPC module on Sapelo2, programmers can accelerate applications on x64+accelerator platforms by adding OpenACC compiler directives to Fortran and C programs and then recompiling with appropriate compiler options. Please see https://developer.nvidia.com/hpc-sdk and http://www.pgroup.com/resources/accel.htm

OpenACC is also supported by the GNU compilers installed on Sapelo2, particularly the more recent versions (e.g. GNU 7.2.0). For more information on OpenACC support by GNU compilers, please refer to https://gcc.gnu.org/wiki/OpenACC

For information on versions of compilers installed on Sapelo2, please see Code Compilation on Sapelo2.
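
As an illustration of the OpenACC workflow described above, the sketch below writes a small C program containing a single OpenACC directive and compiles it with whichever compiler is found. It assumes that a compiler module has already been loaded (for example with ml NVHPC, which is expected to provide the nvc compiler); the file name saxpy.c and the program itself are hypothetical examples and do not come from the Sapelo2 documentation.

```shell
# Work in a scratch directory so that no existing files are overwritten.
workdir=$(mktemp -d)

# A small C program with one OpenACC-annotated loop computing y = 2*x + y.
cat > "$workdir/saxpy.c" <<'EOF'
#include <stdio.h>

#define N 1000000

static float x[N], y[N];

int main(void)
{
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    /* Ask the compiler to offload this loop; x is copied in, y in and out. */
    #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
    for (int i = 0; i < N; i++) {
        y[i] = 2.0f * x[i] + y[i];
    }

    printf("y[0] = %f\n", y[0]);
    return 0;
}
EOF

# Compile with whichever OpenACC-capable compiler is on the PATH.
if command -v nvc >/dev/null 2>&1; then
    # NVIDIA HPC SDK: -acc enables OpenACC, -Minfo=accel reports what was offloaded.
    nvc -acc -Minfo=accel -o "$workdir/saxpy" "$workdir/saxpy.c" \
        || echo "nvc compilation failed"
elif command -v gcc >/dev/null 2>&1; then
    # GNU compilers: -fopenacc enables the OpenACC directives.
    gcc -fopenacc -o "$workdir/saxpy" "$workdir/saxpy.c" \
        || echo "gcc compilation failed"
else
    echo "No OpenACC-capable compiler found; please load a compiler module first"
fi
```

The resulting executable is meant to be run inside a GPU job rather than on a login node. Whether the GNU build actually offloads to the GPU depends on how that compiler was configured, so please check the compiler documentation linked above.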

5. CUDA-enabled applications

CUDA-enabled applications typically have a version suffix in the module name to indicate the version of CUDA that they were built with.
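
Because the CUDA version is part of the module name, it can be read off programmatically. The helper below is a hypothetical convenience function, not a tool provided on Sapelo2; it assumes the suffix has the form CUDA-<version>, as in the module names listed on this page.

```shell
# Print the CUDA version encoded in a module name, or nothing if there is none.
# Usage: cuda_version_of_module <module name>
cuda_version_of_module() {
    printf '%s\n' "$1" | sed -nE 's/.*CUDA-([0-9]+(\.[0-9]+)*).*/\1/p'
}

# Example with a module name taken from this page.
cuda_version_of_module "GROMACS/2023.3-foss-2023a-CUDA-12.1.1-PLUMED-2.9.0"
```

For a module name without a CUDA suffix the function prints nothing, which typically means the module was not built with CUDA.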

New centrally installed modules built with CUDA 12.1.1 or higher include support for GPU compute capability up to 9.0. Some examples are:

GROMACS/2023.3-foss-2023a-CUDA-12.1.1-PLUMED-2.9.0

GROMACS/2023.4-foss-2023a-CUDA-12.1.1

PyTorch/2.1.2-foss-2023a-CUDA-12.1.1 (note that this version uses the foss-2023a toolchain)

However, some CUDA 12.1.1 modules installed before the H100 and L4 devices were added to the cluster only have support for GPU compute capability up to 8.0. Examples of such modules that do not work on the H100 and L4 are:

PyTorch/2.1.2-foss-2022a-CUDA-12.1.1 (note that this version uses the foss-2022a toolchain)

transformers/4.37.0-foss-2022a-PyTorch-2.1.2-CUDA-12.1.1

transformers/4.41.2-foss-2022a-PyTorch-2.1.2-CUDA-12.1.1

Other modules that load PyTorch/2.1.2-foss-2022a-CUDA-12.1.1

Running Jobs

For information on how to run GPU jobs on Sapelo2, please refer to Running Jobs on Sapelo2.

Important notes:

1. If a job requests

#SBATCH --partition=gpu_p
#SBATCH --gres=gpu:1

then it can be allocated any GPU device type (i.e. P100, A100, L4, or H100). If you request a GPU device without specifying its type, please make sure that the application or code you are running works on all device types.

2. If the application that you are running uses an older version of CUDA, for example CUDA/11.4.1 or CUDA/11.7.0, please explicitly request a GPU device type that supports that CUDA version. For example, request an A100 device with

#SBATCH --partition=gpu_p
#SBATCH --gres=gpu:A100:1
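
The two notes above can be checked against the hardware table before submitting a job. The following sketch uses two hypothetical helper functions (they are not Sapelo2 commands) whose minimum CUDA versions are copied from the table in the Hardware section; it assumes a sort command that supports the -V (version sort) option, as GNU coreutils does.

```shell
# Print the minimum CUDA version for a GPU model, as listed in the hardware table.
min_cuda_for_gpu() {
    case "$1" in
        H100|L4)    echo "11.8" ;;
        A100)       echo "11.0" ;;
        V100|V100S) echo "9.0" ;;
        P100)       echo "8.0" ;;
        *)          echo "unknown GPU model: $1" >&2; return 1 ;;
    esac
}

# Succeed if CUDA version $1 is at least the minimum required by GPU model $2.
cuda_supports_gpu() {
    min=$(min_cuda_for_gpu "$2") || return 1
    # sort -V orders version strings, so the smaller of the two comes first.
    lowest=$(printf '%s\n%s\n' "$min" "$1" | sort -V | head -n 1)
    [ "$lowest" = "$min" ]
}

# Check which gpu_p device types can be used with CUDA/11.7.0.
for gpu in P100 A100 L4 H100; do
    if cuda_supports_gpu "11.7.0" "$gpu"; then
        echo "$gpu: CUDA 11.7.0 meets the minimum CUDA version"
    else
        echo "$gpu: needs a newer CUDA version"
    fi
done
```

This check covers only the minimum CUDA version. As described in the Software section, a module built with a sufficiently new CUDA version may still lack support for the H100 and L4 if it was installed before those devices were added.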
