Podman

Podman is available on Sapelo2 for researchers who need to run software distributed as OCI container images. It can be useful when a project already provides a standard container image or when a workflow expects familiar Docker-style commands. Podman is provided in a rootless configuration and behaves differently from a typical workstation or server installation. In particular, image storage and some runtime features are constrained by the cluster's file system and stateless node environment. This guide explains the main differences, why they exist, and the practical limitations users should expect when running Podman on compute nodes.

Podman in Slurm jobs

Podman depends on XDG_RUNTIME_DIR (typically /run/user/$UID) for rootless operation. On Sapelo2, Slurm jobs are not associated with a user systemd session, so this directory is not created automatically.

As a result, Podman will fail in Slurm jobs unless a user session is created. Here is an example error from an interactive job:

Failed to obtain podman configuration: lstat /run/user/1337: no such file or directory
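
You can confirm this condition from inside a job by checking whether the runtime directory exists (the path depends on your numeric user ID):

echo "$XDG_RUNTIME_DIR"
ls -ld /run/user/$(id -u)

In a job without a user session, the variable is typically empty and the ls command reports "No such file or directory".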

Workaround

Important: Podman will not work in Slurm jobs unless this step is performed.

Before running any Podman commands inside a Slurm job, start a background SSH session to localhost:

ssh -N localhost >/dev/null 2>&1 &

This creates a user session in the background and initializes /run/user/$UID.

This workaround requires passwordless SSH access to your own account. You must have the following (a minimal setup example is shown after the list):

  • An SSH keypair in ~/.ssh/
  • Your public key added to ~/.ssh/authorized_keys
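
If passwordless SSH to localhost is not already configured, a minimal setup with standard OpenSSH commands looks like this (the ed25519 key type and default file locations are one common choice; the empty passphrase is what allows the background connection to start without prompting):

ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ""
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys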

Test with:

ssh localhost

This should not prompt for a password.

This works because SSH creates a PAM/systemd user session, which initializes /run/user/$UID. The session remains active as long as the SSH process is running.

Because the SSH process runs in the background of the job, it is automatically cleaned up when the job exits.
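
Putting the pieces together, a minimal batch script sketch might look like the following. The partition name, resource requests, and image are placeholders; adjust them for your own job.

#!/bin/bash
#SBATCH --job-name=podman-test
#SBATCH --partition=batch      # placeholder partition name
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --time=00:10:00

# Start a background SSH session to localhost so /run/user/$UID is created
ssh -N localhost >/dev/null 2>&1 &

# Podman commands now work as usual; the session lasts for the life of the job
podman run --rm ubuntu echo "hello from a container"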

Filesystem limitations

On Sapelo2, Podman images are effectively node-local, not cluster-global like /apps. Rootless Podman normally expects local filesystem behavior for its image storage, but NFS-backed /home and Lustre-backed /scratch do not provide that behavior. As a result, Podman is configured to use local scratch storage on the compute node where the image is pulled or loaded.

In practice, this means an image you pull on one compute node will not automatically be available on another. Researchers should expect to pull or load images again when moving between nodes.
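
To see where Podman stores images on the node you are currently using, you can query its storage configuration; the exact path reported depends on how the node-local scratch area is configured:

podman info --format '{{.Store.GraphRoot}}'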

Transporting images with podman save and podman load

A practical workaround is to save an image as a tar archive on shared storage, then load it into Podman on the node where you want to run it. While this does not make images globally shared, it does make them quicker to move between nodes without re-downloading from a registry on each node.

# On the first node: pull the image and save it to a tar archive (place the tar on shared storage)
podman pull ubuntu
podman save -o ubuntu.tar ubuntu
# On another node: import the archive into that node's local image store
podman load -i ubuntu.tar

NVIDIA Container Toolkit

The NVIDIA Container Toolkit is the layer that enables Podman to expose NVIDIA GPUs inside containers. A major feature of the toolkit is CDI, the Container Device Interface. CDI lets containers request GPUs using stable, predictable names such as nvidia.com/gpu=0, nvidia.com/gpu=1, or even nvidia.com/gpu=all. On a heterogeneous cluster, where GPU types and counts may differ across nodes, this is much simpler than manually listing /dev/nvidia* devices and host library mounts for each container.

A typical CDI-based Podman command looks like this:

podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
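
The same naming scheme can be used to expose a single GPU by index rather than all of them, for example (as with the command above, extra bind mounts may still be needed on Sapelo2 because of the disabled update-ldcache hook; see Known Issues below):

podman run --rm --device nvidia.com/gpu=0 ubuntu nvidia-smi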

CUDA validation example

A practical way to check that CUDA is working inside a container is to run a small PyTorch test rather than relying only on GPU enumeration. In this example, PyTorch reports that CUDA is available, allocates a tensor, and runs a simple arithmetic operation.

Note: Due to a known issue with the NVIDIA CDI update-ldcache hook on Sapelo2 (see below), this example includes a manual bind mount of libcuda.so.1, which would not normally be required.

$ podman run --rm -it \
  --device nvidia.com/gpu=all \
  -v /usr/lib64/libcuda.so.1:/usr/lib/x86_64-linux-gnu/libcuda.so.1:ro \
  docker.io/pytorch/pytorch:2.9.0-cuda12.8-cudnn9-runtime \
  python3
Trying to pull docker.io/pytorch/pytorch:2.9.0-cuda12.8-cudnn9-runtime...
[output truncated]
Writing manifest to image destination
Python 3.11.14 | packaged by conda-forge | (main, Oct 13 2025, 14:09:32) [GCC 14.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> (torch.tensor([1.0], device="cuda") * 2).item()
2.0
>>> torch.cuda.get_device_name(0)
'NVIDIA L4'
>>> quit()
$

Known Issues

Broken update-ldcache OCI hook

The hook causing trouble is the NVIDIA CDI update-ldcache hook. In the generated CDI configuration, it appears as a createContainer hook that invokes /usr/bin/nvidia-cdi-hook with the update-ldcache argument. When Podman tries to start a container with CDI enabled, this hook can fail, preventing container startup.

The update-ldcache hook is meant to make host NVIDIA libraries visible inside the container by updating the dynamic linker cache used by the containerized environment. In a typical setup, this ensures that GPU-related executables inside the container can locate the right shared libraries without extra manual bind mounts. When the hook fails, CDI may still identify the GPUs correctly, but the automatic library setup is incomplete. In practice, users may need to manually bind-mount some host libraries or utilities into the container instead of relying on the hook to make everything available automatically.

Because of this issue, the update-ldcache OCI hook has been disabled in the CDI configuration. While this avoids container startup failures, it also bypasses the mechanism that updates the container's dynamic linker cache with host NVIDIA library paths. Consequently, GPU-related libraries may not be discovered automatically inside the container, and some manual bind mounts or environment configuration may be required for proper runtime behavior.

This effect is clearly seen in the CUDA validation example above, where libcuda.so.1 is explicitly bind-mounted (-v /usr/lib64/libcuda.so.1:/usr/lib/x86_64-linux-gnu/libcuda.so.1:ro) despite CDI being enabled via --device nvidia.com/gpu=all.