Running Interactive Container Sessions on HPC
Environment
Slurm workload manager
GPU-enabled nodes
Enroot/Pyxis container runtime
Issue
Users need an interactive container session on a compute node in order to:
Test or debug container environments
Install additional software
Develop container-based applications
Customize existing containers
Resolution
#. Basic Interactive Container Session
Start an interactive container session using the following command:
$ srun --account=[YOUR_ACCOUNT] \
    --partition=normal \
    --nodes=1 \
    --ntasks-per-node=1 \
    --gpus-per-node=1 \
    --cpus-per-task=28 \
    --container-writable \
    --container-remap-root \
    --no-container-mount-home \
    --container-image nvcr.io#nvidia/nvhpc:24.3-devel-cuda12.3-ubuntu22.04 \
    --container-save $HOME/my-container.sqsh \
    --pty bash
Note
Changes will be lost without --container-save; see "Saving Enroot container failed" for more details.
Root access requires --container-remap-root and --container-writable.
Interactive sessions have a maximum walltime of 4 hours on HPC4 and 2 hours on SuperPOD.
If saving to a subdirectory, create the target directory first: mkdir -p $HOME/containers
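The directory step in the note can be wrapped in a small pre-flight check that runs on the login node before srun. This is a minimal sketch: prepare_save_path is a hypothetical helper, not part of Slurm, Pyxis, or Enroot. Refusing to reuse an existing file name is a cautious assumption made here to protect earlier images, not documented runtime behavior.

```shell
# Hypothetical helper (assumption: not provided by Slurm, Pyxis, or Enroot).
# Creates the parent directory of a --container-save target and refuses
# to continue if a file with that name already exists.
prepare_save_path() {
    if [ -z "$1" ]; then
        echo "usage: prepare_save_path /path/to/image.sqsh" >&2
        return 2
    fi
    if [ -e "$1" ]; then
        echo "refusing to reuse existing file: $1" >&2
        return 1
    fi
    # Equivalent to the manual 'mkdir -p' step, derived from the target path.
    mkdir -p "$(dirname "$1")"
}
```

For example, running prepare_save_path "$HOME/containers/my-container.sqsh" before the srun command would create $HOME/containers if it is missing, and would stop with an error if that image file is already present.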
#. Container Customization and Package Installation
Once inside the container, update and install packages:
root@node:/# apt update
root@node:/# apt install -y [package-name]
Common packages for development:
root@node:/# apt install -y vim git wget curl build-essential python3-pip
root@node:/# pip3 install numpy matplotlib jupyter
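Before leaving the session (and triggering the save), it can be worth confirming that the installed tools are actually on PATH. The following is a sketch only; missing_tools is a hypothetical helper name, and the list of commands to check is up to you.

```shell
# Hypothetical helper: prints every command from the argument list that is
# not found on PATH, and returns non-zero if at least one is missing.
missing_tools() {
    mt_status=0
    for mt_tool in "$@"; do
        if ! command -v "$mt_tool" >/dev/null 2>&1; then
            echo "$mt_tool"
            mt_status=1
        fi
    done
    return "$mt_status"
}
```

Inside the container this could be called as missing_tools vim git wget curl gcc make pip3. Note that build-essential is a meta-package rather than a command, so the sketch checks for gcc and make instead.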
#. Using Previously Saved Containers
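To start from a previously saved container, pass the path of the saved image (--container-image /path/to/container/image.sqsh) instead of a registry reference.
The example below also writes the updated session to a new file, so the original image is left unchanged: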
$ srun --account=[YOUR_ACCOUNT] \
    --partition=normal \
    --nodes=1 \
    --ntasks-per-node=1 \
    --gpus-per-node=1 \
    --cpus-per-task=28 \
    --container-writable \
    --container-remap-root \
    --no-container-mount-home \
    --container-image $HOME/my-container.sqsh \
    --container-save $HOME/my-container-updated.sqsh \
    --pty bash
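When a container is reopened and saved repeatedly, picking a fresh output name by hand gets error-prone. One possible approach is sketched below. The helper next_version_path and the -vN suffix scheme are illustrative assumptions, not an established site convention.

```shell
# Hypothetical helper: prints the first unused path of the form
# <dir>/<base>-vN.sqsh, counting up from N=1.
next_version_path() {
    nv_dir="$1"
    nv_base="$2"
    nv_n=1
    while [ -e "${nv_dir}/${nv_base}-v${nv_n}.sqsh" ]; do
        nv_n=$((nv_n + 1))
    done
    echo "${nv_dir}/${nv_base}-v${nv_n}.sqsh"
}
```

The printed path could then be passed to --container-save, for example --container-save "$(next_version_path "$HOME/containers" my-container)", so that each session writes a new image instead of reusing an earlier name.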
Best Practices
Container Storage: Store containers in $HOME/containers/ for organization
Naming Convention: Use descriptive names: pytorch-24.03-custom.sqsh
Version Control: Save incremental versions during development
Resource Planning: Request appropriate CPU/GPU/memory based on workload