Running Containers in Batch Mode on HPC
Environment
Slurm workload manager
GPU-enabled nodes
Enroot/Pyxis container runtime
Issue
Run long-running container workloads without interactive sessions
Execute batch jobs using containerized applications
Schedule container-based computations on HPC clusters
Submit jobs to the queue for efficient resource utilization
Resolution
Important
Before running batch jobs, we strongly recommend testing your container and commands in interactive mode. This helps ensure your container works correctly and your commands are properly configured.
For detailed instructions on running containers interactively, see: Running Interactive Container Sessions on HPC
#. Batch Job with Custom Container
If you need to use a customized container (created during interactive testing), save it first and then use it in batch mode:
#!/bin/bash
#SBATCH --account=[YOUR_ACCOUNT]
#SBATCH --partition=normal
#SBATCH --job-name=custom_container
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=1
#SBATCH --cpus-per-task=28
#SBATCH --time=24:00:00
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
# Use your custom container
srun --container-writable \
     --container-remap-root \
     --no-container-mount-home \
     --container-image="$HOME/containers/my-custom-container.sqsh" \
     python3 ...
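A batch job can wait in the queue for a long time before it fails on a missing image, so it may help to check the saved container file before calling srun. The following is a minimal sketch, not part of Slurm or Pyxis: the helper name `check_image` is hypothetical, and it assumes the image is a regular squashfs file at the path used in the script above.

```shell
#!/bin/bash
# Hypothetical helper (not provided by Slurm or Pyxis): succeed only if the
# given path is a readable, non-empty regular file.
check_image() {
    if [ ! -f "$1" ] || [ ! -r "$1" ] || [ ! -s "$1" ]; then
        echo "ERROR: container image missing, unreadable, or empty: $1" >&2
        return 1
    fi
    return 0
}

# Intended use, placed before the srun line in the batch script:
#   check_image "$HOME/containers/my-custom-container.sqsh" || exit 1
```

Failing early this way puts a clear message in the job's error file instead of a less obvious runtime error from the container launch.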
#. Multi-Node Container Jobs
For parallel applications that span multiple nodes:
#!/bin/bash
#SBATCH --account=[YOUR_ACCOUNT]
#SBATCH --partition=large
#SBATCH --job-name=multinode_container
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=8
#SBATCH --cpus-per-task=224
#SBATCH --time=24:00:00
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
# Use your custom container
srun --container-writable \
     --container-remap-root \
     --no-container-mount-home \
     --container-image="$HOME/containers/my-custom-container.sqsh" \
     python3 ...
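With the settings above, srun starts one task on each of the four nodes, and a parallel application usually needs to know which task it is. The sketch below shows one way a task could report its position using the standard Slurm variables `SLURM_PROCID`, `SLURM_NTASKS`, and `SLURM_NODEID`. It assumes these variables are passed into the container, which depends on how the site has configured the container runtime. The helper name `describe_task` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report this task's position within the job step.
# Falls back to single-task values when the Slurm variables are not set
# (for example, when run outside a job).
describe_task() {
    echo "task ${SLURM_PROCID:-0} of ${SLURM_NTASKS:-1} on node index ${SLURM_NODEID:-0}"
}

describe_task
```

Calling this at the start of the per-task command makes it easier to match lines in the shared output file to individual nodes.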
Best Practices
Resource Planning: Request appropriate time limits for batch jobs (can be longer than interactive limits)
Output Files: Use descriptive output file names with the %x (job name) and %j (job ID) placeholders
Container Storage: Store containers in $HOME/containers/ for organization
Error Handling: Always specify both --output and --error files for debugging
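To make the %x and %j placeholders concrete, the sketch below builds the file names that the pattern `%x-%j` used in the scripts above would produce. The helper names `stdout_log` and `stderr_log` are hypothetical, and the job ID is made up for illustration. Slurm performs this expansion itself, so these functions only help with locating the files afterwards.

```shell
#!/bin/bash
# Hypothetical helpers: build the names produced by the "%x-%j.out" and
# "%x-%j.err" patterns, where %x is the job name and %j is the job ID.
# Arguments: $1 = job name, $2 = job ID.
stdout_log() { printf '%s-%s.out\n' "$1" "$2"; }
stderr_log() { printf '%s-%s.err\n' "$1" "$2"; }

# Example with a made-up job ID:
stdout_log custom_container 123456   # prints custom_container-123456.out
stderr_log custom_container 123456   # prints custom_container-123456.err
```

Because the job ID is part of the name, repeated submissions of the same script do not overwrite each other's logs.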