Saving Enroot container failed

Last updated: 2025-02-12
Solution under review

Environment

  • Slurm workload manager

  • Enroot container runtime

  • NVIDIA GPU compute nodes

  • Container-enabled cluster

Issue

When a container started through Pyxis/Enroot is saved with --container-save, the export fails with errors such as

  • slurmstepd: error: pyxis: [ERROR] No such file or directory: /home/username/example/nvhpc:24.3.sqsh

  • slurmstepd: error: pyxis: failed to export container pyxis_174632.0 to /home/username/example/nvhpc:24.3.sqsh

Resolution

  1. Create container directory:

$ mkdir -p $HOME/containers

  2. Run container with correct save path:

$ srun --account=YOUR_ACCOUNT \
    --nodes=1 \
    --gpus-per-node=1 \
    --container-writable \
    --container-save $HOME/containers/nvhpc.sqsh \
    --container-image nvcr.io#nvidia/nvhpc:24.3-devel-cuda12.3-ubuntu22.04 \
    --pty bash

Warning

  • Ensure sufficient disk quota before saving large containers

  • Avoid special characters in the saved file name (the example above saves to nvhpc.sqsh rather than nvhpc:24.3.sqsh)

  3. Verify saved container:

$ ls -l $HOME/containers/nvhpc.sqsh

Note

  • Parent directory must exist before running container

  • Use absolute paths for --container-save

  • Saved container can be used with --container-image /path/to/container.sqsh
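
As an illustration of the last note, the following is a minimal sketch of a wrapper that starts a job from a previously saved image. The function name run_saved is hypothetical (it is not part of Pyxis, Enroot, or Slurm), and the account, node, and GPU values are placeholders copied from the step above; adjust them for your cluster.

```shell
# Hypothetical helper: run_saved is an illustrative name, not a Pyxis or Slurm command.
# Usage: run_saved /absolute/path/to/image.sqsh [srun options...] command
run_saved() {
    local image="$1"
    shift

    # Fail early if the saved image is missing, instead of waiting for the job to start.
    if [ ! -f "$image" ]; then
        echo "image not found: $image" >&2
        return 1
    fi

    # Same resource flags as the save step; YOUR_ACCOUNT is a placeholder.
    srun --account=YOUR_ACCOUNT \
        --nodes=1 \
        --gpus-per-node=1 \
        --container-image "$image" \
        "$@"
}
```

For example, run_saved "$HOME/containers/nvhpc.sqsh" --pty bash should open an interactive shell in the saved container, assuming the image was saved as in step 2.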

Root Cause

Export can fail when

  • Target directory doesn’t exist

  • Path contains illegal characters

  • Insufficient permissions or disk space / quota
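
The conditions above can be checked before submitting the job. The following is a minimal sketch of such a pre-flight check; check_save_path is a hypothetical helper, not part of Pyxis or Enroot, and the set of allowed file-name characters (letters, digits, dot, underscore, hyphen) is a conservative assumption rather than a documented limit. Disk quota is not checked here because quota tools differ between sites.

```shell
# Hypothetical helper: check_save_path is an illustrative name, not a Pyxis or Enroot command.
# Usage: check_save_path /absolute/path/to/image.sqsh
check_save_path() {
    local target="$1"
    local dir name
    dir=$(dirname -- "$target")
    name=$(basename -- "$target")

    # The save path should be absolute.
    case "$target" in
        /*) ;;
        *) echo "not an absolute path: $target"; return 1 ;;
    esac

    # The parent directory must already exist.
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi

    # The parent directory must be writable by the current user.
    if [ ! -w "$dir" ]; then
        echo "directory not writable: $dir"
        return 1
    fi

    # Assumption: accept only letters, digits, dot, underscore, and hyphen in the file name.
    case "$name" in
        *[!A-Za-z0-9._-]*) echo "special characters in file name: $name"; return 1 ;;
    esac

    echo "ok: $target"
}
```

Under these assumptions, check_save_path "$HOME/containers/nvhpc.sqsh" reports ok once the directory from step 1 exists, while a name such as nvhpc:24.3.sqsh is rejected because of the colon.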

References