How to Submit and Run Batch Jobs with SLURM
============================================

.. meta::
   :description: Guide to submitting and running batch jobs on HPC clusters using the SLURM workload manager
   :keywords: slurm, batch job, sbatch, job submission, hpc, queue
   :author: HPC Support Team

.. rst-class:: header

| Last updated: 2025-12-04
| Solution verified: 2025-12-04

Environment
-----------

- HPC4 cluster
- Superpod cluster
- SLURM workload manager
- Any batch computational task (simulations, data processing, training, etc.)

Issue
-----

- Users need to run computational tasks that don't require interactive input
- Jobs should run unattended in the background, possibly for extended periods
- Resources need to be scheduled and allocated fairly among all users
- Users want to submit multiple jobs and have them queue automatically
- Jobs need to run on specific partitions (CPU, GPU) with defined resource requirements

Resolution
----------

Use the ``sbatch`` command to submit batch job scripts to SLURM. A batch script
contains the resource requirements and the commands to execute.

Basic Batch Job Workflow
~~~~~~~~~~~~~~~~~~~~~~~~~

1. Create a batch script with resource requirements and commands
2. Submit the script using ``sbatch``
3. Monitor job status with ``squeue``
4. Retrieve results from the output files after the job completes

Creating a Batch Script
~~~~~~~~~~~~~~~~~~~~~~~

A batch script is a shell script with special SLURM directives (``#SBATCH``)
that specify resource requirements. ``sbatch`` stops reading ``#SBATCH``
directives at the first executable (non-comment) line. Place all directives
directly below the ``#!/bin/bash`` line.

Basic CPU Job (HPC4)
^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=my_cpu_job
   #SBATCH --account=exampleproj
   #SBATCH --partition=amd
   #SBATCH --nodes=1
   #SBATCH --ntasks-per-node=32
   #SBATCH --time=24:00:00
   #SBATCH --output=job_%j.out
   #SBATCH --error=job_%j.err

   # Load required modules
   module load python/3.12

   # Run your application
   python my_script.py

GPU Job (Superpod)
^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=my_gpu_job
   #SBATCH --account=exampleproj
   #SBATCH --partition=gpu
   #SBATCH --nodes=1
   #SBATCH --ntasks-per-node=1
   #SBATCH --cpus-per-task=32
   #SBATCH --gpus-per-task=1
   #SBATCH --time=2-00:00:00
   #SBATCH --output=gpu_job_%j.out
   #SBATCH --error=gpu_job_%j.err

   # Load CUDA and other modules
   module load cuda/12.6
   module load python/3.12

   # Run GPU application
   python train_model.py

MPI Parallel Job
^^^^^^^^^^^^^^^^

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=mpi_job
   #SBATCH --account=exampleproj
   #SBATCH --partition=amd
   #SBATCH --nodes=4
   #SBATCH --ntasks-per-node=64
   #SBATCH --time=12:00:00
   #SBATCH --output=mpi_%j.out
   #SBATCH --error=mpi_%j.err

   # Load compiler and MPI
   module load intel-oneapi-compilers/2025
   module load intel-oneapi-mpi/2021

   # Run MPI application (srun launches one rank per allocated task)
   srun ./my_mpi_program

Common SBATCH Directives
~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :widths: 35 65
   :header-rows: 1

   * - Directive
     - Description
   * - ``--job-name=``
     - Name for the job (shown in the queue)
   * - ``--account=``
     - Project account to charge (required)
   * - ``--partition=``
     - Partition to use (``amd``, ``intel``, ``gpu``, etc.)
   * - ``--nodes=``
     - Number of nodes to allocate
   * - ``--ntasks-per-node=``
     - Number of tasks (MPI ranks) per node
   * - ``--cpus-per-task=``
     - Number of CPU cores per task
   * - ``--gpus-per-task=``
     - Number of GPUs per task (GPU partitions)
   * - ``--time=``
     - Maximum wall time (days-hours:minutes:seconds), e.g. ``24:00:00`` or ``2-00:00:00``
   * - ``--output=``
     - File for standard output (``%j`` = job ID)
   * - ``--error=``
     - File for standard error (``%j`` = job ID)
   * - ``--mail-type=``
     - Email notification events (e.g. ``BEGIN``, ``END``, ``FAIL``, ``ALL``)
   * - ``--mail-user=``
     - Email address for notifications

.. important::

   **Do not specify** the ``--mem`` or ``--mem-per-cpu`` options. Memory is
   allocated automatically in proportion to the number of CPUs or GPUs requested.

Submitting Jobs
~~~~~~~~~~~~~~~

.. code-block:: bash

   # Submit a batch job
   sbatch my_job_script.sh

   # sbatch prints the job ID, for example:
   # Submitted batch job 12345

The job is now queued and will start when the requested resources become available.

Monitoring Jobs
~~~~~~~~~~~~~~~

Check Job Status
^^^^^^^^^^^^^^^^

.. code-block:: bash

   # View your jobs in the queue
   squeue -u $USER

   # View a specific job
   squeue -j 12345

   # View all jobs on a partition
   squeue -p amd

Example ``squeue`` output:

.. code-block:: text

   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   12345       amd   my_job username  R      10:23      2 node[001-002]
   12346       gpu  gpu_job username PD       0:00      1 (Resources)

Common job states: ``R`` (Running), ``PD`` (Pending), ``CG`` (Completing),
``F`` (Failed). Finished jobs drop out of ``squeue`` shortly after they end.
Use ``sacct`` to check them afterwards.

View Job Details
^^^^^^^^^^^^^^^^

.. code-block:: bash

   # Detailed job information
   scontrol show job 12345

   # Job accounting information
   sacct -j 12345 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS

Canceling Jobs
~~~~~~~~~~~~~~

.. code-block:: bash

   # Cancel a specific job
   scancel 12345

   # Cancel all your jobs
   scancel -u $USER

   # Cancel all your jobs in a partition
   scancel -u $USER -p amd

Job Output Files
~~~~~~~~~~~~~~~~

Output files are created in the directory from which you submitted the job,
unless you specify absolute paths.

.. code-block:: bash

   # Follow output while the job is running
   tail -f job_12345.out

   # Check for errors
   cat job_12345.err

   # View completed job output
   less job_12345.out

Use ``%j`` in filenames to include the job ID automatically:

.. code-block:: bash

   #SBATCH --output=results_%j.out
   #SBATCH --error=errors_%j.err

Array Jobs
~~~~~~~~~~

To run many similar jobs with different parameters or input files, see the
dedicated array jobs guide: :doc:`slurm-how-to-use-slurm-array-jobs-for--sQ5r9U`

Job Dependencies
~~~~~~~~~~~~~~~~

Use dependencies to chain jobs so that one starts only after another has finished:

.. code-block:: bash

   # Submit the first job; --parsable makes sbatch print only the job ID
   JOB1=$(sbatch --parsable first_job.sh)

   # Submit a second job that starts only after the first completes successfully
   sbatch --dependency=afterok:$JOB1 second_job.sh

   # Several jobs can depend on the same job
   sbatch --dependency=afterok:$JOB1 analysis_job.sh

Dependency types:

- ``afterok:jobid`` - Start after the job completes successfully (exit code 0)
- ``afterany:jobid`` - Start after the job ends, in any state
- ``afternotok:jobid`` - Start only if the job fails (non-zero exit code, timeout, node failure, etc.)

Best Practices
~~~~~~~~~~~~~~

**Resource Requests**

- Request only the resources you need
- Set ``--time`` realistically: jobs with shorter time limits may start sooner,
  because the scheduler can fit them into gaps between larger jobs
- Test with small jobs before scaling up
- Check the resource usage of finished jobs (e.g. with ``sacct``) to refine future requests

**Script Organization**

- Use descriptive job names
- Include comments explaining what the job does
- Set up proper output/error file naming
- Load all required modules at the start of the script

**Error Handling**

- Check the exit codes of important commands in your scripts
- Use ``set -e`` so the script stops at the first failing command
- Redirect errors to separate log files
- Test scripts interactively first (see :doc:`slurm-how-to-request-interactive-sessi-HV7WS9`)

**Output Management**

- Use unique output filenames with ``%j`` (job ID)
- Organize outputs in subdirectories for large job sets
- Clean up old output files periodically
- Consider redirecting verbose output you don't need to ``/dev/null``

Root Cause
----------

Batch job systems exist because of the following needs:

**Shared Resource Management**

- Compute clusters are shared among many users
- Fair scheduling ensures everyone gets their allocated share
- The queue system prevents resource conflicts

**Unattended Execution**

- **Jobs are not affected by a login node reboot or a network disconnection**
- Jobs can run unattended for days, up to the partition's time limit
- Jobs interrupted by a node failure can be requeued automatically
- Long-running jobs don't need interactive supervision

**Resource Optimization**

- The scheduler can pack jobs efficiently across nodes
- Resources are allocated automatically based on each job's requirements
- Overall cluster utilization improves

References
----------
**Example Scripts**

- `HPC4 Batch Job Examples `_

  - `CPU Batch Job `_
  - `CPU MPI Batch Job `_
  - `GPU Batch Job `_

- `Superpod Batch Job Examples `_

  - `Batch Job `_

**Related Articles**

- :doc:`slurm-how-to-request-interactive-sessi-HV7WS9` - For interactive development and testing

**SLURM Documentation**

- `SLURM sbatch Command `_
- `SLURM squeue Command `_
- `SLURM Job Arrays `_

.. rst-class:: footer

**HPC Support Team**

| ITSO, HKUST
| Email: cchelp@ust.hk
| Web: https://itso.hkust.edu.hk/

**Article Info**

| Issued: 2025-12-04
| Issued by: kftse@ust.hk