How to Submit and Run Batch Jobs with SLURM

Last updated: 2025-12-04
Solution verified: 2025-12-04

Environment

  • HPC4 cluster

  • Superpod cluster

  • SLURM workload manager

  • Any batch computational task (simulations, data processing, training, etc.)

Issue

  • Users need to run computational tasks that don’t require interactive input

  • Jobs should run unattended in the background, possibly for extended periods

  • Resources need to be scheduled and allocated fairly among all users

  • Users want to submit multiple jobs and have them queue automatically

  • Users need to run jobs on specific partitions (CPU, GPU) with defined resource requirements

Resolution

Use the sbatch command to submit batch job scripts to SLURM. Batch scripts contain resource requirements and the commands to execute.

Basic Batch Job Workflow

  1. Create a batch script with resource requirements and commands

  2. Submit the script using sbatch

  3. Monitor job status with squeue

  4. Retrieve results from output files after job completes

Creating a Batch Script

A batch script is a shell script with special SLURM directives (#SBATCH) that specify resource requirements.

Basic CPU Job (HPC4)

#!/bin/bash
#SBATCH --job-name=my_cpu_job
#SBATCH --account=exampleproj
#SBATCH --partition=amd
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32
#SBATCH --time=24:00:00
#SBATCH --output=job_%j.out
#SBATCH --error=job_%j.err

# Load required modules
module load python/3.12

# Run your application
python my_script.py

GPU Job (Superpod)

#!/bin/bash
#SBATCH --job-name=my_gpu_job
#SBATCH --account=exampleproj
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --gpus-per-task=1
#SBATCH --time=2-00:00:00
#SBATCH --output=gpu_job_%j.out
#SBATCH --error=gpu_job_%j.err

# Load CUDA and other modules
module load cuda/12.6
module load python/3.12

# Run GPU application
python train_model.py

MPI Parallel Job

#!/bin/bash
#SBATCH --job-name=mpi_job
#SBATCH --account=exampleproj
#SBATCH --partition=amd
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=64
#SBATCH --time=12:00:00
#SBATCH --output=mpi_%j.out
#SBATCH --error=mpi_%j.err

# Load compiler and MPI
module load intel-oneapi-compilers/2025
module load intel-oneapi-mpi/2021

# Run MPI application
srun ./my_mpi_program
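Inside a batch script, srun starts one copy of the program for every task in the allocation, so the number of MPI ranks is the node count multiplied by the tasks per node. The arithmetic for the script above can be sketched as follows; the function name mpi_rank_count is illustrative only and is not part of SLURM.

```shell
#!/bin/bash
# Hypothetical helper: number of MPI ranks for a given --nodes and
# --ntasks-per-node combination.
mpi_rank_count() {
    local nodes="$1" tasks_per_node="$2"
    echo $(( nodes * tasks_per_node ))
}

# Values taken from the MPI example: --nodes=4, --ntasks-per-node=64
mpi_rank_count 4 64    # prints 256
```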

Common SBATCH Directives

Directive                Description
--job-name=<name>        Name for the job (shows in queue)
--account=<project>      Project account to charge (required)
--partition=<name>       Partition to use (amd, intel, gpu, etc.)
--nodes=<n>              Number of nodes to allocate
--ntasks-per-node=<n>    Number of tasks (MPI ranks) per node
--cpus-per-task=<n>      Number of CPU cores per task
--gpus-per-task=<n>      Number of GPUs per task (GPU partitions)
--time=<D-HH:MM:SS>      Maximum wall time (days-hours:minutes:seconds)
--output=<file>          File for standard output (%j = job ID)
--error=<file>           File for standard error (%j = job ID)
--mail-type=<events>     Email notification events (BEGIN, END, FAIL, ALL)
--mail-user=<email>      Email address for notifications
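To sanity-check a --time value before submitting, it can help to convert it to seconds. The sketch below handles only the two forms used in this article (HH:MM:SS and D-HH:MM:SS); the function name walltime_to_seconds is an assumption for illustration, not a SLURM command.

```shell
#!/bin/bash
# Hypothetical helper: convert HH:MM:SS or D-HH:MM:SS to seconds.
walltime_to_seconds() {
    local spec="$1" days=0 h m s
    # Split off the optional "D-" day prefix
    if [[ "$spec" == *-* ]]; then
        days="${spec%%-*}"
        spec="${spec#*-}"
    fi
    IFS=: read -r h m s <<< "$spec"
    # 10# forces base 10 so fields such as "08" are not read as octal
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

walltime_to_seconds 24:00:00      # CPU example: prints 86400
walltime_to_seconds 2-00:00:00    # GPU example: prints 172800
```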

Important

Do not specify --mem or --mem-per-cpu options. Memory is automatically allocated proportionally based on the number of CPUs or GPUs requested.

Submitting Jobs

# Submit a batch job
sbatch my_job_script.sh

# Output shows job ID
# Submitted batch job 12345

The job is now queued and will start when resources are available.
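When the job ID is needed later in a script, it can be captured at submission time. A minimal sketch, assuming sbatch --parsable prints the job ID, optionally followed by a semicolon and a cluster name; the function name submit_and_get_id is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: submit a script and print only the numeric job ID.
submit_and_get_id() {
    local raw
    # --parsable prints "jobid" or "jobid;cluster" instead of the full message
    raw=$(sbatch --parsable "$@") || return 1
    # Drop the optional ";cluster" suffix
    printf '%s\n' "${raw%%;*}"
}

# Usage (requires a SLURM login node):
#   JOB_ID=$(submit_and_get_id my_job_script.sh)
#   echo "Submitted job ${JOB_ID}"
```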

Monitoring Jobs

Check Job Status

# View your jobs in the queue
squeue -u $USER

# View specific job details
squeue -j 12345

# View all jobs on a partition
squeue -p amd

Example squeue output:

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
12345       amd  my_job  username  R      10:23      2 node[001-002]
12346       gpu gpu_job  username PD       0:00      1 (Resources)

Job states: R (Running), PD (Pending), CG (Completing), F (Failed)
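To wait for a job from a script rather than re-running squeue by hand, a simple polling loop is often enough. This is a sketch under the assumption that squeue prints nothing (or reports an error) once the job has left the queue; the function name wait_for_job and the 60-second default interval are illustrative choices.

```shell
#!/bin/bash
# Hypothetical helper: poll squeue until the job is no longer listed.
wait_for_job() {
    local job_id="$1" interval="${2:-60}" state
    while true; do
        # -h suppresses the header, -o "%T" prints the full state name
        state=$(squeue -h -j "$job_id" -o "%T" 2>/dev/null)
        # Empty output means the job is no longer in the queue
        [ -z "$state" ] && break
        echo "Job ${job_id}: ${state}"
        sleep "$interval"
    done
}

# Usage (requires a SLURM login node):
#   wait_for_job 12345 30
```

Keep the interval generous; frequent squeue calls add load to the scheduler.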

View Job Details

# Detailed job information
scontrol show job 12345

# Job accounting information
sacct -j 12345 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS

Canceling Jobs

# Cancel a specific job
scancel 12345

# Cancel all your jobs
scancel -u $USER

# Cancel all your jobs in a partition
scancel -u $USER -p amd

Job Output Files

Output files are created in the directory where you submitted the job (unless you specify absolute paths).

# View output while job is running
tail -f job_12345.out

# Check for errors
cat job_12345.err

# View completed job output
less job_12345.out

Use %j in filenames to include the job ID automatically:

#SBATCH --output=results_%j.out
#SBATCH --error=errors_%j.err
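To work out which file a given job writes to, the %j placeholder can be expanded by hand. A small sketch; the function name expand_job_pattern is hypothetical, and it handles only %j, not the other filename patterns SLURM supports.

```shell
#!/bin/bash
# Hypothetical helper: substitute a job ID for every %j in a filename pattern.
expand_job_pattern() {
    local pattern="$1" job_id="$2"
    printf '%s\n' "$pattern" | sed "s/%j/${job_id}/g"
}

expand_job_pattern "results_%j.out" 12345    # prints results_12345.out

# Usage: follow the output of a running job
#   tail -f "$(expand_job_pattern "results_%j.out" 12345)"
```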

Array Jobs

For running multiple similar jobs with different parameters or input files, see the dedicated array jobs guide:

How to Use SLURM Array Jobs for Parameter Sweeps and Batch Processing

Job Dependencies

Chain jobs to run sequentially:

# Submit first job
JOB1=$(sbatch --parsable first_job.sh)

# Submit a second job that starts only if the first completes successfully
sbatch --dependency=afterok:$JOB1 second_job.sh

# Further jobs can depend on the same job ID
sbatch --dependency=afterok:$JOB1 analysis_job.sh

Dependency types:

  • afterok:jobid - Start after job completes successfully

  • afterany:jobid - Start after job completes (any state)

  • afternotok:jobid - Start only if job fails
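A job can also wait on several earlier jobs by listing their IDs after the dependency type, separated by colons. The sketch below builds such a string; the function name build_dependency and the script name merge_job.sh are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: build "type:id1:id2:..." for sbatch --dependency.
build_dependency() {
    local type="$1"
    shift
    local spec="$type" id
    for id in "$@"; do
        spec+=":${id}"
    done
    echo "$spec"
}

build_dependency afterok 101 102    # prints afterok:101:102

# Usage (requires a SLURM login node):
#   sbatch --dependency="$(build_dependency afterok "$JOB1" "$JOB2")" merge_job.sh
```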

Best Practices

Resource Requests

  • Request only the resources you need

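  • Set --time close to the expected runtime plus a safety margin; jobs with shorter time limits may start sooner because the scheduler can fit them into gaps between larger jobs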

  • Test with small jobs before scaling up

  • Monitor resource usage to optimize future requests

Script Organization

  • Use descriptive job names

  • Include comments explaining what the job does

  • Set up proper output/error file naming

  • Load all required modules at the start

Error Handling
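One common pattern, offered as a sketch rather than site policy, is to wrap each step of the batch script so that a failure is reported with its exit code and the script stops there. A batch script that exits with a non-zero status is recorded by SLURM as a failed job. The helper name run_step is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: run a command, print its outcome, and pass on
# its exit status so the caller can stop the script.
run_step() {
    local label="$1"
    shift
    if "$@"; then
        echo "[ok] ${label}"
    else
        local status=$?
        echo "[failed] ${label} (exit ${status})" >&2
        return "${status}"
    fi
}

# Usage inside a batch script:
#   run_step "run analysis" python my_script.py || exit 1
```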

Output Management

  • Use unique output filenames with %j (job ID)

  • Organize outputs in subdirectories for large job sets

  • Clean up old output files periodically

  • Consider redirecting verbose output to /dev/null
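SLURM does not create missing directories for --output or --error paths, so a log subdirectory has to exist before the job starts. A minimal sketch, assuming options given on the sbatch command line take precedence over #SBATCH lines in the script; the function name submit_with_log_dir and the logs directory are illustrative. %x expands to the job name and %j to the job ID.

```shell
#!/bin/bash
# Hypothetical helper: make sure the log directory exists, then submit
# with output and error files placed inside it.
submit_with_log_dir() {
    local log_dir="$1" script="$2"
    mkdir -p "$log_dir" || return 1
    sbatch --output="${log_dir}/%x_%j.out" \
           --error="${log_dir}/%x_%j.err" \
           "$script"
}

# Usage (requires a SLURM login node):
#   submit_with_log_dir logs my_job_script.sh
```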

Root Cause

Batch job systems exist because:

Shared Resource Management

  • Compute clusters are shared among many users

  • Fair scheduling ensures everyone gets their allocated share

  • Queue system prevents resource conflicts

Unattended Execution

  • Jobs are not affected by login node reboot or network disconnection

  • Jobs can run for extended periods over days

  • Failed jobs can be automatically requeued

  • Long-running jobs don’t need interactive supervision

Resource Optimization

  • Scheduler can pack jobs efficiently across nodes

  • Automatic resource allocation based on requirements

  • Better overall cluster utilization
