How to Submit and Run Batch Jobs with SLURM

Environment

HPC4 cluster

Superpod cluster

SLURM workload manager

Any batch computational task (simulations, data processing, training, etc.)

Issue

Users need to run computational tasks that don’t require interactive input

Jobs should run unattended in the background, possibly for extended periods

Resources need to be scheduled and allocated fairly among all users

Users want to submit multiple jobs and have them queue automatically

Users need to run jobs on specific partitions (CPU, GPU) with defined resource requirements

Resolution

Use the sbatch command to submit batch job scripts to SLURM. Batch scripts contain resource requirements and the commands to execute.

Basic Batch Job Workflow

Create a batch script with resource requirements and commands
Submit the script using sbatch
Monitor job status with squeue
Retrieve results from output files after job completes
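The four steps above can be sketched end to end as follows. This is a minimal illustration, not a site-provided template: the file name hello_job.sh and the job name hello are made up for the example, and the account and partition values are copied from the examples in this article and must be replaced with your own. The submit and monitor steps are skipped when sbatch is not available, so the snippet can be tried safely off-cluster.

```shell
# Step 1: write a small batch script (hypothetical file name).
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --account=exampleproj
#SBATCH --partition=amd
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:05:00
#SBATCH --output=hello_%j.out

echo "Running on $(hostname)"
EOF

# Steps 2-4 only make sense on a cluster login node.
if command -v sbatch >/dev/null 2>&1; then
  job_id=$(sbatch --parsable hello_job.sh)   # Step 2: submit, capture the job ID
  squeue -j "$job_id"                        # Step 3: check the job's status
  echo "When the job finishes, read hello_${job_id}.out"   # Step 4: results
else
  echo "sbatch not found; created hello_job.sh only"
fi
```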

Creating a Batch Script

A batch script is a shell script with special SLURM directives (#SBATCH) that specify resource requirements.

Basic CPU Job (HPC4)

#!/bin/bash
#SBATCH --job-name=my_cpu_job
#SBATCH --account=exampleproj
#SBATCH --partition=amd
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32
#SBATCH --time=24:00:00
#SBATCH --output=job_%j.out
#SBATCH --error=job_%j.err

# Load required modules
module load python/3.12

# Run your application
python my_script.py

GPU Job (Superpod)

#!/bin/bash
#SBATCH --job-name=my_gpu_job
#SBATCH --account=exampleproj
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --gpus-per-task=1
#SBATCH --time=2-00:00:00
#SBATCH --output=gpu_job_%j.out
#SBATCH --error=gpu_job_%j.err

# Load CUDA and other modules
module load cuda/12.6
module load python/3.12

# Run GPU application
python train_model.py

MPI Parallel Job

#!/bin/bash
#SBATCH --job-name=mpi_job
#SBATCH --account=exampleproj
#SBATCH --partition=amd
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=64
#SBATCH --time=12:00:00
#SBATCH --output=mpi_%j.out
#SBATCH --error=mpi_%j.err

# Load compiler and MPI
module load intel-oneapi-compilers/2025
module load intel-oneapi-mpi/2021

# Run MPI application
srun ./my_mpi_program

Common SBATCH Directives

Directive	Description
`--job-name=<name>`	Name for the job (shows in queue)
`--account=<project>`	Project account to charge (required)
`--partition=<name>`	Partition to use (amd, intel, gpu, etc.)
`--nodes=<n>`	Number of nodes to allocate
`--ntasks-per-node=<n>`	Number of tasks (MPI ranks) per node
`--cpus-per-task=<n>`	Number of CPU cores per task
`--gpus-per-task=<n>`	Number of GPUs per task (GPU partitions)
`--time=<D-HH:MM:SS>`	Maximum wall time (days-hours:minutes:seconds)
`--output=<file>`	File for standard output (`%j` = job ID)
`--error=<file>`	File for standard error (`%j` = job ID)
`--mail-type=<events>`	Email notification events (BEGIN, END, FAIL, ALL)
`--mail-user=<email>`	Email address for notifications
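When a runtime estimate is known in seconds, it can be converted to the D-HH:MM:SS form that --time accepts. The helper below is a small sketch; the function name to_slurm_time is hypothetical and not part of SLURM.

```shell
# Convert a duration in seconds to D-HH:MM:SS for use with --time.
to_slurm_time() {
  local total=$1
  local days=$(( total / 86400 ))
  local hours=$(( (total % 86400) / 3600 ))
  local minutes=$(( (total % 3600) / 60 ))
  local seconds=$(( total % 60 ))
  printf '%d-%02d:%02d:%02d\n' "$days" "$hours" "$minutes" "$seconds"
}

# Example: two days, as used in the GPU job above
# to_slurm_time 172800
```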

Important

Do not specify --mem or --mem-per-cpu options. Memory is automatically allocated proportionally based on the number of CPUs or GPUs requested.

Submitting Jobs

# Submit a batch job
sbatch my_job_script.sh

# Output shows job ID
# Submitted batch job 12345

The job is now queued and will start when resources are available.
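Scripts that submit jobs often need the job ID for later squeue or scancel calls. One way to get it, assuming the "Submitted batch job <id>" message format shown above, is to keep only the last word of the message. The function name extract_job_id is hypothetical.

```shell
# Pull the job ID out of sbatch's confirmation message.
extract_job_id() {
  local line=$1
  # Strip everything up to and including the last space.
  echo "${line##* }"
}

# Typical use on a login node:
#   job_id=$(extract_job_id "$(sbatch my_job_script.sh)")
#   squeue -j "$job_id"
```

The --parsable option of sbatch, used in the Job Dependencies section, is an alternative that prints the job ID without the surrounding text.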

Monitoring Jobs

Check Job Status

# View your jobs in the queue
squeue -u $USER

# View specific job details
squeue -j 12345

# View all jobs on a partition
squeue -p amd

Example squeue output:

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
12345       amd  my_job  username  R      10:23      2 node[001-002]
12346       gpu gpu_job  username PD       0:00      1 (Resources)

Job states: R (Running), PD (Pending), CG (Completing), F (Failed)
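For a quick overview of many jobs, the state column can be tallied. The sketch below assumes the default squeue column layout shown in the example output, where ST is the fifth column, and job names without spaces. The function name summarize_states is hypothetical.

```shell
# Count jobs per state from default-format squeue output on stdin.
summarize_states() {
  # Skip the header line, then count occurrences of column 5 (ST).
  awk 'NR > 1 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}

# Typical use on a login node:
#   squeue -u "$USER" | summarize_states
```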

View Job Details

# Detailed job information
scontrol show job 12345

# Job accounting information
sacct -j 12345 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS

Canceling Jobs

# Cancel a specific job
scancel 12345

# Cancel all your jobs
scancel -u $USER

# Cancel all your jobs in a partition
scancel -u $USER -p amd

Job Output Files

Output files are created in the directory where you submitted the job (unless you specify absolute paths).

# View output while job is running
tail -f job_12345.out

# Check for errors
cat job_12345.err

# View completed job output
less job_12345.out

Use %j in filenames to include the job ID automatically:

#SBATCH --output=results_%j.out
#SBATCH --error=errors_%j.err

Array Jobs

For running multiple similar jobs with different parameters or input files, see the dedicated array jobs guide:

How to Use SLURM Array Jobs for Parameter Sweeps and Batch Processing

Job Dependencies

Chain jobs to run sequentially:

# Submit first job
JOB1=$(sbatch --parsable first_job.sh)

# Submit second job that starts only if the first completes successfully
sbatch --dependency=afterok:$JOB1 second_job.sh

# Further jobs can depend on the same job ID
sbatch --dependency=afterok:$JOB1 analysis_job.sh

Dependency types:

afterok:jobid - Start after job completes successfully
afterany:jobid - Start after job completes (any state)
afternotok:jobid - Start only if job fails
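For pipelines with more than two stages, the same pattern can be wrapped in a loop so that each script waits for the previous one. This is a sketch under the assumption that every stage should run only after the previous stage succeeds (afterok). The function name submit_chain and the script names in the usage comment are hypothetical.

```shell
# Submit scripts in order; each one depends on the previous job via afterok.
# Prints the job ID of the last job in the chain.
submit_chain() {
  local prev="" script
  for script in "$@"; do
    if [ -z "$prev" ]; then
      prev=$(sbatch --parsable "$script") || return 1
    else
      prev=$(sbatch --parsable --dependency="afterok:$prev" "$script") || return 1
    fi
    # --parsable may print "jobid;cluster"; keep only the job ID.
    prev=${prev%%;*}
  done
  echo "$prev"
}

# Typical use on a login node:
#   last_job=$(submit_chain preprocess.sh train.sh analysis.sh)
```

If any stage fails, later afterok stages will not start. Check squeue for jobs left pending and cancel them with scancel if needed.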

Best Practices

Resource Requests

Request only the resources you need
Set --time close to your expected runtime plus a safety margin - jobs with shorter time limits may start sooner
Test with small jobs before scaling up
Monitor resource usage to optimize future requests
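One way to monitor resource usage is to read the peak memory of a finished job from sacct. The sketch below assumes pipe-delimited sacct output with JobID and MaxRSS columns, and MaxRSS values carrying a K, M, or G suffix. The function name peak_rss_kb is hypothetical.

```shell
# Read "JobID|MaxRSS" lines on stdin and print the largest MaxRSS in KB.
peak_rss_kb() {
  awk -F'|' '
    $2 != "" {
      value = $2 + 0                      # numeric part, e.g. 204800 from "204800K"
      unit = substr($2, length($2))       # last character is the unit suffix
      if (unit == "M") value *= 1024
      else if (unit == "G") value *= 1024 * 1024
      if (value > peak) peak = value
    }
    END { printf "%d\n", peak }
  '
}

# Typical use on a login node (-n: no header, -P: pipe-delimited):
#   sacct -j 12345 -n -P --format=JobID,MaxRSS | peak_rss_kb
```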

Script Organization

Use descriptive job names
Include comments explaining what the job does
Set up proper output/error file naming
Load all required modules at the start

Error Handling

Check exit codes in your scripts
Use set -e to exit on errors
Redirect errors to separate log files
Test scripts interactively first (see How to Request Interactive Sessions on Compute Nodes)
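The first two points can be combined in a small wrapper that reports which command failed and passes its exit code on. This is an illustrative sketch; the function name run_step is hypothetical.

```shell
# Run one command; on failure, log the exit code to stderr and return it.
run_step() {
  local rc=0
  "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "step failed (exit $rc): $*" >&2
  fi
  return "$rc"
}

# In a batch script, combined with set -e so the job stops at the first failure:
#   set -e
#   run_step module load python/3.12
#   run_step python my_script.py
```

Because the message goes to standard error, it ends up in the file named by --error.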

Output Management

Use unique output filenames with %j (job ID)
Organize outputs in subdirectories for large job sets
Clean up old output files periodically
Consider redirecting verbose output to /dev/null
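Periodic cleanup can be scripted with find. The sketch below assumes output files end in .out or .err, as in the examples in this article, and treats files older than 30 days as stale by default. The function name cleanup_old_logs and the directory in the usage comment are hypothetical; run it with care, since deleted files cannot be recovered.

```shell
# Delete .out/.err files in a directory tree that are older than N days.
# Prints each file as it is removed.
cleanup_old_logs() {
  local dir=$1 days=${2:-30}
  find "$dir" -type f \( -name '*.out' -o -name '*.err' \) -mtime +"$days" -print -delete
}

# Typical use:
#   cleanup_old_logs "$HOME/my_project/logs" 30
```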

Root Cause

Batch job systems exist because:

Shared Resource Management

Compute clusters are shared among many users
Fair scheduling ensures everyone gets their allocated share
Queue system prevents resource conflicts

Unattended Execution

Jobs are not affected by login node reboot or network disconnection
Jobs can run for extended periods over days
Failed jobs can be automatically requeued
Long-running jobs don’t need interactive supervision

Resource Optimization

Scheduler can pack jobs efficiently across nodes
Automatic resource allocation based on requirements
Better overall cluster utilization

References

Example Scripts

Related Articles

How to Request Interactive Sessions on Compute Nodes - For interactive development and testing

SLURM Documentation