How to Use SLURM Array Jobs for Parameter Sweeps and Batch Processing
Environment
HPC4 cluster
Superpod cluster
SLURM workload manager
Tasks that need to run with different parameters or input files
Parameter sweeps, sensitivity analysis, batch data processing
Issue
Need to run the same program with different parameters or input files
Want to process multiple datasets with the same analysis script
Conducting parameter sweeps for optimization or sensitivity analysis
Have many independent tasks that can run in parallel
Submitting individual jobs for each parameter is time-consuming and error-prone
Need efficient way to manage hundreds or thousands of similar jobs
Resolution
Use SLURM array jobs with the --array option to submit multiple similar tasks with a single command. Each array task gets a unique $SLURM_ARRAY_TASK_ID that can be used to select parameters or input files.
Basic Array Job Syntax
#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --account=exampleproj
#SBATCH --partition=amd
#SBATCH --array=1-10
#SBATCH --output=array_%A_%a.out
#SBATCH --error=array_%A_%a.err
# $SLURM_ARRAY_TASK_ID contains the array index (1, 2, 3, ..., 10)
echo "This is array task $SLURM_ARRAY_TASK_ID"
# Use the task ID in your computation
python process.py --task-id $SLURM_ARRAY_TASK_ID
This submits 10 jobs (array tasks) with indices 1 through 10.
Array Job Environment Variables
Variable |
Description |
|---|---|
|
Current array task index (e.g., 1, 2, 3, …) |
|
Job ID of the entire array (same for all tasks) |
|
Minimum array index |
|
Maximum array index |
|
Total number of array tasks |
Use Case 1: Processing Multiple Input Files
Process different data files using the array task ID to select the file.
Method 1: Numbered Files
#!/bin/bash
#SBATCH --job-name=process_data
#SBATCH --account=exampleproj
#SBATCH --partition=amd
#SBATCH --array=1-100
#SBATCH --output=logs/job_%A_%a.out
# Process file based on array task ID
# i.e. data/input_1.txt, data/input_2.txt, ...
INPUT_FILE="data/input_${SLURM_ARRAY_TASK_ID}.txt"
OUTPUT_FILE="results/output_${SLURM_ARRAY_TASK_ID}.txt"
python analyze.py --input $INPUT_FILE --output $OUTPUT_FILE
Method 2: File List
Read filenames from a list and select based on array task ID.
#!/bin/bash
#SBATCH --job-name=file_processing
#SBATCH --account=exampleproj
#SBATCH --partition=amd
#SBATCH --array=1-50
#SBATCH --output=logs/job_%A_%a.out
# Get the filename from a list
FILE_LIST="file_list.txt"
INPUT_FILE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" $FILE_LIST)
# Process the file
echo "Processing: $INPUT_FILE"
python process.py $INPUT_FILE
Example file_list.txt:
/path/to/dataset1.dat
/path/to/dataset2.dat
/path/to/dataset3.dat
...
Use Case 2: Parameter Sweeps
Run simulations or analyses with different parameter values.
Simple Parameter Mapping
#!/bin/bash
#SBATCH --job-name=param_sweep
#SBATCH --account=exampleproj
#SBATCH --partition=amd
#SBATCH --array=0-99
#SBATCH --output=logs/param_%A_%a.out
# Map array task ID to parameter values
# Example: sweep learning rate from 0.001 to 0.1
LEARNING_RATE=$(awk "BEGIN {print 0.001 + $SLURM_ARRAY_TASK_ID * 0.001}")
echo "Running with learning rate: $LEARNING_RATE"
python train_model.py --lr $LEARNING_RATE
Multi-Dimensional Parameter Grid
Sweep multiple parameters simultaneously.
#!/bin/bash
#SBATCH --job-name=grid_search
#SBATCH --account=exampleproj
#SBATCH --partition=gpu
#SBATCH --gpus-per-task=1
#SBATCH --array=0-99
#SBATCH --output=logs/grid_%A_%a.out
# Define parameter grid
# 10 learning rates × 10 batch sizes = 100 combinations
LR_VALUES=(0.001 0.002 0.005 0.01 0.02 0.05 0.1 0.2 0.5 1.0)
BATCH_VALUES=(16 32 64 128 256 512 1024 2048 4096 8192)
# Calculate indices
LR_IDX=$((SLURM_ARRAY_TASK_ID / 10))
BATCH_IDX=$((SLURM_ARRAY_TASK_ID % 10))
# Get parameter values
LR=${LR_VALUES[$LR_IDX]}
BATCH=${BATCH_VALUES[$BATCH_IDX]}
echo "Learning Rate: $LR, Batch Size: $BATCH"
python train.py --lr $LR --batch-size $BATCH
Using Parameter File
Read parameter combinations from a file.
#!/bin/bash
#SBATCH --job-name=param_file
#SBATCH --account=exampleproj
#SBATCH --partition=amd
#SBATCH --array=1-100
#SBATCH --output=logs/param_%A_%a.out
# Read parameters from file (one combination per line)
PARAMS=$(sed -n "${SLURM_ARRAY_TASK_ID}p" parameters.txt)
# Parse parameters (assuming space-separated)
read -r ALPHA BETA GAMMA <<< "$PARAMS"
echo "Running with α=$ALPHA, β=$BETA, γ=$GAMMA"
./simulation --alpha $ALPHA --beta $BETA --gamma $GAMMA
Example parameters.txt:
0.1 0.5 1.0
0.1 0.5 2.0
0.1 1.0 1.0
0.2 0.5 1.0
...
Use Case 3: Processing Folders
Process data in different directories.
#!/bin/bash
#SBATCH --job-name=folder_processing
#SBATCH --account=exampleproj
#SBATCH --partition=amd
#SBATCH --array=1-20
#SBATCH --output=logs/folder_%A_%a.out
# Define folder pattern
FOLDER_PREFIX="/data/experiment"
FOLDER="${FOLDER_PREFIX}_${SLURM_ARRAY_TASK_ID}"
# Check if folder exists
if [ -d "$FOLDER" ]; then
echo "Processing folder: $FOLDER"
cd $FOLDER
python ../analysis.py
else
echo "Warning: Folder $FOLDER does not exist"
exit 1
fi
Array Job Array Specifications
Different ways to specify array indices:
# Range: tasks 1, 2, 3, ..., 100
#SBATCH --array=1-100
# Range with step: tasks 0, 10, 20, ..., 100
#SBATCH --array=0-100:10
# Specific values: tasks 1, 5, 10, 15
#SBATCH --array=1,5,10,15
# Mixed: tasks 1, 2, 3, 4, 5, 10, 20, 30
#SBATCH --array=1-5,10,20,30
# Limit concurrent tasks: max 10 running at once
#SBATCH --array=1-1000%10
Tip
Use % to limit concurrent array tasks. This prevents overwhelming the system with too many simultaneous jobs while still allowing all tasks to queue.
Managing Array Jobs
Monitoring Array Jobs
# View all array tasks
squeue -u $USER
# View specific array job
squeue -j 12345
# Count running/pending tasks
squeue -u $USER --array -t RUNNING | wc -l
squeue -u $USER --array -t PENDING | wc -l
Canceling Array Tasks
# Cancel entire array job
scancel 12345
# Cancel specific array task
scancel 12345_5
# Cancel range of array tasks
scancel 12345_[10-20]
# Cancel all array tasks with specific job name
scancel --name=array_job
Output File Naming
Use special placeholders in output filenames:
# %A = array job ID (same for all tasks)
# %a = array task ID (unique for each task)
#SBATCH --output=results_%A_%a.out
#SBATCH --error=errors_%A_%a.err
# Organize outputs in subdirectories
#SBATCH --output=logs/task_%a/output.log
#SBATCH --error=logs/task_%a/error.log
Best Practices
Array Job Design
Make each array task independent - no dependencies between tasks
Ensure all tasks have similar resource requirements
Use
--array=1-N%Mto limit concurrent tasks and avoid overwhelming the schedulerTest with a small array (e.g.,
--array=1-3) before scaling up
Resource Management
Request resources per task, not for the entire array
Consider task runtime - all tasks should finish in similar time
Use appropriate concurrency limits based on cluster policy
Data Management
Use unique output filenames with
%A_%ato avoid conflictsCreate output directories before submitting if needed
Consider using task-specific working directories
Clean up intermediate files from completed tasks
Error Handling
Include error checking in your script
Log which parameter combination or file each task processes
Failed tasks can be identified and resubmitted individually
Use
set -eto exit on errors
Debugging
Test with
--array=1or--array=1-3firstCheck one output file to verify correctness
Use explicit echo statements to log task ID and parameters
Verify file/parameter selection logic works correctly
Advanced Techniques
Dynamic Array Size from File Count
# Count files and create array job
shopt -s nullglob
files=(data/*.txt)
NUM_FILES=${#files[@]}
sbatch --array=1-$NUM_FILES process_files.sh
Resubmitting Failed Tasks
# Find failed tasks from sacct
# Find failed tasks from sacct
JOB_ID=12345
sacct -j $JOB_ID --format=JobID,State | grep FAILED | awk '{print $1}' > failed_tasks.txt
# Create array specification from failed tasks
FAILED_ARRAY=$(sed "s/${JOB_ID}_//" failed_tasks.txt | tr '\n' ',' | sed 's/,$//')
# Resubmit only failed tasks
sbatch --array=$FAILED_ARRAY rerun_job.sh
Root Cause
Array jobs solve the problem of submitting and managing large numbers of similar tasks:
Without Array Jobs: - Need to write loops to submit hundreds of individual jobs - Job IDs are unrelated, making management difficult - Output files need manual naming conventions - Monitoring and canceling groups of related jobs is tedious
With Array Jobs:
- Single submission for all related tasks
- Automatic task indexing with $SLURM_ARRAY_TASK_ID
- Unified job ID for the entire array
- Easy monitoring and cancellation of task groups
- Built-in output file naming with %A_%a
- Scheduler can optimize resource allocation for task groups
References
Related Articles
How to Submit and Run Batch Jobs with SLURM - Basic batch job submission
How to Request Interactive Sessions on Compute Nodes - Interactive testing before array submission
SLURM Documentation