How to Debug Segmentation Faults in C/C++/Fortran Applications
Environment
Linux Computing Cluster
Applications written in C/C++/Fortran
Including MPI applications
Issue
Your program crashes with a “Segmentation fault (core dumped)” error message, which typically looks like this:
[cpu11:3595716:0:3595716] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x150478021008)
==== backtrace (tid:3595716) ====
0 .../lib/libucs.so.0(+0x4821e) [0x15048a54f21e]
1 .../lib/libucs.so.0(+0x473b4) [0x15048a54e3b4]
2 /lib64/libc.so.6(+0x3e730) [0x15048a776730]
3 build/segfault-debug(+0x1eb0) [0x5578f6512eb0]
4 build/segfault-debug(+0x1da1) [0x5578f6512da1]
5 /lib64/libc.so.6(+0x295d0) [0x15048a7615d0]
6 /lib64/libc.so.6(__libc_start_main+0x80) [0x15048a761680]
7 build/segfault-debug(+0x1c65) [0x5578f6512c65]
=================================
srun: error: cpu11: task 1: Segmentation fault (core dumped)
srun: error: cpu11: task 3: Segmentation fault (core dumped)
srun: error: cpu11: task 2: Segmentation fault (core dumped)
Resolution
Segmentation fault errors in C/C++/Fortran applications can be caused by various issues, such as dereferencing null or invalid pointers, accessing memory out of bounds, or stack overflows.
Here are the recommended steps to diagnose and resolve the issue:
Step 1: Reduce Complications
Here are some common issues that can be checked quickly:
Check system limits with ulimit -a to ensure that the stack size and locked memory size are adequate
$ ulimit -a
max locked memory (kbytes, -l) unlimited
stack size (kbytes, -s) unlimited
Run the application with a smaller input or fewer processes to make the failure easier to isolate
Run the application on a single node to rule out network-related problems
Recompile the application with a lower optimization level (e.g., -O1 instead of -O3) to rule out compiler optimization issues
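The limit check above can be scripted so that it is easy to repeat inside a job allocation, where the limits may differ from those of a login shell. The following is a minimal sketch, not a site-provided tool: the function name report_limits is hypothetical, and it only relies on the ulimit -s, -l, and -c options.

```shell
# Sketch: print the limits most relevant to segmentation faults.
# report_limits is a hypothetical helper name used for illustration.
report_limits() {
    # Each ulimit call prints the soft limit: a number or "unlimited".
    stack="$(ulimit -s)"
    memlock="$(ulimit -l)"
    core="$(ulimit -c)"
    printf 'stack size (kbytes): %s\n' "$stack"
    printf 'max locked memory (kbytes): %s\n' "$memlock"
    printf 'core file size (blocks): %s\n' "$core"
    # A finite stack limit is worth noting when the code uses large local arrays.
    if [ "$stack" != "unlimited" ]; then
        printf 'note: stack size is limited; large local arrays may overflow it\n'
    fi
}

report_limits
```

Saving this to a file and running it once on the login node and once inside the job makes any difference between the two environments visible.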
Step 2: Enable Core Dump Collection
Clean and rebuild your program with debug symbols and lower optimization
Note
The build command needed depends on your application.
$ make clean
$ make CFLAGS="-O1 -g" CXXFLAGS="-O1 -g" FFLAGS="-O1 -g"
Request an interactive job for debugging, for example with only 4 MPI ranks
$ srun ... --ntasks-per-node=4 --cpus-per-task=32 --pty bash
Enable core dump generation:
$ ulimit -c unlimited
$ ulimit -a
core file size (blocks, -c) unlimited
Run your application with srun --overlap so that it reproduces the segmentation fault.
Note
Do not press ctrl+c to terminate the job immediately, otherwise the core dump file may not be generated.
$ srun --overlap ./your_app
[cpu11:3595716:0:3595716] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x150478021008)
==== backtrace (tid:3595716) ====
0 .../lib/libucs.so.0(+0x4821e) [0x15048a54f21e]
1 .../lib/libucs.so.0(+0x473b4) [0x15048a54e3b4]
2 /lib64/libc.so.6(+0x3e730) [0x15048a776730]
3 build/segfault-debug(+0x1eb0) [0x5578f6512eb0]
4 build/segfault-debug(+0x1da1) [0x5578f6512da1]
5 /lib64/libc.so.6(+0x295d0) [0x15048a7615d0]
6 /lib64/libc.so.6(__libc_start_main+0x80) [0x15048a761680]
7 build/segfault-debug(+0x1c65) [0x5578f6512c65]
=================================
srun: error: cpu11: task 1: Segmentation fault (core dumped)
srun: error: cpu11: task 3: Segmentation fault (core dumped)
srun: error: cpu11: task 2: Segmentation fault (core dumped)
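Where a core file ends up depends on the node's kernel core_pattern setting. The following is a minimal sketch, assuming a Linux node where /proc/sys/kernel/core_pattern is readable; the function name core_destination is made up for illustration. It reports whether dumps are handed to systemd-coredump (the case Step 3 relies on), piped to some other helper, or written as plain files.

```shell
# Sketch: describe where the kernel sends core dumps on this node.
# core_destination is a hypothetical helper name.
core_destination() {
    # Optional argument: an alternative file to read (defaults to the live kernel setting).
    pattern_file="${1:-/proc/sys/kernel/core_pattern}"
    if [ ! -r "$pattern_file" ]; then
        echo "unknown (cannot read $pattern_file)"
        return 0
    fi
    pattern="$(cat "$pattern_file")"
    case "$pattern" in
        # A leading "|" means the kernel pipes the dump to a program.
        "|"*systemd-coredump*) echo "systemd-coredump (look in /var/lib/systemd/coredump/)" ;;
        "|"*) echo "piped to a helper: ${pattern#?}" ;;
        # Otherwise the pattern is a file name template.
        *) echo "written as a file named: $pattern" ;;
    esac
}

core_destination
```

If the result is a plain file name rather than systemd-coredump, the core file is likely to appear relative to the directory the application ran in, and the decompression in Step 3 may not be needed.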
Step 3: Retrieve and Analyze Core Dump
Locate your core dump files in the system directory
$ ls -lh /var/lib/systemd/coredump/
-rw-r----- 1 root root 421K Sep 29 10:30 core.your_app.1001.f82f362e60834791838dad4c3e378781.3595715.1759132168000000.zst
-rw-r----- 1 root root 419K Sep 29 10:30 core.your_app.1001.f82f362e60834791838dad4c3e378781.3595716.1759132168000000.zst
-rw-r----- 1 root root 419K Sep 29 10:30 core.your_app.1001.f82f362e60834791838dad4c3e378781.3595717.1759132168000000.zst
Decompress one of the core dumps to your working directory
$ zstd -d /var/lib/systemd/coredump/core.your_app.......zst -o core.your_app
Load the core dump into GDB for analysis
$ gdb ./your_app core.your_app
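With several ranks crashing at once, it can help to pick the core dump that belongs to the process whose backtrace you are reading. The sketch below assumes the file naming seen in the listing above (core.<name>.<uid>.<boot id>.<pid>.<timestamp>.zst) and uses the PID 3595716 from the example backtrace header; core_for_pid is a hypothetical helper name.

```shell
# Sketch: find the compressed core dump for one process ID.
# core_for_pid is a hypothetical helper; the naming scheme is assumed
# from the directory listing in this guide.
core_for_pid() {
    dir="$1"; app="$2"; pid="$3"
    for f in "$dir"/core."$app".*."$pid".*.zst; do
        # When nothing matches, the glob stays literal and the -e test fails.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Example: decompress the dump for PID 3595716 into the working directory.
if core_zst="$(core_for_pid /var/lib/systemd/coredump your_app 3595716)"; then
    zstd -d "$core_zst" -o core.your_app
else
    echo "no core dump found for your_app with PID 3595716"
fi
```

Replace your_app and the PID with the values from your own run.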
Step 4: Debug with GDB
The following are some common GDB commands for analyzing the core dump.
Get the backtrace to see the function call stack at the crash point
(gdb) bt
#0  initialize_array (array=0x153635fff010, rank=3) at src/mpi_impl.c:14
#1  process_array_and_calculate_sum (rank=3) at src/mpi_impl.c:27
#2  0x000055c955b2cda1 in main (argc=1, argv=0x7ffc29621628) at src/main.c:21
Examine the source code around the crash location
(gdb) list
9         const unsigned long base = (unsigned long)rank * PER_RANK_ARRAY_SIZE;
10        for (int i = 0; i < PER_RANK_ARRAY_SIZE; i++)
11        {
12          // BUG: should be array[i] = base + i;
13          // This causes segmentation fault when rank > 1
14          array[i * (rank + 1)] = base + i;
15        }
16      }
Inspect variable values at the time of crash
(gdb) print i
$1 = <optimized out>
(gdb) print rank
$2 = 3
(gdb) print array
$3 = (unsigned long *) 0x153635fff010
Note
If variables show <optimized out>, rebuild with the -O0 -g flags for complete debugging information.
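The same commands can also be run non-interactively, which is convenient when there are several core dumps to inspect. This is a minimal sketch using GDB's -batch and -x options; the helper name write_gdb_commands and the file names crash.gdb and crash-report.txt are assumptions chosen for illustration, as are ./your_app and core.your_app.

```shell
# Sketch: collect a backtrace, local variables, and a source listing
# from a core dump without an interactive GDB session.
# write_gdb_commands is a hypothetical helper name.
write_gdb_commands() {
    # bt, info locals, and list are standard GDB commands.
    cat > "$1" <<'EOF'
bt
info locals
list
EOF
}

write_gdb_commands crash.gdb

# Assumed inputs: ./your_app (debug build) and core.your_app (decompressed core).
if command -v gdb >/dev/null 2>&1 && [ -f ./your_app ] && [ -f core.your_app ]; then
    gdb -batch -x crash.gdb ./your_app core.your_app > crash-report.txt 2>&1
    echo "report written to crash-report.txt"
else
    echo "gdb, ./your_app, or core.your_app not found; nothing analyzed"
fi
```

Keeping the commands in a file makes it easy to produce one report per rank and compare them side by side.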
Root Cause
Common causes of segmentation faults include:
Null pointer dereference: Accessing memory through uninitialized or null pointers
Buffer overflow: Writing beyond allocated memory boundaries
Use after free: Accessing memory that has been deallocated
Stack overflow: Excessive recursion or large local variables exceeding stack limits
Invalid memory access: Accessing memory outside the program’s address space
Try It Yourself
To practice the debugging techniques described in this guide, you can work with our example segmentation fault program:
Example Repository: https://github.com/hkust-hpc-team/hkust-hpc/tree/main/examples/debug-segmentation-fault/
This example includes:
A simple MPI C program that intentionally contains a segmentation fault bug
Build instructions and compilation flags for debugging
Clone the repository and follow the instructions to reproduce the segmentation fault, then use the techniques from this guide to identify and fix the bug.