How to Efficiently Remove a Large Number of Files
Environment
Linux/Unix systems
NFS Storage
Issue
When dealing with large directories containing numerous files or subdirectories, standard removal commands like rm -rf or conda env remove can be extremely slow.
This is particularly noticeable when:
Removing conda environments
Deleting virtual environments
Cleaning up directories with many small files
Removing large datasets with numerous files
Resolution
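Use parallel file deletion to significantly speed up the removal process. The command below removes everything under /path/to/delete; because fd does not list the search root itself, the emptied top-level directory is left in place: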
nohup bash -c "fd -uua0 --one-file-system . /path/to/delete | xargs -r0 -P $(nproc) -n 128 rm -rf" &
Note
For Ubuntu systems, the command is fdfind instead of fd. You may alias fdfind to fd for
compatibility if needed.
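Note that a shell alias defined in an interactive session is not visible inside the non-interactive bash -c used above, so an alias alone will not make that command work. As a minimal sketch (find_fd_bin is a hypothetical helper name, not part of fd), the binary name can instead be resolved at run time:

```shell
# Print the name of whichever fd binary is installed: fd or fdfind.
# find_fd_bin is a hypothetical helper, not something shipped with fd.
find_fd_bin() {
    if command -v fd >/dev/null 2>&1; then
        echo fd
    elif command -v fdfind >/dev/null 2>&1; then
        echo fdfind
    else
        echo "neither fd nor fdfind found in PATH" >&2
        return 1
    fi
}
```

The printed name can then be stored in a variable, for example FD=$(find_fd_bin), and used in place of fd in the deletion command.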
Command Details
nohup ... &
Runs the command in the background and keeps it running even if the terminal closes
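As a rough sketch of how the backgrounded job can be tracked from the same shell session, the example below uses sleep 1 as a stand-in for the deletion pipeline; log_file, job_pid and status are illustrative names. Redirecting output explicitly keeps nohup from writing to nohup.out in the current directory.

```shell
# Temporary log file for the job's output (illustrative location).
log_file=$(mktemp)

# "sleep 1" is a placeholder for the real fd | xargs pipeline.
nohup bash -c "sleep 1" > "$log_file" 2>&1 &
job_pid=$!

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$job_pid" 2>/dev/null; then
    echo "deletion job $job_pid is still running"
fi

# wait only works from the shell that started the job.
wait "$job_pid"
status=$?
echo "job finished with exit status $status (log: $log_file)"
```

Once the terminal has been closed, wait is no longer available, so the process ID and the log file are the remaining ways to check on the job.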
fd flags:
-uu: Unrestricted search (includes hidden and ignored files)
-a0: Print absolute paths (-a) with null-terminated output (-0)
--one-file-system: Stay within the same filesystem
xargs flags:
-r: Don’t run command if input is empty
-0: Input items are terminated by null character
-P $(nproc): Run up to number-of-CPUs processes in parallel
-n 128: Use at most 128 arguments per command line
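To see how these xargs flags batch their input without deleting anything, echo can stand in for rm -rf. This is only an illustration with made-up items and a batch size of 2 instead of 128; -P is left out so the output order stays deterministic.

```shell
# Five null-terminated items, at most two arguments per echo invocation.
# echo stands in for rm -rf, so nothing is deleted.
printf '%s\0' a b c d e | xargs -r0 -n 2 echo
# prints:
# a b
# c d
# e

# With empty input, -r prevents echo from running at all (no output).
printf '' | xargs -r0 -n 2 echo
```

In the real command, each batch of up to 128 paths becomes one rm -rf invocation, and -P lets several of those invocations run at the same time.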
Note
Running deletion in parallel can significantly impact I/O performance. Consider running during off-peak hours for large deletions.
Warning
Double-check the target directory path before execution - this operation cannot be undone.
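One way to make that double-check harder to skip is a small guard that runs before the deletion. The sketch below is illustrative only: check_delete_target is a hypothetical helper, and the refusal rules (empty path, non-directory, filesystem root, home directory) are assumptions that should be adapted to the local environment.

```shell
# Validate a deletion target and print its resolved absolute path.
# check_delete_target is a hypothetical helper; the rules are illustrative.
check_delete_target() {
    local target="$1"
    local resolved

    # Refuse an empty argument, e.g. an unset variable that expanded to nothing.
    if [ -z "$target" ]; then
        echo "refusing: empty path" >&2
        return 1
    fi

    # Refuse anything that is not an existing directory.
    if [ ! -d "$target" ]; then
        echo "refusing: '$target' is not a directory" >&2
        return 1
    fi

    # Resolve symlinks and relative components to a physical absolute path.
    resolved=$(cd "$target" && pwd -P) || return 1

    # Refuse the filesystem root and the current user's home directory.
    case "$resolved" in
        /|"$HOME")
            echo "refusing: '$resolved' is a protected path" >&2
            return 1
            ;;
    esac

    echo "$resolved"
}
```

The deletion command would then be started only if the check succeeds, using the printed absolute path as its target.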
Root Cause
Sequential file deletion becomes inefficient when dealing with large numbers of files. There are several contributing factors:
File system metadata updates for each deletion
Single-threaded operation in standard removal commands
Directory entry updates
Inode management overhead
By parallelizing the deletion process and using efficient file finding, we can significantly reduce the total time required for bulk file removal.
References
fd help or manual: fd --help or man fd
fd GitHub Repository: https://github.com/sharkdp/fd
xargs help or manual: xargs --help or man xargs