How to Efficiently Upload Files to an HPC Cluster
Environment
macOS, Linux/Unix, or Windows with WSL
SSH access to the HPC cluster
Issue
How can I transfer files to an HPC cluster efficiently? Typical goals are to:
Upload large datasets (GB to TB scale) efficiently
Upload datasets with the ability to resume interrupted transfers
Upload many small files (thousands or more) quickly
Upload files in parallel to make full use of the available bandwidth
Monitor transfer progress and handle errors gracefully
Choose the best transfer method for each scenario
Resolution
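Use fpsync, a tool shipped with the fpart package, for parallel file transfers. fpsync uses fpart to split a directory tree into partitions and runs several rsync jobs concurrently, one per partition. Compared with a single scp or rsync session, this can substantially speed up transfers, especially when the data consists of many small files.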
Installation
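Install the fpart package, which provides fpsync, on your local machine. fpsync uses rsync by default, so rsync must also be installed both locally and on the cluster: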
# Ubuntu/Debian:
$ sudo apt install fpart
# macOS:
$ brew install fpart
# CentOS/RHEL (fpart is provided by the EPEL repository):
$ sudo yum install epel-release
$ sudo yum install fpart
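Before the first transfer, it can help to confirm that the tools are actually on your PATH. The following is a minimal sketch; the function name check_tools is hypothetical and only uses the standard shell builtin command -v.

```shell
#!/usr/bin/env bash
# Report whether each named tool is on PATH (check_tools is a hypothetical helper).
# fpsync needs fpart and rsync locally; ssh is used to reach the cluster.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  # Succeed only if every tool was found.
  [ "$missing" -eq 0 ]
}

check_tools fpart fpsync rsync ssh
```

The same check can be run on the cluster side for rsync, for example: $ ssh username@hpc.university.edu 'command -v rsync'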
Basic Usage
Transfer a directory to the cluster:
$ fpsync -n 8 ~/local_directory username@hpc.university.edu:~/remote_directory
Transfer with specific options:
$ fpsync -n 8 -v -x -o "-a" ~/local_directory username@hpc.university.edu:~/remote_directory
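- Options explained:
-n 8: Use 8 parallel transfer processes
-v: Verbose output
-x: Cross filesystem boundaries (the available options differ between fpart releases; confirm with man fpsync for your installed version)
-o "-a": Pass the rsync archive option (-a); note that -o replaces fpsync's default rsync options rather than adding to them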
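The basic command can be wrapped in a small script so that the source, destination and job count are set in one place. This is a sketch under stated assumptions: the function name run_fpsync and the DRY_RUN switch are hypothetical conveniences, not fpsync features, and the flags are limited to -n, -v and -o from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around fpsync; run_fpsync and DRY_RUN are not part of fpsync.
set -euo pipefail

run_fpsync() {
  local src="$1" dest="$2" jobs="${3:-8}"
  # Build the command as an array so paths containing spaces stay intact.
  local cmd=(fpsync -n "$jobs" -v -o "-a" "$src" "$dest")
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    # Default: only print the command, so nothing is transferred by accident.
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# The destination is quoted, so ~ is passed through unexpanded, as on the command line.
run_fpsync "$HOME/local_directory" "username@hpc.university.edu:~/remote_directory" 8
```

Saved as, say, upload.sh (a hypothetical name), the script only prints the command by default; running it as DRY_RUN=0 ./upload.sh starts the actual transfer.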
Note
Choose the number of parallel processes (-n) based on your network bandwidth and on how much load your local machine and the cluster's login node can handle.
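One possible starting point for -n is the number of local CPU cores, capped at a small value. This is an assumption for illustration only (the cap of 8 simply mirrors the example above); network bandwidth and cluster policy usually matter more, so treat the result as a first guess to refine by testing. nproc comes from GNU coreutils (Linux, WSL) and sysctl -n hw.ncpu is the macOS equivalent.

```shell
#!/usr/bin/env bash
# Suggest a starting -n value: local core count, capped (the cap is an assumption).
set -euo pipefail

suggest_jobs() {
  local cap="${1:-8}" cores
  if command -v nproc >/dev/null 2>&1; then
    cores="$(nproc)"               # GNU coreutils: Linux, WSL
  else
    cores="$(sysctl -n hw.ncpu)"   # macOS
  fi
  if (( cores < cap )); then
    echo "$cores"
  else
    echo "$cap"
  fi
}

echo "Suggested starting value: -n $(suggest_jobs 8)"
```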
Warning
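A large number of parallel processes may overload the network, your local machine, or the cluster's login node; each rsync job opens its own SSH connection, and some clusters limit concurrent SSH connections per user
Always test with a small directory first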
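For that first test, a throwaway tree of many small files reproduces the small-file case cheaply. The sketch below is illustrative: the helper name make_test_tree and the sizes (10 directories of 100 files, 4 KiB each) are arbitrary choices, not recommendations.

```shell
#!/usr/bin/env bash
# Build a throwaway tree of many small files for a trial transfer.
# make_test_tree and the sizes used below are hypothetical illustration values.
set -euo pipefail

make_test_tree() {
  local root="$1" dirs="$2" files_per_dir="$3" d f
  for ((d = 1; d <= dirs; d++)); do
    mkdir -p "$root/dir_$d"
    for ((f = 1; f <= files_per_dir; f++)); do
      # 4 KiB of random data per file.
      head -c 4096 /dev/urandom > "$root/dir_$d/file_$f.bin"
    done
  done
}

TEST_DIR="$(mktemp -d)/fpsync_test"
make_test_tree "$TEST_DIR" 10 100
echo "Created $(find "$TEST_DIR" -type f | wc -l | tr -d ' ') files in $TEST_DIR"
```

You can then time a trial run with different -n values, for example: $ time fpsync -n 4 -v "$TEST_DIR" username@hpc.university.edu:~/fpsync_test
Remove the local tree afterwards with rm -rf "$(dirname "$TEST_DIR")", and delete the test copy on the cluster.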
Root Cause
Traditional tools such as scp, or a single rsync session, process files one at a time over a single stream. When transferring many small files, the fixed per-file cost (metadata operations and round trips between client and server) dominates, and a single stream often cannot use the full available bandwidth. Parallel transfer tools like fpsync divide the file list among multiple processes so that these per-file costs overlap, making better use of the available bandwidth.
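The idea of dividing the workload can be illustrated with plain rsync. The sketch below is a simplified stand-in, not how fpsync is implemented: it distributes the file list round-robin into N chunks, whereas fpsync relies on fpart to build its partitions. The function names partition_list and parallel_rsync are hypothetical. It relies on rsync's --files-from option, which implies --relative so the directory layout is preserved.

```shell
#!/usr/bin/env bash
# Simplified illustration of parallel transfer (not fpsync itself):
# split the file list into N chunks and run one rsync per chunk concurrently.
set -euo pipefail

# Write the lines of $list round-robin into files ${prefix}0 .. ${prefix}(n-1).
partition_list() {
  local list="$1" n="$2" prefix="$3"
  awk -v n="$n" -v p="$prefix" '{ print > (p (NR % n)) }' "$list"
}

# Copy every regular file under $src to $dest with up to $n concurrent rsync jobs.
parallel_rsync() {
  local src="$1" dest="$2" n="$3" work chunk pid status=0
  local pids=()
  work="$(mktemp -d)"
  # Relative paths, one per line; file names containing newlines are not handled.
  (cd "$src" && find . -type f | sed 's|^\./||') > "$work/all"
  if [[ ! -s "$work/all" ]]; then
    rm -rf "$work"
    return 0
  fi
  partition_list "$work/all" "$n" "$work/chunk."
  for chunk in "$work"/chunk.*; do
    # --files-from implies --relative, so subdirectories are recreated at $dest.
    rsync -a --files-from="$chunk" "$src/" "$dest" &
    pids+=("$!")
  done
  # Wait for every job and remember whether any of them failed.
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  rm -rf "$work"
  return "$status"
}
```

Example call: parallel_rsync "$HOME/local_directory" username@hpc.university.edu:~/remote_directory 4. Each chunk opens its own SSH connection, which is why the warning about too many parallel processes applies here as well.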