Fast Tar And Rsync Transfer Speed For Linux Backups Using Zstd Compression

Newer versions of Tar (1.32+) and Rsync (3.2.3+) have added support for Facebook’s zstd compression algorithm, and Rsync has also gained lz4 compression and xxHash checksum support, which together give Tar and Rsync a tremendous boost in transfer speed. I’ve switched most of my own personal backup scripts over to these newer versions.

Facebook’s zstd compression is extremely flexible, with many compression levels that let you trade compression/decompression speed against compression ratio and file size. Configured with its fast negative levels, zstd can compress at near hard drive and network line rates, so you can transfer compressed files about as fast as uncompressed ones while sacrificing only some compression ratio. zstd, lz4 and xxHash are all developed and maintained by Facebook’s Yann Collet.
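As a minimal sketch of that speed versus ratio trade-off (the file names below are just placeholders), zstd’s fast negative levels and its high levels are selected like so:

```shell
# Fast negative level: prioritise throughput over compression ratio.
# -T0 uses all CPU cores for multi-threaded compression.
zstd -T0 --fast=350 -q -o backup-fast.tar.zst backup.tar

# High level 19: prioritise compression ratio over speed.
zstd -T0 -19 -q -o backup-small.tar.zst backup.tar
```

The higher the `--fast=` number, the faster (and less thorough) the compression; level 19 sits near the top of the normal range before the ultra levels 20-22.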

I recently migrated hundreds of gigabytes of data for a Centmin Mod LEMP stack client who was moving servers and web hosts. Using Tar 1.32+ and Rsync 3.2.3 with Facebook’s zstd compression and Rsync’s xxHash checksum support, the data transfer speed was an eye opener – especially since the tested network transfer speed between the servers, from the US East Coast to the mid-US, was only ~40-50MB/s over a 1Gbps connection due to network and geographical distance.

Over an SSH-encrypted, zstd-compressed netcat connection, I managed to transfer:

  • 144GB of file data (uncompressed size) in ~21.8 minutes with Tar + zstd and
  • 65GB of MariaDB 10 MySQL data (uncompressed size) in ~8.8 minutes with MariaBackup!

That’s equivalent to file transfers at 112.73MB/s for Tar + zstd and MariaDB MySQL data transfers at 126MB/s!
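A sketch of the kind of Tar + zstd pipeline used for such a transfer (the hostname, paths and zstd level here are placeholders, not my client’s actual setup):

```shell
# On the old server: stream a tar archive through multi-threaded zstd
# and an SSH connection, decompressing and unpacking on the new server.
# --fast=350 trades compression ratio for near line-rate throughput.
tar -cf - /home/nginx/domains \
  | zstd -T0 --fast=350 -q \
  | ssh root@newserver 'zstd -dq | tar -xf - -C /'
```

Because everything is streamed, no intermediate archive file ever touches disk on either end.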

Client Old server

  • Dual Xeon Silver 4116 Skylake 24C/48T – 2.10Ghz base clock and 3.00Ghz max Turbo Boost speed
  • 256GB Memory
  • 4x 2TB SSD Raid 10
  • 1Gbps Network
  • CentOS 7.9 64bit Linux running Centmin Mod 123.09beta01 LEMP stack
  • New York/New Jersey, USA
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
Stepping: 4
CPU MHz: 800.061
CPU max MHz: 3000.0000
CPU min MHz: 800.0000
BogoMIPS: 4200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 16896K
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear spec_ctrl intel_stibp flush_l1d

Client New server

  • AMD EPYC 7452 Rome 32C/64T – 2.35Ghz base clock and 3.35Ghz max Turbo speed
  • 128GB Memory
  • 2x 960GB NVMe Raid 1
  • 1Gbps Network
  • CentOS 7.9 64bit Linux running Centmin Mod 123.09beta01 LEMP stack
  • Dallas, USA
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7452 32-Core Processor
Stepping: 0
CPU MHz: 2350.000
CPU max MHz: 2350.0000
CPU min MHz: 1500.0000
BogoMIPS: 4699.83
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb cat_l3 cdp_l3 hw_pstate sme retpoline_amd ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip overflow_recov succor smca

While I can’t reveal my client’s specific details, you can read my Tar 1.31+ and Rsync 3.2.3+ benchmark tests for my custom CentOS 7 RPM-built binaries – Tar 1.32 with zstd compression support, and Rsync 3.2.3 with zstd and lz4 compression and xxHash checksum support. There are also my round 4 benchmarks for zstd vs pigz vs brotli vs bzip2 vs pxz and more. Seriously, it’s a game changer for data file transfer speeds!

Rsync 3.2.3 Benchmarks

Here are Rsync 3.2.3 benchmarks with zstd’s fast negative compression levels -30, -60, -150, -350 and -8000, which showcase how flexible zstd is in letting you choose speed versus compression ratio/size. Rsync 3.2.3 supports zstd compression levels from -131072 to 22 and a choice between the newer xxHash and the traditional MD5 checksum algorithm.

Rsync 3.2.3 zstd benchmarks

Tar 1.31+ with zstd compression

I wrote a quick tar-zstd-test.sh script to test my tar 1.31 custom RPM, which has native zstd compression support, and ran it against the uncompressed Silesia corpus to compare tar 1.31 paired with the gzip, pigz, zstd, xz, pxz and lz4 compression algorithms. I have since added pigz level 11 (zopfli compression) and all 32 zstd compression levels – the negative levels (-10 to -1) which focus on compression speed, the normal levels 1 to 19, and the 3 ultra levels 20 to 22 which focus on better compression ratios.
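Tar 1.31+ can invoke zstd natively via `--zstd`, or you can pass custom zstd options through `-I`/`--use-compress-program`. A minimal sketch (the archive name, directory and level are illustrative):

```shell
# Native zstd support (uses zstd's default level 3)
tar --zstd -cf silesia.tar.zst silesia/

# Custom zstd options: all CPU cores (-T0) at level 19
tar -I 'zstd -T0 -19' -cf silesia.tar.zst silesia/

# Extract a zstd-compressed archive
tar -I zstd -xf silesia.tar.zst
```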

Tar 1.31+ zstd benchmarks

zstd Compression And Decompression Benchmarks

From my round 4 zstd compression and decompression benchmarks, testing the following compression algorithms:

  • zstd v1.4.4 – Facebook-developed realtime compression algorithm, run in multi-threaded mode
  • brotli v1.0.7 – Google developed Brotli compression algorithm
  • gzip v1.5
  • bzip2 v1.0.6
  • pigz v2.4 – multi-threaded version of gzip
  • pbzip2 v1.1.13 – multi-threaded version of bzip2
  • lbzip2 v2.5 – multi-threaded version of bzip2
  • lzip v1.21 – based on LZMA compression algorithm
  • plzip v1.8 – multi-threaded version of lzip
  • xz v5.2.2
  • pxz v5.2.2 – multi-threaded version of xz

zstd compression and decompression benchmarks

If you think about it, compression algorithms are key to the modern internet. Facebook’s zstd compression algorithm is making its way into many Linux and FreeBSD tools and software, as outlined on the official zstd site. You can even use zstd compression for Linux log rotation. I’m looking forward to utilising zstd and the newer Tar and Rsync versions even more to speed up my data backup and transfer routines. Each new version of zstd keeps improving performance!
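As an example of zstd for log rotation, logrotate can be pointed at the zstd binary with its compresscmd directives – a sketch, assuming zstd is installed at /usr/bin/zstd and using a placeholder log path:

```
/var/log/nginx/*.log {
    weekly
    rotate 4
    compress
    compresscmd /usr/bin/zstd
    compressoptions -T0 -9
    compressext .zst
    uncompresscmd /usr/bin/unzstd
}
```

Rotated logs then get a .zst extension instead of gzip’s .gz.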