
Peer Scaling

Methodology

Same distributed bench tool as the throughput test, run at peer counts of 20, 40, 60, 80, and 100 (10--50 active pairs). Pre-generated pair files ensure cross-VM placement. 10--20 runs per data point.
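The bench tool's pair-file format isn't shown here, but as a rough sketch of the idea, the following Go program places peers round-robin across VMs and emits only pairs that span two VMs. File name, peer naming, and counts are all illustrative, not the tool's actual format.

```go
// Command genpairs sketches pre-generating a pair file with cross-VM placement.
package main

import (
	"fmt"
	"os"
)

func main() {
	const peers, vms = 100, 4                  // e.g. 100 peers -> 50 active pairs
	vmOf := func(p int) int { return p % vms } // round-robin placement across VMs

	f, err := os.Create("pairs.txt")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Under round-robin placement, adjacent peer indices always land on
	// different VMs (vms >= 2), so every pair crosses a VM boundary.
	for i := 0; i+1 < peers; i += 2 {
		a, b := i, i+1
		fmt.Fprintf(f, "peer%03d@vm%d peer%03d@vm%d\n", a, vmOf(a), b, vmOf(b))
	}
}
```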

[Chart: Peer scaling, 8 vCPU at 10G offered]

Results (8 vCPU, 10G offered)

| Peers | HD (Mbps) | HD Loss | TS (Mbps) | TS Loss | HD/TS |
|-------|-----------|---------|-----------|---------|-------|
| 20    | 8,371     | 0.1%    | 4,495     | 44%     | 1.9x  |
| 40    | 8,006     | 0.5%    | 3,538     | 57%     | 2.3x  |
| 60    | 6,880     | 0.7%    | 3,146     | 63%     | 2.2x  |
| 80    | 7,827     | 0.4%    | 2,905     | 66%     | 2.7x  |
| 100   | 7,665     | 0.5%    | 2,775     | 68%     | 2.8x  |

From 20 to 100 peers, TS loses 38% of its throughput (4,495 to 2,775 Mbps) and its loss rate climbs 24 percentage points (44% to 68%). HD stays roughly flat, dropping about 8%. The HD/TS ratio widens from 1.9x to 2.8x.

Why

TS creates two goroutines per peer, so 100 peers means 200 goroutines competing for the CPU, plus scheduling overhead and cache pressure from goroutine stack switches.
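A minimal Go sketch of that pattern, assuming one reader and one writer goroutine per peer; the function and handler names are illustrative rather than TS's actual code.

```go
// Package tspeers sketches the two-goroutines-per-peer pattern.
package tspeers

import "net"

// handlePacket stands in for the real packet processing (hypothetical).
func handlePacket(pkt []byte) {}

// servePeer starts one goroutine to read inbound packets and one to drain
// the peer's outbound queue. At 100 peers this is 200 goroutines contending
// for the Go scheduler.
func servePeer(conn net.Conn, outbound <-chan []byte) {
	// reader goroutine
	go func() {
		buf := make([]byte, 2048)
		for {
			n, err := conn.Read(buf)
			if err != nil {
				return
			}
			handlePacket(buf[:n])
		}
	}()
	// writer goroutine
	go func() {
		for pkt := range outbound {
			if _, err := conn.Write(pkt); err != nil {
				return
			}
		}
	}()
}
```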

HD's sharded hash table is O(1) per peer regardless of count. Each worker owns a disjoint peer set -- adding peers doesn't increase contention.
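A minimal Go sketch of the sharding scheme, assuming peer IDs are hashed to pick the single worker that owns each peer's state; types and names are illustrative, not HD's implementation (which runs io_uring worker threads rather than goroutines).

```go
// Package hdtable sketches a worker-sharded peer table.
package hdtable

import "hash/fnv"

// peerState stands in for per-peer state: counters, endpoints, keys (hypothetical).
type peerState struct{}

// worker owns a disjoint subset of peers; only this worker ever touches its
// map, so lookups need no locks and stay O(1) per packet.
type worker struct {
	peers map[uint64]*peerState
}

// shardedTable fans peers out over a fixed set of workers.
type shardedTable struct {
	workers []*worker
}

func newShardedTable(n int) *shardedTable {
	t := &shardedTable{workers: make([]*worker, n)}
	for i := range t.workers {
		t.workers[i] = &worker{peers: make(map[uint64]*peerState)}
	}
	return t
}

// ownerOf maps a peer ID to the single worker that owns its state; adding
// peers grows the per-worker maps but never adds cross-worker contention.
func (t *shardedTable) ownerOf(peerID []byte) *worker {
	h := fnv.New64a()
	h.Write(peerID)
	return t.workers[h.Sum64()%uint64(len(t.workers))]
}
```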

Worker Optimization

HD's --workers flag controls the io_uring thread count. Throughput (Mbps) by worker count and offered rate:

16 vCPU

| Rate | 4w     | 6w     | 8w     | 10w    | 12w    |
|------|--------|--------|--------|--------|--------|
| 10G  | 8,425  | 8,462  | 8,624  | 8,706  | 8,660  |
| 15G  | 11,118 | 10,620 | 12,088 | 12,311 | 12,106 |
| 20G  | 11,487 | 11,908 | 14,354 | 14,673 | 14,950 |
| 25G  | 11,890 | 12,106 | 16,545 | 16,083 | 15,836 |

8 vCPU

| Rate | 2w    | 3w    | 4w     | 6w    |
|------|-------|-------|--------|-------|
| 5G   | 4,182 | 4,352 | 4,353  | 4,352 |
| 10G  | 5,121 | 7,021 | 8,371  | 8,272 |
| 15G  | 5,152 | 7,315 | 11,087 | 9,966 |

Rule of thumb: workers = vCPUs / 2. Higher peer counts make more workers viable by improving hash distribution.
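A small sketch of that rule of thumb applied as the default for a --workers flag; the flag name comes from the text above, everything else is an assumption.

```go
package main

import (
	"flag"
	"fmt"
	"runtime"
)

func main() {
	// Rule of thumb from the tables above: workers = vCPUs / 2, minimum 1.
	def := runtime.NumCPU() / 2
	if def < 1 {
		def = 1
	}
	workers := flag.Int("workers", def, "io_uring worker thread count")
	flag.Parse()
	fmt.Printf("vCPUs=%d -> workers=%d\n", runtime.NumCPU(), *workers)
}
```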