Benchmarks

Methodology

Total data: 4,903 benchmark runs (3,703 relay throughput + 480 latency + 720 tunnel quality).

Traffic generator: custom distributed benchmark tool (derp_scale_test), 20 DERP peers across 4 client VMs (10 sender/receiver pairs), ~1400-byte messages sized for the WireGuard MTU, token-bucket pacing
Duration: 15 seconds per run (3s warmup, 12s measured)
Runs: 20 per data point, 95% CIs (Welch's t)
Latency: derp_test_client ping/echo, 5,000 samples per run, 2.16M total samples
Tunnel quality: iperf3 UDP + TCP + ICMP through WireGuard/Tailscale, 20 runs per point
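
The "95% CIs (Welch's t)" entry can be read as a confidence interval on the difference between two sets of runs (for example HD versus TS at one data point) without assuming equal variances. The sketch below is one way to compute such an interval using only the Python standard library; it is not the project's analysis code. The function names (`t_cdf`, `t_quantile`, `welch_ci`) are hypothetical, and the sample values in the usage block are toy numbers, not measurements.

```python
import math
from statistics import mean, variance


def t_cdf(x, df, steps=2000):
    """Student's t CDF for x >= 0, integrating the density with Simpson's rule."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Invert t_cdf by bisection; intended for 0.5 < p < 1 and quantiles below 100."""
    low, high = 0.0, 100.0
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, df) < p:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def welch_ci(a, b, confidence=0.95):
    """CI for mean(a) - mean(b) with Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    se = math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    t_crit = t_quantile(0.5 + confidence / 2, df)
    diff = mean(a) - mean(b)
    return diff - t_crit * se, diff + t_crit * se, df


if __name__ == "__main__":
    # Toy samples for illustration only; these are NOT benchmark results.
    runs_a = [101.0, 99.0, 102.0, 98.0, 100.0]
    runs_b = [90.0, 92.0, 89.0, 91.0, 93.0]
    low, high, df = welch_ci(runs_a, runs_b)
    print(f"difference of means: [{low:.2f}, {high:.2f}], df = {df:.1f}")
```

With 20 runs per data point, the two per-group sample sizes passed to `welch_ci` would both be 20, and the resulting degrees of freedom fall between 19 and 38 depending on how unequal the variances are.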

Component	Version / Configuration
Hyper-DERP	kTLS (TLS 1.3 AES-GCM), io_uring DEFER_TASKRUN
Go derper	v1.96.4, go1.26.1, release build
Kernel	6.12.73+deb13-cloud-amd64

Role	Machine Type	Count	NIC BW
Relay	c4-highcpu-16	1	22 Gbps
Client	c4-highcpu-8	4	22 Gbps each

All machines ran in GCP europe-west4-a. NIC bandwidth was verified at 22 Gbps on all paths.

In the results below, HD is Hyper-DERP and TS is Tailscale's Go derper.

Config	HD Peak (Mbps)	TS Ceiling (Mbps)	HD/TS
2 vCPU (1w)	3,730	1,870	10.8x
4 vCPU (2w)	6,091	2,798	3.5x
8 vCPU (4w)	12,316	4,670	2.7x
16 vCPU (8w)	16,545	7,834	2.1x

HD's advantage grows as resources shrink. At 2 vCPU, TS collapses (92% loss at 5 Gbps offered) while HD delivers 3.5 Gbps.
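
To make the collapse figure concrete: if the loss percentage is taken as the fraction of offered traffic that never arrives (an assumption about how loss is reported here), a 92% loss at 5 Gbps offered leaves roughly 400 Mbps delivered. The helper name `delivered_mbps` below is hypothetical and only illustrates the arithmetic.

```python
def delivered_mbps(offered_mbps, loss_fraction):
    """Throughput that actually arrives, assuming loss is a fraction of offered load."""
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must be between 0 and 1")
    return offered_mbps * (1.0 - loss_fraction)


# 5 Gbps offered with 92% loss, as quoted for TS on 2 vCPU.
ts_delivered = delivered_mbps(5000, 0.92)
print(f"TS delivered at 5 Gbps offered: about {ts_delivered:.0f} Mbps")
```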

HD matches or exceeds TS throughput on half the vCPUs:

TS deployment	TS throughput	HD equivalent	HD throughput	vCPU savings
TS on 16 vCPU	7,834 Mbps	HD on 8 vCPU	8,371 Mbps	2x
TS on 8 vCPU	4,670 Mbps	HD on 4 vCPU	5,457 Mbps	2x
TS on 4 vCPU	2,798 Mbps	HD on 2 vCPU	3,536 Mbps	2x
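
The comparison above can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures from the table; the function name `half_vcpu_margin` is hypothetical.

```python
# (TS vCPUs, TS Mbps, HD vCPUs, HD Mbps), copied from the table above.
ROWS = [
    (16, 7834, 8, 8371),
    (8, 4670, 4, 5457),
    (4, 2798, 2, 3536),
]


def half_vcpu_margin(ts_vcpus, ts_mbps, hd_vcpus, hd_mbps):
    """Return (vCPU ratio, HD throughput surplus over TS as a fraction)."""
    return ts_vcpus / hd_vcpus, hd_mbps / ts_mbps - 1.0


for row in ROWS:
    vcpu_ratio, surplus = half_vcpu_margin(*row)
    print(f"TS on {row[0]} vCPU vs HD on {row[2]} vCPU: "
          f"{vcpu_ratio:.0f}x fewer vCPUs, HD throughput {surplus:+.1%}")
```

In every row the vCPU ratio is 2 and the surplus is positive, which is what the 2x figure in the last column summarizes.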

Full methodology, raw data, and tooling are in the benchmark repository.