Architecture
Overview
Hyper-DERP is structured as three layers:
- Accept layer -- listens for incoming connections, performs the TLS handshake (handing session keys to kTLS where the kernel supports it; see kTLS Offload below), and assigns each connection to a worker shard; a sketch of the assignment loop follows this list.
- Data plane -- per-worker io_uring event loops that handle all packet forwarding. Each worker owns a disjoint set of connections; no locking required on the hot path.
- Control plane -- handles DERP protocol control messages (mesh key exchange, peer discovery, keep-alive) on a separate low-priority ring.
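A minimal sketch of the accept layer's shard assignment. `worker_submit_fd()` is a hypothetical helper that hands the new fd to a worker shard (for example over an eventfd-signalled queue); it is not part of the actual codebase.

```c
#define _GNU_SOURCE
#include <sys/socket.h>

extern int  nworkers;
extern void worker_submit_fd(int worker, int fd); /* hypothetical helper */

void accept_loop(int listen_fd)
{
    int next = 0;
    for (;;) {
        int fd = accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK);
        if (fd < 0)
            continue;                 /* real code would inspect errno */
        worker_submit_fd(next, fd);   /* assign connection to a shard */
        next = (next + 1) % nworkers; /* accept-round-robin */
    }
}
```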
Shard-per-Core Model
Each worker thread owns (see the struct sketch after this list):
- One io_uring instance
- A set of client connections (pinned by accept-round-robin or SO_INCOMING_CPU)
- A provided buffer ring for zero-copy receives
- Single-producer/single-consumer (SPSC) ring endpoints for cross-worker forwarding
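A sketch of the per-worker state this list implies; the field types, names, and sizes here are assumptions for illustration, not the actual layout.

```c
#include <liburing.h>

#define MAX_CONNS   4096   /* illustrative limits */
#define MAX_WORKERS 64

struct conn;       /* per-client connection state (elided) */
struct spsc_ring;  /* lock-free SPSC ring, sketched below */

/* Everything a worker touches on the hot path is owned by that
 * worker alone, so no field needs a lock. */
struct worker {
    struct io_uring           ring;               /* one io_uring per worker */
    struct io_uring_buf_ring *buf_ring;           /* provided buffers for recv */
    struct conn              *conns[MAX_CONNS];   /* connections pinned here */
    struct spsc_ring         *to[MAX_WORKERS];    /* outbound ring per peer worker */
    struct spsc_ring         *from[MAX_WORKERS];  /* inbound ring per peer worker */
};
```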
There are no locks on the forwarding path. Cross-worker traffic (client A on worker 0 sending to client B on worker 1) goes through a lock-free SPSC ring per worker pair, sketched below.
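A minimal SPSC ring in C11 atomics, illustrating why no lock is needed: each index is written by exactly one thread, so acquire/release ordering on the two indices is sufficient.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 1024   /* power of two; usable capacity is RING_SIZE - 1 */

struct spsc_ring {
    _Atomic size_t head;             /* advanced only by the consumer */
    _Atomic size_t tail;             /* advanced only by the producer */
    void          *slots[RING_SIZE];
};

/* Producer side: the source worker enqueues a frame for the peer. */
static bool spsc_push(struct spsc_ring *r, void *frame)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t next = (tail + 1) & (RING_SIZE - 1);
    if (next == atomic_load_explicit(&r->head, memory_order_acquire))
        return false;                /* full: caller applies backpressure */
    r->slots[tail] = frame;
    atomic_store_explicit(&r->tail, next, memory_order_release);
    return true;
}

/* Consumer side: the destination worker drains its inbound ring. */
static void *spsc_pop(struct spsc_ring *r)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    if (head == atomic_load_explicit(&r->tail, memory_order_acquire))
        return NULL;                 /* empty */
    void *frame = r->slots[head];
    atomic_store_explicit(&r->head, (head + 1) & (RING_SIZE - 1),
                          memory_order_release);
    return frame;
}
```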
Provided Buffer Rings
io_uring provided buffer rings let the kernel pick a buffer from a pre-registered pool at completion time, avoiding per-recv buffer allocation. Hyper-DERP sizes these to match the expected DERP frame size and recycles them after forwarding.
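A sketch of the setup using liburing's buffer-ring API; NUM_BUFS and FRAME_SIZE are illustrative values, not the project's actual tuning.

```c
#include <liburing.h>
#include <stdlib.h>

#define BUF_GROUP  0
#define NUM_BUFS   256
#define FRAME_SIZE (64 * 1024)   /* assumed to match the DERP frame cap */

/* Register a provided buffer ring and fill it from one allocation. */
static struct io_uring_buf_ring *setup_buf_ring(struct io_uring *ring,
                                                unsigned char **pool)
{
    int ret;
    struct io_uring_buf_ring *br =
        io_uring_setup_buf_ring(ring, NUM_BUFS, BUF_GROUP, 0, &ret);
    if (!br)
        return NULL;

    *pool = malloc((size_t)NUM_BUFS * FRAME_SIZE);
    for (int i = 0; i < NUM_BUFS; i++)
        io_uring_buf_ring_add(br, *pool + (size_t)i * FRAME_SIZE, FRAME_SIZE,
                              i, io_uring_buf_ring_mask(NUM_BUFS), i);
    io_uring_buf_ring_advance(br, NUM_BUFS);
    return br;
}

/* A recv that lets the kernel pick a buffer from the group. */
static void prep_recv(struct io_uring *ring, int fd)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_recv(sqe, fd, NULL, 0, 0);
    sqe->flags |= IOSQE_BUFFER_SELECT;   /* kernel selects the buffer */
    sqe->buf_group = BUF_GROUP;
}
```

On completion, the selected buffer id is recovered as `cqe->flags >> IORING_CQE_BUFFER_SHIFT`; after the frame has been forwarded, the buffer is recycled back into the pool with io_uring_buf_ring_add() plus io_uring_buf_ring_advance().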
kTLS Offload
When the kernel supports it, Hyper-DERP hands the TLS session keys to the kernel via setsockopt(SOL_TLS). This lets the kernel handle encryption/decryption in sendfile-style zero-copy paths, keeping crypto off the user-space CPU budget.
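A minimal sketch of the key handoff for the transmit path, assuming a TLS 1.2 AES-128-GCM session whose key material has already been extracted from the user-space TLS library after the handshake; the function name and parameters are illustrative.

```c
#include <linux/tls.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Hand the negotiated TX keys to the kernel. The RX direction is
 * symmetric, using TLS_RX and the peer's key material. */
static int enable_ktls_tx(int fd,
                          const unsigned char key[16],
                          const unsigned char iv[8],
                          const unsigned char salt[4],
                          const unsigned char rec_seq[8])
{
    struct tls12_crypto_info_aes_gcm_128 ci = {
        .info.version     = TLS_1_2_VERSION,
        .info.cipher_type = TLS_CIPHER_AES_GCM_128,
    };
    memcpy(ci.key,     key,     TLS_CIPHER_AES_GCM_128_KEY_SIZE);
    memcpy(ci.iv,      iv,      TLS_CIPHER_AES_GCM_128_IV_SIZE);
    memcpy(ci.salt,    salt,    TLS_CIPHER_AES_GCM_128_SALT_SIZE);
    memcpy(ci.rec_seq, rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);

    /* Attach the kernel TLS ULP, then install the TX keys. */
    if (setsockopt(fd, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
        return -1;
    return setsockopt(fd, SOL_TLS, TLS_TX, &ci, sizeof(ci));
}
```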
Backpressure
If a destination client's send buffer is full, Hyper-DERP applies backpressure by:
- Pausing reads on the source connection (no new receive SQE is armed for it)
- Queuing a bounded number of frames in the SPSC ring
- Dropping frames beyond the queue limit with a counter bump (visible in metrics)
This prevents a slow client from consuming unbounded memory.
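A sketch of that policy on the forwarding path: spsc_push is the ring sketched earlier, while pause_reads, free_frame, and frames_dropped are hypothetical names for the surrounding machinery.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct conn;
struct spsc_ring;

extern _Atomic unsigned long frames_dropped;  /* hypothetical metrics counter */
extern bool spsc_push(struct spsc_ring *r, void *frame);
extern void pause_reads(struct conn *src);    /* hypothetical: stop re-arming recv */
extern void free_frame(void *frame);          /* hypothetical */

/* Forward a frame toward a slow peer, applying the policy above. */
void forward_frame(struct conn *src, struct spsc_ring *to_peer, void *frame)
{
    if (spsc_push(to_peer, frame))
        return;                    /* queued within the bounded limit */

    /* Ring full: pause the source so it stops producing, then drop the
     * frame and bump the metric rather than buffer without bound. A
     * fuller version would pause at a high-water mark before the ring
     * actually fills. */
    pause_reads(src);
    atomic_fetch_add_explicit(&frames_dropped, 1, memory_order_relaxed);
    free_frame(frame);
}
```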