How We Scaled Private Link Routing Across AWS, Azure, and GCP Using Envoy and Wasm


Apr 20, 2026


If you are managing cloud infrastructure at scale, you eventually hit the wall with Private Links (AWS PrivateLink, Azure Private Link, and GCP Private Service Connect).

The standard approach—mapping one cloud provider endpoint service to one internal dedicated Network Load Balancer (NLB) for every customer—breaks down quickly. It incurs massive infrastructure costs and slams into hard cloud limits. Azure, for example, restricts an AKS cluster to a maximum of just 8 Private Link Services.

To bypass these limits, we built a shared L4 routing tier. We now route multiple private links through a single shared NLB and an Envoy proxy, using a custom C++ WebAssembly (Wasm) plugin to dynamically route connections based on PROXY Protocol v2 (PPv2) headers.

Here is how we built it, the Envoy limitations we had to patch, and the operational gotchas we hit along the way.

The Routing Mechanism: PROXY Protocol v2 (PPv2)

With Private Links, you cannot use IP filtering because all connections appear to originate from the internal IP of the Private Link Service.

Instead, we rely on PPv2. When enabled on the cloud provider's NLB, it prepends a binary Type-Length-Value (TLV) header to the raw byte stream before the TLS handshake. This blob contains the customer's unique VPC Endpoint ID (or Link ID).

Each cloud injects this ID differently:

  • AWS: ASCII string. The VPC Endpoint ID begins directly after the subtype byte (Type 0xEA, Subtype 0x01).

  • Azure: 5-byte payload. The LinkID is a 32-bit little-endian integer (Type 0xEE, Subtype 0x01).

  • GCP: 8-byte payload. The LinkID is a 64-bit big-endian integer.
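The endianness differences above are easy to get wrong. Here is a minimal, self-contained sketch of the two integer decodings; the helper names mirror the pseudo-code later in the post but are illustrative, not lifted from the actual plugin:

```cpp
#include <cstdint>
#include <string>

// Illustrative decoders for the TLV payloads described above.
// `payload` holds only the value bytes, after the type/length/subtype prefix.

// Azure: 32-bit little-endian LinkID (least-significant byte first).
uint32_t parseLittleEndianUint32(const std::string& payload) {
    uint32_t v = 0;
    for (int i = 3; i >= 0; --i) {
        v = (v << 8) | static_cast<uint8_t>(payload[i]);
    }
    return v;
}

// GCP: 64-bit big-endian LinkID (most-significant byte first).
uint64_t parseBigEndianUint64(const std::string& payload) {
    uint64_t v = 0;
    for (size_t i = 0; i < 8; ++i) {
        v = (v << 8) | static_cast<uint8_t>(payload[i]);
    }
    return v;
}
```

Swapping the two parsers does not fail loudly; it silently produces a byte-reversed, wrong Link ID, which is why the plugin must branch on the cloud provider before decoding.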

Why Envoy and Wasm? (And Why We Ditched Golang)

To route L4 traffic dynamically based on the PPv2 header, we needed to customize Envoy.

Native C++ filters require recompiling Envoy on every change. Native Golang network filters require manual, expensive buffer copying between upstream and downstream. Wasm was the obvious choice for a mature, dynamically loadable plugin model.

Initially, we wrote the Wasm plugin in Golang. However, under high connection churn, the Golang Wasm implementation suffered from severe memory leaks. We quickly scrapped it, rewrote the parsing logic, and switched to a C++ Wasm plugin.

Upstreaming a Patch to Envoy

Building the plugin exposed a critical limitation in Envoy's Wasm API: it lacked the ability to dynamically set upstream addresses for L4 traffic. The existing setProperty method prefixed all keys with wasm., preventing us from setting the envoy.tcp_proxy.cluster filter state to select the upstream destination.

To unblock ourselves, we contributed a patch to the upstream Envoy project, adding a new set_envoy_filter_state proxy-wasm foreign function. This allowed our plugin to extract the Link ID and hand it off to Envoy's TCPProxy filter.

(Note: While our initial contribution, set_envoy_filter_state, got us into production, the community later adopted a more generalized, native function, set_filter_state, which we migrated to for long-term support).

Later, a security patch in Envoy made the original PROXY protocol TLVs inaccessible to the Wasm plugin. We found a workaround, which is documented in the GitHub issue we raised on the Envoy repository.

The Envoy Pipeline and C++ Wasm Pseudo-code

Our routing pipeline executes in three phases:

  1. Proxy Protocol Listener Filter: Parses the raw PPv2 header and stores it in Envoy's filter state (com.singlestore.originalPrivateLinkTLV).

  2. Custom C++ Wasm Filter: Reads the raw bytes, parses the endianness based on the cloud provider, extracts the Link ID, and injects it back into the filter state.

  3. TCPProxy Filter: Uses the dynamically set filter state to route the connection to <privateLinkId>.<namespace>.svc.cluster.local.

Here is a stripped-down look at the C++ Wasm logic:

```cpp
FilterStatus PluginContext::onNewConnection() {
    // 1. Fetch the raw PPv2 TLV header populated by the listener filter
    std::string tlv_header;
    if (!getValue<std::string>({"filter_state", "com.singlestore.originalPrivateLinkTLV"}, &tlv_header)) {
        return FilterStatus::StopIteration;
    }

    std::string provider = getCloudProvider();
    std::string link_id = "";

    // 2. Parse the Link ID based on cloud-specific byte formats
    if (provider == "AWS") {
        // AWS: Strip the subtype byte, extract the remaining string
        link_id = extractAwsEndpointId(tlv_header);
    } else if (provider == "AZURE") {
        // Azure: Parse the 32-bit little-endian integer from the 5-byte payload
        link_id = parseLittleEndianUint32(tlv_header);
    } else if (provider == "GCP") {
        // GCP: Parse the 64-bit big-endian integer from the 8-byte payload
        link_id = parseBigEndianUint64(tlv_header);
    }

    // 3. Pass the Link ID to Envoy's TCPProxy filter state using the native extension
    setFilterStateStringValue("privateLinkID", link_id);

    return FilterStatus::Continue;
}
```

Control Plane Simplicity

Pushing the routing logic down to Envoy radically simplified our control plane.

We no longer provision cloud-native load balancers when a user requests a Private Link. The customer creates a VPC Endpoint against our shared Endpoint Service. They hand us the Endpoint ID. Our control plane simply spins up a headless Kubernetes Service named <privateLinkId>.<namespace>.svc.cluster.local pointing to the target workspace pods.
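As a sketch, the per-link Service the control plane creates looks roughly like the fragment below. All names, labels, and ports here are illustrative, not taken from our actual manifests:

```yaml
# Hypothetical example: one headless Service per private link, named after the
# customer's Endpoint ID, so <privateLinkId>.<namespace>.svc.cluster.local
# resolves directly to the target workspace pods.
apiVersion: v1
kind: Service
metadata:
  name: vpce-0a1b2c3d4e5f          # the customer's Endpoint ID (illustrative)
  namespace: private-links          # illustrative namespace
spec:
  clusterIP: None                   # headless: DNS returns pod IPs directly
  selector:
    workspace: customer-workspace   # illustrative label on the target pods
  ports:
    - port: 3306                    # illustrative database port
```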

To tear down a connection, we just delete the Kubernetes service.

Operational Trade-offs: The Graceful Shutdown Problem

While this architecture bypasses cloud limits and slashes infrastructure costs, it funnels all customer traffic through an intermediary proxy layer.

Because Envoy sits squarely in the critical path at L4, an abrupt pod termination immediately severs active client connections. You cannot casually redeploy the Gateway daemonset. We had to strictly orchestrate graceful connection termination across our infrastructure. Before an Envoy pod goes down, it stops accepting new connections and explicitly waits for existing long-lived database connections to drain, ensuring zero unexpected drops for the end user.
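One way to sequence such a drain in Kubernetes is a preStop hook against Envoy's admin interface. This is a sketch under assumptions (admin on port 9901, a 300-second drain window); it is not our production configuration:

```yaml
# Pod spec fragment (illustrative ports and timings).
spec:
  terminationGracePeriodSeconds: 330   # must exceed the drain window below
  containers:
    - name: envoy-gateway
      lifecycle:
        preStop:
          exec:
            command:
              - /bin/sh
              - -c
              # Stop accepting new connections, then wait for long-lived
              # database connections to drain before the pod dies.
              - >-
                curl -s -X POST 'http://localhost:9901/drain_listeners?graceful'
                && sleep 300
```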

Conclusion

Building a shared L4 routing tier for private links fundamentally changed how we manage private link networking in the cloud. By moving away from a 1:1 infrastructure mapping and relying entirely on PROXY Protocol v2 and a custom Envoy Wasm plugin, we eliminated prohibitive cloud provider limits—like Azure's restrictive 8 Private Link Services per cluster.

We slashed our infrastructure costs, removed the need for dedicated cloud resources for every Private Link, and reduced our control plane's job to simply managing standard Kubernetes services. While this architecture introduces the operational complexity of managing an inline proxy and orchestrating graceful connection drains, the tradeoffs are undeniably worth it. If you are hitting scaling walls with AWS PrivateLink, Azure Private Link, or GCP Private Service Connect, pushing your routing logic down to the L4 byte stream with Envoy is a highly effective, scalable path forward.