Zero Trust in GKE: Envoy, OPA, and Workload Identity
I have been thinking a lot about internal traffic lately. We spend weeks locking down our perimeters and adding layers of WAFs. But what happens when a rogue pod inside the cluster decides to reach out to your billing service?
Usually it just works. And that terrifies me.
We blindly trust local network segments way too much. I wanted to fix this without deploying a heavyweight service mesh. I just needed a simple, bulletproof way to authenticate, authorize, and log every single internal request.
Let’s talk about building a full AAA (Authentication, Authorization, Accounting) stack using GKE’s Workload Identity Federation, Envoy, and OPA.
Free Authentication? Yes Please.
Authentication inside Kubernetes used to be a massive headache. You had to juggle custom certificates or distribute shared secrets. It was a nightmare to rotate anything.
GKE Workload Identity Federation changes the game entirely. It binds Kubernetes ServiceAccounts to GCP IAM identities. But we can abuse this mechanism for our local services too. When Workload Identity is enabled, GKE automatically mounts a short-lived OIDC-compliant JWT into your pods.
This token is cryptographically signed by Google. You can literally just grab it from the filesystem and attach it to your outbound requests.
cat /var/run/secrets/kubernetes.io/serviceaccount/token
# eyJhbGciOiJSUzI1NiIsImtpZCI6Ij...
Instead of sending naked HTTP calls, our client service just reads this file and injects an Authorization: Bearer <TOKEN> header. Boom. We have ironclad, automatically rotating authentication.
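In Python, the client side of this pattern is only a few lines. A minimal sketch (the service URL in the comment is hypothetical; any HTTP client works the same way):

```python
from pathlib import Path

# Standard ServiceAccount token mount inside a GKE pod.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"

def bearer_header(token_path: str = TOKEN_PATH) -> dict:
    """Build an Authorization header from the projected ServiceAccount JWT.

    The kubelet rotates the token file in place, so re-read it on each
    request rather than caching it for the life of the process.
    """
    token = Path(token_path).read_text().strip()
    return {"Authorization": f"Bearer {token}"}

# Usage inside a pod (hypothetical target service):
# requests.get("http://billing.default.svc:8080/invoices", headers=bearer_header())
```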
Enter The Enforcer
Authentication alone is useless if the receiving end doesn’t validate it. This is where Envoy steps in.
I drop Envoy in front of my sensitive custom services as a lightweight sidecar. Envoy has a built-in jwt_authn filter. We configure it to pull the public keys from the GKE cluster’s JWKS endpoint.
Envoy intercepts the incoming request, parses the JWT, and verifies the signature against Google's published keys. If the token is invalid or expired, Envoy rejects the request on the spot with a 401. The backend service never even sees the garbage traffic.
http_filters:
- name: envoy.filters.http.jwt_authn
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
providers:
gke-workload:
issuer: "https://container.googleapis.com/v1/projects/my-project/locations/europe-west1/clusters/my-cluster"
remote_jwks:
http_uri:
uri: "https://container.googleapis.com/v1/projects/my-project/locations/europe-west1/clusters/my-cluster/jwks"
cluster: jwks_cluster
timeout: 1s
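One caveat: the `providers` block only tells Envoy *how* to validate tokens; without a `rules` section the filter will not actually require one. A matching rule (the full appendix config includes this) looks like:

```yaml
rules:
- match:
    prefix: "/"
  requires:
    provider_name: gke-workload
```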
Writing the Rules
Now we know who is calling. But should they be calling? That is Authorization.
Envoy is great at routing, but it is terrible at complex policy logic. So we hand the verified request metadata over to Open Policy Agent (OPA) using Envoy’s ext_authz filter.
OPA runs as another container right next to Envoy. It evaluates the request against our custom Rego policies. We can extract the claims from the validated JWT and make extremely fine-grained decisions.
Maybe the inventory-service is allowed to GET the /stock endpoint, but absolutely forbidden from using POST. Or maybe we want to allow multiple monitoring agents like Prometheus and Datadog to simultaneously scrape our /metrics endpoint. We just write the rules.
package envoy.authz
import rego.v1
import input.attributes.request.http as http_request
default allow := false
calling_service_account := sa if {
auth_header := http_request.headers.authorization
startswith(auth_header, "Bearer ")
token := substring(auth_header, 7, -1)
[_, payload, _] := io.jwt.decode(token)
sa := payload["kubernetes.io"]["serviceaccount"]["name"]
}
allow if {
http_request.method == "GET"
http_request.path == "/stock"
calling_service_account == "inventory-service"
}
allow if {
http_request.method == "GET"
http_request.path == "/metrics"
calling_service_account in {"prometheus", "datadog-agent"}
}
If OPA returns allow = true, Envoy forwards the request to the local backend. Otherwise, it slams the door with a 403 Forbidden.
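Note that the Rego helper above only base64-decodes the payload; the cryptographic verification already happened in Envoy's jwt_authn filter. Here is the same extraction sketched in Python, exercised with a hand-built, unsigned token purely for illustration:

```python
import base64
import json

def service_account_from_jwt(token: str) -> str:
    """Extract the ServiceAccount name from a projected Kubernetes JWT.

    Mirrors the Rego helper: no signature check here; we trust that the
    token was already verified upstream (e.g. by Envoy's jwt_authn filter).
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; re-pad before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["kubernetes.io"]["serviceaccount"]["name"]

# Example with a fake, unsigned token:
claims = {"kubernetes.io": {"serviceaccount": {"name": "inventory-service"}}}
fake_payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
service_account_from_jwt(f"header.{fake_payload}.signature")  # → "inventory-service"
```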
The Missing Layer: Accounting and Network
So we have Authentication and Authorization. What about Accounting?
Envoy is a logging beast. Every single request, its headers, and the OPA decision results are dumped to stdout. You just scrape these access logs with Fluent Bit or Promtail, and you instantly get a complete, searchable audit trail of every internal transaction.
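To make those logs trivially parseable downstream, you can switch the HTTP connection manager to a structured JSON access log on stdout. A sketch using standard Envoy command operators (the field names are my own choice, and the last line assumes you also set `payload_in_metadata` on the jwt_authn provider):

```yaml
access_log:
- name: envoy.access_loggers.stdout
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
    log_format:
      json_format:
        time: "%START_TIME%"
        method: "%REQ(:METHOD)%"
        path: "%REQ(:PATH)%"
        status: "%RESPONSE_CODE%"
        flags: "%RESPONSE_FLAGS%"
        # Verified JWT claims; requires payload_in_metadata on the provider.
        jwt_payload: "%DYNAMIC_METADATA(envoy.filters.http.jwt_authn)%"
```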
But there is one final piece to this puzzle. What if an attacker manages to bypass Envoy entirely and hits the backend pod IP directly?
This is exactly why you still need standard Kubernetes NetworkPolicies. NetworkPolicies act as our fundamental L3/L4 isolation boundary.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-and-allow-envoy
spec:
podSelector:
matchLabels:
app: sensitive-backend
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app: authorized-client
ports:
- protocol: TCP
port: 8080
We configure the NetworkPolicy to deny all inbound traffic by default and only allow traffic to the Envoy listener port from approved client pods. The backend application's own port is simply unreachable from the rest of the cluster.
Automating the Egress
Wait, are we really going to force our developers to manually read files and inject headers into every single HTTP client they write? Absolutely not.
Since Envoy is already acting as our sidecar proxy, we can intercept all outbound traffic too. We just spin up an egress listener in Envoy and drop in a native Lua filter.
The Lua script intercepts the outbound request, reads the dynamically rotated token directly from /var/run/secrets/kubernetes.io/serviceaccount/token, and stamps the Authorization header on the fly. The application just makes a dumb, unauthenticated HTTP call to localhost, and Envoy magically upgrades it into a cryptographically verified Workload Identity request before it ever hits the wire.
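From the application's point of view, the call is completely boring. Something like this sketch is all a service ever writes (port 9090 is the egress listener from the appendix config; only the request construction is shown, since there is no live sidecar to hand):

```python
import urllib.request

# The Envoy egress listener from the appendix config. The app never
# touches tokens or TLS; it just talks plain HTTP to its own sidecar.
EGRESS = "http://127.0.0.1:9090"

def plain_request(path: str) -> urllib.request.Request:
    # Deliberately no Authorization header: the Lua filter in the
    # sidecar reads the ServiceAccount token and stamps it on the way out.
    return urllib.request.Request(f"{EGRESS}{path}")

# Inside the pod you would simply do:
# urllib.request.urlopen(plain_request("/stock"))
```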
The Full Flow
When you put all of these pieces together, the entire cross-service request lifecycle inside the cluster looks like this:
sequenceDiagram
box rgba(200, 200, 200, 0.1) Caller Pod Boundary
participant ClientApp as Client Application
participant ClientEnvoy as Envoy Sidecar
end
box rgba(200, 200, 200, 0.1) Backend Pod Boundary
participant ServerEnvoy as Envoy Sidecar
participant OPA as Open Policy Agent
participant ServerApp as Target Backend
end
ClientApp->>ClientEnvoy: HTTP GET /stock (Unauthenticated)
Note over ClientEnvoy: Lua Filter intercepts request
ClientEnvoy->>ClientEnvoy: Reads K8s SA Token from disk
ClientEnvoy->>ClientEnvoy: Injects 'Authorization: Bearer <JWT>'
ClientEnvoy->>ServerEnvoy: HTTP GET /stock (with JWT)
Note over ServerEnvoy: jwt_authn filter validates signature
ServerEnvoy->>ServerEnvoy: Cryptographic verification via GKE JWKS
ServerEnvoy->>OPA: gRPC CheckRequest (ext_authz)
Note over OPA: Evaluates Rego Policy
OPA-->>ServerEnvoy: gRPC Response (Allow)
ServerEnvoy->>ServerApp: Proxies HTTP GET /stock
ServerApp-->>ServerEnvoy: HTTP 200 OK
ServerEnvoy-->>ClientEnvoy: HTTP 200 OK
ClientEnvoy-->>ClientApp: HTTP 200 OK
Wrapping it all together
NetworkPolicies lock down the physical routing. Workload Identity provides seamless cryptographic Authentication. Envoy intercepts and validates. OPA enforces complex Authorization rules. And Envoy’s logging gives us complete Accounting.
Every single aspect of AAA is covered. And we built it all using standard, open-source components without forcing a massive, complicated mesh down the throat of our engineering teams.
Building secure systems doesn’t always require buying into the latest hype. Sometimes the best solutions are just snapping together the right basic building blocks in a clever way.
Appendix
For those of you who want to replicate this setup in your own clusters, here are the complete configuration files. I stripped out some of the boilerplate tuning parameters for readability, but the core AAA logic is fully intact.
Full Envoy Configuration (envoy.yaml)
static_resources:
listeners:
- name: egress_listener
address:
socket_address:
address: 0.0.0.0
port_value: 9090
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: egress_http
route_config:
name: egress_route
virtual_hosts:
- name: egress_host
domains: ["*"]
routes:
- match:
prefix: "/"
route:
cluster: external_upstream
http_filters:
- name: envoy.filters.http.lua
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
inline_code: |
function envoy_on_request(request_handle)
-- In a heavy production environment you would cache this in memory,
-- but for demonstration purposes we read the token dynamically.
local file = io.open("/var/run/secrets/kubernetes.io/serviceaccount/token", "r")
if file then
local token = file:read("*all")
file:close()
-- Strip trailing newlines from the token if they exist
token = string.gsub(token, "\n", "")
request_handle:headers():add("Authorization", "Bearer " .. token)
end
end
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
- name: ingress_listener
address:
socket_address:
address: 0.0.0.0
port_value: 8080
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
route_config:
name: local_route
virtual_hosts:
- name: local_service
domains: ["*"]
routes:
- match:
prefix: "/"
route:
cluster: local_backend
http_filters:
- name: envoy.filters.http.jwt_authn
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
providers:
gke-workload:
issuer: "https://container.googleapis.com/v1/projects/my-project/locations/europe-west1/clusters/my-cluster"
remote_jwks:
http_uri:
uri: "https://container.googleapis.com/v1/projects/my-project/locations/europe-west1/clusters/my-cluster/jwks"
cluster: jwks_cluster
timeout: 1s
rules:
- match:
prefix: "/"
requires:
provider_name: gke-workload
- name: envoy.ext_authz
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
transport_api_version: V3
with_request_body:
max_request_bytes: 8192
allow_partial_message: true
failure_mode_allow: false
grpc_service:
google_grpc:
target_uri: 127.0.0.1:9191
stat_prefix: ext_authz
timeout: 0.5s
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: external_upstream
connect_timeout: 0.25s
type: LOGICAL_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: external_upstream
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: target-service.default.svc.cluster.local
port_value: 8080
- name: local_backend
connect_timeout: 0.25s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: local_backend
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: 127.0.0.1
port_value: 8081
- name: jwks_cluster
connect_timeout: 1s
type: LOGICAL_DNS
dns_lookup_family: V4_ONLY
load_assignment:
cluster_name: jwks_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: container.googleapis.com
port_value: 443
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
Full Kubernetes Deployment (deployment.yaml)
This is how you wire everything together in a single Pod. The backend application only listens on localhost, forcing all external cluster traffic to pass through the Envoy sidecar on port 8080.
apiVersion: apps/v1
kind: Deployment
metadata:
name: sensitive-backend
spec:
replicas: 1
selector:
matchLabels:
app: sensitive-backend
template:
metadata:
labels:
app: sensitive-backend
spec:
serviceAccountName: sensitive-backend-sa
containers:
# 1. The actual backend application (only listens on 127.0.0.1:8081)
- name: backend
image: my-company/sensitive-backend:v1.0.0
# 2. The Envoy Proxy Sidecar (listens on 0.0.0.0:8080)
- name: envoy
image: envoyproxy/envoy:v1.30.0
ports:
- containerPort: 8080
volumeMounts:
- name: envoy-config
mountPath: /etc/envoy
readOnly: true
# 3. The Open Policy Agent Sidecar (listens on 127.0.0.1:9191)
- name: opa
image: openpolicyagent/opa:latest-envoy-static
args:
- "run"
- "--server"
- "--addr=localhost:8181"
- "--diagnostic-addr=0.0.0.0:8282"
- "--set=plugins.envoy_ext_authz_grpc.addr=:9191"
- "--set=plugins.envoy_ext_authz_grpc.path=envoy/authz/allow"
- "--set=decision_logs.console=true"
- "--set=status.console=true"
- "--ignore=.*"
- "/etc/opa/policy.rego"
livenessProbe:
httpGet:
path: /health?plugins
scheme: HTTP
port: 8282
initialDelaySeconds: 5
periodSeconds: 5
readinessProbe:
httpGet:
path: /health?plugins
scheme: HTTP
port: 8282
initialDelaySeconds: 1
periodSeconds: 3
volumeMounts:
- name: opa-policy
mountPath: /etc/opa
readOnly: true
volumes:
- name: envoy-config
configMap:
name: envoy-configmap
- name: opa-policy
configMap:
name: opa-configmap
Full Open Policy Agent Configuration (policy.rego)
package envoy.authz
import rego.v1
import input.attributes.request.http as http_request
# Default deny everything. Anything not explicitly allowed is dropped.
default allow := false
# Helper to extract the calling service account from the validated JWT
calling_service_account := sa if {
auth_header := http_request.headers.authorization
startswith(auth_header, "Bearer ")
token := substring(auth_header, 7, -1)
[_, payload, _] := io.jwt.decode(token)
sa := payload["kubernetes.io"]["serviceaccount"]["name"]
}
# The inventory-service is allowed to GET the /stock endpoint
allow if {
http_request.method == "GET"
http_request.path == "/stock"
calling_service_account == "inventory-service"
}
# The finance-service is allowed to POST to the /billing endpoint
allow if {
http_request.method == "POST"
http_request.path == "/billing"
calling_service_account == "finance-service"
}
# The admin-service has blanket access to any GET endpoint
allow if {
http_request.method == "GET"
calling_service_account == "admin-service"
}
# The metrics endpoint is accessible by multiple specific monitoring accounts
allow if {
http_request.method == "GET"
http_request.path == "/metrics"
calling_service_account in {"prometheus", "datadog-agent"}
}
# If inventory-service tries to POST to /billing, it fails all `allow if`
# blocks and falls back to the `default allow := false` at the top!