Zero Trust in GKE: Envoy, OPA, and Workload Identity
I have been thinking a lot about internal traffic lately. We spend weeks locking down our perimeters and adding layers of WAFs. But what happens when a rogue pod inside the cluster decides to reach out to your billing service?
Usually it just works. And that terrifies me.
We blindly trust local network segments way too much. I wanted to fix this without deploying a heavyweight service mesh. I just needed a simple, bulletproof way to authenticate, authorize, and log every single internal request.
Let’s talk about building a full AAA (Authentication, Authorization, Accounting) stack using GKE’s Workload Identity Federation, Envoy, and OPA.
Free Authentication? Yes Please.
Authentication inside Kubernetes used to be a massive headache. You had to juggle custom certificates or distribute shared secrets. It was a nightmare to rotate anything.
GKE Workload Identity Federation changes the game entirely. It binds Kubernetes ServiceAccounts to GCP IAM identities. But we can abuse this mechanism for our local services too. When Workload Identity is enabled, GKE automatically mounts a short-lived OIDC-compliant JWT into your pods.
This token is cryptographically signed by Google. You can literally just grab it from the filesystem and attach it to your outbound requests.
cat /var/run/secrets/kubernetes.io/serviceaccount/token
# eyJhbGciOiJSUzI1NiIsImtpZCI6Ij...
Instead of sending naked HTTP calls, our client service just reads this file and injects an Authorization: Bearer <TOKEN> header. Boom. We have ironclad, automatically rotating authentication.
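In Python, the client side of this pattern is only a few lines. A minimal sketch (the service URL in the comment is hypothetical; any HTTP client works the same way):

```python
from pathlib import Path

# Standard ServiceAccount token mount inside a GKE pod.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"

def bearer_header(token_path: str = TOKEN_PATH) -> dict:
    """Build an Authorization header from the projected ServiceAccount JWT.

    The kubelet rotates the token file in place, so re-read it on each
    request rather than caching it for the life of the process.
    """
    token = Path(token_path).read_text().strip()
    return {"Authorization": f"Bearer {token}"}

# Usage inside a pod (hypothetical target service):
# requests.get("http://billing.default.svc:8080/invoices", headers=bearer_header())
```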
Enter The Enforcer
Authentication alone is useless if the receiving end doesn’t validate it. This is where Envoy steps in.
I drop Envoy in front of my sensitive custom services as a lightweight sidecar. Envoy has a built-in jwt_authn filter. We configure it to pull the public keys from the GKE cluster’s JWKS endpoint.
Envoy intercepts the incoming request, parses the JWT, and verifies the signature against Google's published keys. If the token is invalid or expired, Envoy rejects the request on the spot with a 401. The backend service never even sees the garbage traffic.
http_filters:
- name: envoy.filters.http.jwt_authn
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
providers:
gke-workload:
issuer: "https://container.googleapis.com/v1/projects/my-project/locations/europe-west1/clusters/my-cluster"
remote_jwks:
http_uri:
uri: "https://container.googleapis.com/v1/projects/my-project/locations/europe-west1/clusters/my-cluster/jwks"
cluster: jwks_cluster
timeout: 1s
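One caveat: the `providers` block only tells Envoy *how* to validate tokens; without a `rules` section the filter will not actually require one. A matching rule (the full appendix config includes this) looks like:

```yaml
rules:
- match:
    prefix: "/"
  requires:
    provider_name: gke-workload
```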
Writing the Rules
Now we know who is calling. But should they be calling? That is Authorization.
Envoy is great at routing, but it is terrible at complex policy logic. So we hand the verified request metadata over to Open Policy Agent (OPA) using Envoy’s ext_authz filter.
OPA runs as another container right next to Envoy. It evaluates the request against our custom Rego policies. We can extract the claims from the validated JWT and make extremely fine-grained decisions.
Maybe the inventory-service is allowed to GET the /stock endpoint, but absolutely forbidden from using POST. Or maybe we want to allow multiple monitoring agents like Prometheus and Datadog to simultaneously scrape our /metrics endpoint. We just write the rules.
package envoy.authz
import rego.v1
import input.attributes.request.http as http_request
default allow := false
calling_service_account := sa if {
auth_header := http_request.headers.authorization
startswith(auth_header, "Bearer ")
token := substring(auth_header, 7, -1)
[_, payload, _] := io.jwt.decode(token)
sa := payload["kubernetes.io"]["serviceaccount"]["name"]
}
allow if {
http_request.method == "GET"
http_request.path == "/stock"
calling_service_account == "inventory-service"
}
allow if {
http_request.method == "GET"
http_request.path == "/metrics"
calling_service_account in {"prometheus", "datadog-agent"}
}
If OPA returns allow = true, Envoy forwards the request to the local backend. Otherwise, it slams the door with a 403 Forbidden.
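Note that the Rego helper above only base64-decodes the payload; the cryptographic verification already happened in Envoy's jwt_authn filter. Here is the same extraction sketched in Python, exercised with a hand-built, unsigned token purely for illustration:

```python
import base64
import json

def service_account_from_jwt(token: str) -> str:
    """Extract the ServiceAccount name from a projected Kubernetes JWT.

    Mirrors the Rego helper: no signature check here; we trust that the
    token was already verified upstream (e.g. by Envoy's jwt_authn filter).
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; re-pad before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["kubernetes.io"]["serviceaccount"]["name"]

# Example with a fake, unsigned token:
claims = {"kubernetes.io": {"serviceaccount": {"name": "inventory-service"}}}
fake_payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
service_account_from_jwt(f"header.{fake_payload}.signature")  # → "inventory-service"
```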
The Missing Layer: Accounting and Network
So we have Authentication and Authorization. What about Accounting?
Envoy is a logging beast. Every single request, its headers, and the OPA decision results are dumped to stdout. You just scrape these access logs with Fluent Bit or Promtail, and you instantly get a complete, searchable audit trail of every internal transaction.
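To make those logs trivially parseable downstream, you can switch the HTTP connection manager to a structured JSON access log on stdout. A sketch using standard Envoy command operators (the field names are my own choice, and the last line assumes you also set `payload_in_metadata` on the jwt_authn provider):

```yaml
access_log:
- name: envoy.access_loggers.stdout
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
    log_format:
      json_format:
        time: "%START_TIME%"
        method: "%REQ(:METHOD)%"
        path: "%REQ(:PATH)%"
        status: "%RESPONSE_CODE%"
        flags: "%RESPONSE_FLAGS%"
        # Verified JWT claims; requires payload_in_metadata on the provider.
        jwt_payload: "%DYNAMIC_METADATA(envoy.filters.http.jwt_authn)%"
```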
But there is one final piece to this puzzle. What if an attacker manages to bypass Envoy entirely and hits the backend pod IP directly?
This is exactly why you still need standard Kubernetes NetworkPolicies. NetworkPolicies act as our fundamental L3/L4 isolation boundary.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-and-allow-envoy
spec:
podSelector:
matchLabels:
app: sensitive-backend
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app: authorized-client
ports:
- protocol: TCP
port: 8080
We configure the NetworkPolicy to deny all inbound traffic by default and only allow traffic to the Envoy listener port from approved client pods. The backend application's own port is simply unreachable from the rest of the cluster.
Automating the Egress
Wait, are we really going to force our developers to manually read files and inject headers into every single HTTP client they write? Absolutely not.
Since Envoy is already acting as our sidecar proxy, we can intercept all outbound traffic too. We just spin up an egress listener in Envoy and drop in a native Lua filter.
The Lua script intercepts the outbound request, reads the dynamically rotated token directly from /var/run/secrets/kubernetes.io/serviceaccount/token, and stamps the Authorization header on the fly. The application just makes a dumb, unauthenticated HTTP call to localhost, and Envoy magically upgrades it into a cryptographically verified Workload Identity request before it ever hits the wire.
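From the application's point of view, the call is completely boring. Something like this sketch is all a service ever writes (port 9090 is the egress listener from the appendix config; only the request construction is shown, since there is no live sidecar to hand):

```python
import urllib.request

# The Envoy egress listener from the appendix config. The app never
# touches tokens or TLS; it just talks plain HTTP to its own sidecar.
EGRESS = "http://127.0.0.1:9090"

def plain_request(path: str) -> urllib.request.Request:
    # Deliberately no Authorization header: the Lua filter in the
    # sidecar reads the ServiceAccount token and stamps it on the way out.
    return urllib.request.Request(f"{EGRESS}{path}")

# Inside the pod you would simply do:
# urllib.request.urlopen(plain_request("/stock"))
```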
The Full Flow
When you put all of these pieces together, the entire cross-service request lifecycle inside the cluster looks like this:
sequenceDiagram
box rgba(200, 200, 200, 0.1) Caller Pod Boundary
participant ClientApp as Client Application
participant ClientEnvoy as Envoy Sidecar
end
box rgba(200, 200, 200, 0.1) Backend Pod Boundary
participant ServerEnvoy as Envoy Sidecar
participant OPA as Open Policy Agent
participant ServerApp as Target Backend
end
ClientApp->>ClientEnvoy: HTTP GET /stock (Unauthenticated)
Note over ClientEnvoy: Lua Filter intercepts request
ClientEnvoy->>ClientEnvoy: Reads K8s SA Token from disk
ClientEnvoy->>ClientEnvoy: Injects 'Authorization: Bearer <JWT>'
ClientEnvoy->>ServerEnvoy: HTTP GET /stock (with JWT)
Note over ServerEnvoy: jwt_authn filter validates signature
ServerEnvoy->>ServerEnvoy: Cryptographic verification via GKE JWKS
ServerEnvoy->>OPA: gRPC CheckRequest (ext_authz)
Note over OPA: Evaluates Rego Policy
OPA-->>ServerEnvoy: gRPC Response (Allow)
ServerEnvoy->>ServerApp: Proxies HTTP GET /stock
ServerApp-->>ServerEnvoy: HTTP 200 OK
ServerEnvoy-->>ClientEnvoy: HTTP 200 OK
ClientEnvoy-->>ClientApp: HTTP 200 OK
Wrapping it all together
NetworkPolicies lock down the physical routing. Workload Identity provides seamless cryptographic Authentication. Envoy intercepts and validates. OPA enforces complex Authorization rules. And Envoy’s logging gives us complete Accounting.
Every single aspect of AAA is covered. And we built it all using standard, open-source components without forcing a massive, complicated mesh down the throat of our engineering teams.
Building secure systems doesn’t always require buying into the latest hype. Sometimes the best solutions are just snapping together the right basic building blocks in a clever way.
Appendix
For those of you who want to replicate this setup in your own clusters, here are the complete configuration files. I stripped out some of the boilerplate tuning parameters for readability, but the core AAA logic is fully intact.
Full Envoy Configuration (envoy.yaml)
static_resources:
listeners:
- name: egress_listener
address:
socket_address:
address: 0.0.0.0
port_value: 9090
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: egress_http
route_config:
name: egress_route
virtual_hosts:
- name: egress_host
domains: ["*"]
routes:
- match:
prefix: "/"
route:
cluster: external_upstream
http_filters:
- name: envoy.filters.http.lua
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
inline_code: |
function envoy_on_request(request_handle)
-- In a heavy production environment you would cache this in memory,
-- but for demonstration purposes we read the token dynamically.
local file = io.open("/var/run/secrets/kubernetes.io/serviceaccount/token", "r")
if file then
local token = file:read("*all")
file:close()
-- Strip trailing newlines from the token if they exist
token = string.gsub(token, "\n", "")
request_handle:headers():add("Authorization", "Bearer " .. token)
end
end
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
- name: ingress_listener
address:
socket_address:
address: 0.0.0.0
port_value: 8080
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
route_config:
name: local_route
virtual_hosts:
- name: local_service
domains: ["*"]
routes:
- match:
prefix: "/"
route:
cluster: local_backend
http_filters:
- name: envoy.filters.http.jwt_authn
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
providers:
gke-workload:
issuer: "https://container.googleapis.com/v1/projects/my-project/locations/europe-west1/clusters/my-cluster"
remote_jwks:
http_uri:
uri: "https://container.googleapis.com/v1/projects/my-project/locations/europe-west1/clusters/my-cluster/jwks"
cluster: jwks_cluster
timeout: 1s
rules:
- match:
prefix: "/"
requires:
provider_name: gke-workload
- name: envoy.ext_authz
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
transport_api_version: V3
with_request_body:
max_request_bytes: 8192
allow_partial_message: true
failure_mode_allow: false
grpc_service:
google_grpc:
target_uri: 127.0.0.1:9191
stat_prefix: ext_authz
timeout: 0.5s
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: external_upstream
connect_timeout: 0.25s
type: LOGICAL_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: external_upstream
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: target-service.default.svc.cluster.local
port_value: 8080
- name: local_backend
connect_timeout: 0.25s
type: STRICT_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: local_backend
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: 127.0.0.1
port_value: 8081
- name: jwks_cluster
connect_timeout: 1s
type: LOGICAL_DNS
dns_lookup_family: V4_ONLY
load_assignment:
cluster_name: jwks_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: container.googleapis.com
port_value: 443
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
Full Kubernetes Deployment (deployment.yaml)
This is how you wire everything together in a single Pod. The backend application only listens on localhost, forcing all external cluster traffic to pass through the Envoy sidecar on port 8080.
apiVersion: apps/v1
kind: Deployment
metadata:
name: sensitive-backend
spec:
replicas: 1
selector:
matchLabels:
app: sensitive-backend
template:
metadata:
labels:
app: sensitive-backend
spec:
serviceAccountName: sensitive-backend-sa
containers:
# 1. The actual backend application (only listens on 127.0.0.1:8081)
- name: backend
image: my-company/sensitive-backend:v1.0.0
# 2. The Envoy Proxy Sidecar (listens on 0.0.0.0:8080)
- name: envoy
image: envoyproxy/envoy:v1.30.0
ports:
- containerPort: 8080
volumeMounts:
- name: envoy-config
mountPath: /etc/envoy
readOnly: true
# 3. The Open Policy Agent Sidecar (listens on 127.0.0.1:9191)
- name: opa
image: openpolicyagent/opa:latest-envoy-static
args:
- "run"
- "--server"
- "--addr=localhost:8181"
- "--diagnostic-addr=0.0.0.0:8282"
- "--set=plugins.envoy_ext_authz_grpc.addr=:9191"
- "--set=plugins.envoy_ext_authz_grpc.path=envoy/authz/allow"
- "--set=decision_logs.console=true"
- "--set=status.console=true"
- "--ignore=.*"
- "/etc/opa/policy.rego"
livenessProbe:
httpGet:
path: /health?plugins
scheme: HTTP
port: 8282
initialDelaySeconds: 5
periodSeconds: 5
readinessProbe:
httpGet:
path: /health?plugins
scheme: HTTP
port: 8282
initialDelaySeconds: 1
periodSeconds: 3
volumeMounts:
- name: opa-policy
mountPath: /etc/opa
readOnly: true
volumes:
- name: envoy-config
configMap:
name: envoy-configmap
- name: opa-policy
configMap:
name: opa-configmap
Full Open Policy Agent Configuration (policy.rego)
package envoy.authz
import rego.v1
import input.attributes.request.http as http_request
# Default deny everything. Anything not explicitly allowed is dropped.
default allow := false
# Helper to extract the calling service account from the validated JWT
calling_service_account := sa if {
auth_header := http_request.headers.authorization
startswith(auth_header, "Bearer ")
token := substring(auth_header, 7, -1)
[_, payload, _] := io.jwt.decode(token)
sa := payload["kubernetes.io"]["serviceaccount"]["name"]
}
# The inventory-service is allowed to GET the /stock endpoint
allow if {
http_request.method == "GET"
http_request.path == "/stock"
calling_service_account == "inventory-service"
}
# The finance-service is allowed to POST to the /billing endpoint
allow if {
http_request.method == "POST"
http_request.path == "/billing"
calling_service_account == "finance-service"
}
# The admin-service has blanket access to any GET endpoint
allow if {
http_request.method == "GET"
calling_service_account == "admin-service"
}
# The metrics endpoint is accessible by multiple specific monitoring accounts
allow if {
http_request.method == "GET"
http_request.path == "/metrics"
calling_service_account in {"prometheus", "datadog-agent"}
}
# If inventory-service tries to POST to /billing, it fails all `allow if`
# blocks and falls back to the `default allow := false` at the top!