Created
March 3, 2026 11:24
# k8s-e2e-verifier

A Kubernetes operator that creates and manages e2e test jobs for Flagger canary deployments.

## Overview

The k8s-e2e-verifier is a webhook server designed to work with [Flagger](https://flagger.app/) pre-rollout webhooks. It automates end-to-end testing before canary deployments are promoted, ensuring that only verified releases reach production.

### How it works

1. **Flagger triggers webhook**: When a canary deployment begins, Flagger calls the e2e-verifier webhook with details about the canary (name, namespace, checksum, phase).
2. **Check for existing job**: The verifier looks for an existing Kubernetes Job based on the canary name and checksum.
   - If a job exists and **succeeded**: Returns HTTP 200 (test passed)
   - If a job exists and **failed**: Returns HTTP 500 (test failed)
   - If a job exists and is **running**: Returns HTTP 202 (test in progress)
3. **Create new job**: If no job exists, the verifier:
   - Fetches the E2ESpec resource for the canary
   - Creates a Kubernetes Job based on the spec
   - Returns HTTP 202 (test started)
4. **Flagger retries**: Flagger will retry the webhook on non-2xx responses, allowing the verifier to check job status on subsequent calls.

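
The decision flow in the steps above can be sketched as a small pure function. This is only an illustration, not the operator's actual code: the `JobState` type, its constants, and `decide` are hypothetical names, and the 404 branch is taken from the Response Codes table in this README.

```go
package main

import (
	"fmt"
	"net/http"
)

// JobState is a hypothetical summary of what the verifier found in the
// cluster for a given canary name and checksum.
type JobState int

const (
	JobNotFound JobState = iota // no job exists yet
	JobRunning                  // job exists and has not finished
	JobSucceeded                // job exists and completed successfully
	JobFailed                   // job exists and failed
)

// decide maps the observed job state to the HTTP status described in the
// steps above. specExists is only consulted when no job exists; it reports
// whether an E2ESpec was found for the canary. The second return value
// reports whether a new job should be created.
func decide(state JobState, specExists bool) (status int, createJob bool) {
	switch state {
	case JobSucceeded:
		return http.StatusOK, false
	case JobFailed:
		return http.StatusInternalServerError, false
	case JobRunning:
		return http.StatusAccepted, false
	default:
		if !specExists {
			// No job and nothing to build one from.
			return http.StatusNotFound, false
		}
		return http.StatusAccepted, true
	}
}

func main() {
	status, create := decide(JobNotFound, true)
	fmt.Println(status, create) // prints "202 true"
}
```

Keeping the mapping free of Kubernetes client calls makes it easy to unit test separately from the cluster lookups.
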
## Architecture

```
┌─────────────┐         ┌──────────────────┐         ┌─────────────────┐
│   Flagger   │────────>│   e2e-verifier   │────────>│   E2ESpec CRD   │
│   Canary    │ webhook │ (this operator)  │  read   │  (test config)  │
└─────────────┘         └──────────────────┘         └─────────────────┘
                                 │
                                 │ create/check
                                 v
                          ┌──────────────┐
                          │  Kubernetes  │
                          │  Job (e2e    │
                          │ test runner) │
                          └──────────────┘
```

## Custom Resource Definition

The operator uses a custom `E2ESpec` CRD to define the test job configuration:

```yaml
apiVersion: remarkable.com/v1
kind: E2ESpec
metadata:
  name: my-app  # Must match your canary name
  namespace: default
spec:
  # Container image for the e2e test
  image: gcr.io/my-project/my-app-e2e:latest
  # Optional: Override image entrypoint
  command: ["/bin/sh"]
  args: ["-c", "npm test"]
  # Environment variables (webhook metadata is automatically added)
  env:
    - name: API_ENDPOINT
      value: "http://my-app-canary:8080"
  # Mount secrets and configmaps
  secrets:
    - name: test-credentials
      mountPath: /var/secrets
  # Job configuration
  backoffLimit: 1
  activeDeadlineSeconds: 600
  ttlSecondsAfterFinished: 3600
  # Resources
  resources:
    requests:
      memory: "256Mi"
      cpu: "100m"
```

See [examples/e2espec-example.yaml](examples/e2espec-example.yaml) for complete examples.

## Installation

### 1. Install the CRD

```bash
kubectl apply -f cmd/k8s-e2e-verifier/e2espec-crd.yaml
```

### 2. Deploy the operator

```bash
kubectl apply -f cmd/k8s-e2e-verifier/examples/deployment.yaml
```

This creates:

- ServiceAccount with necessary RBAC permissions
- Deployment running the webhook server
- Service exposing the webhook endpoint

### 3. Create an E2ESpec for your application

```bash
kubectl apply -f cmd/k8s-e2e-verifier/examples/e2espec-example.yaml
```

### 4. Configure Flagger Canary

Add a pre-rollout webhook to your Flagger Canary:

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  # ... your canary config ...
  analysis:
    webhooks:
      - name: "e2e-verification"
        type: pre-rollout
        url: http://e2e-verifier.default/webhook
        timeout: 15m
        retries: 3
```

See [examples/flagger-integration.yaml](examples/flagger-integration.yaml) for a complete example.

## Webhook Payload

The operator expects the standard Flagger webhook payload:

```json
{
  "name": "my-app",
  "namespace": "default",
  "phase": "Progressing",
  "checksum": "85d557f47b",
  "metadata": {
    "eventMessage": "Canary promotion starting",
    "eventType": "Normal",
    "timestamp": "1640000000000"
  }
}
```

These fields are automatically injected into the test job as environment variables:

- `CANARY_NAME`
- `CANARY_NAMESPACE`
- `CANARY_PHASE`
- `CANARY_CHECKSUM`
- `EVENT_MESSAGE`
- `EVENT_TYPE`
- `EVENT_TIMESTAMP`

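
As a rough sketch of how the payload could be turned into those variables, the Go snippet below decodes the JSON and builds a name-to-value map. The `webhookPayload` type and `envFromPayload` function are hypothetical, and the pairing of each metadata key with an `EVENT_*` variable is an assumption inferred from the names; the operator's real code may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// webhookPayload mirrors the JSON payload shown above.
// The type name is illustrative, not taken from the operator.
type webhookPayload struct {
	Name      string            `json:"name"`
	Namespace string            `json:"namespace"`
	Phase     string            `json:"phase"`
	Checksum  string            `json:"checksum"`
	Metadata  map[string]string `json:"metadata"`
}

// envFromPayload decodes a webhook body and returns the environment
// variables listed above. The metadata-key-to-variable mapping is assumed.
func envFromPayload(body []byte) (map[string]string, error) {
	var p webhookPayload
	if err := json.Unmarshal(body, &p); err != nil {
		return nil, fmt.Errorf("decode webhook payload: %w", err)
	}
	// Reading from a nil map is safe in Go, so a payload without
	// "metadata" simply yields empty EVENT_* values.
	return map[string]string{
		"CANARY_NAME":      p.Name,
		"CANARY_NAMESPACE": p.Namespace,
		"CANARY_PHASE":     p.Phase,
		"CANARY_CHECKSUM":  p.Checksum,
		"EVENT_MESSAGE":    p.Metadata["eventMessage"],
		"EVENT_TYPE":       p.Metadata["eventType"],
		"EVENT_TIMESTAMP":  p.Metadata["timestamp"],
	}, nil
}

func main() {
	body := []byte(`{"name":"my-app","namespace":"default","phase":"Progressing",
		"checksum":"85d557f47b","metadata":{"eventType":"Normal"}}`)
	env, err := envFromPayload(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(env["CANARY_NAME"], env["CANARY_CHECKSUM"], env["EVENT_TYPE"])
}
```
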
## Response Codes

The webhook returns different HTTP status codes based on the job status:

| Status Code | Meaning | Flagger Action |
|-------------|---------|----------------|
| 200 | Test succeeded | Continue rollout |
| 202 | Test created or still running | Retry webhook later |
| 404 | E2ESpec resource not found | Halt rollout (error) |
| 500 | Test failed | Halt and potentially rollback |

## Job Naming

Jobs are named using the pattern: `e2e-{canary-name}-{checksum}`

For example: `e2e-my-app-85d557f47b`

The checksum ensures that each unique canary configuration gets its own test job.

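
The naming pattern can be expressed as a one-line helper. The function name `jobName` is hypothetical, and this sketch does not check Kubernetes name length or character constraints.

```go
package main

import "fmt"

// jobName builds a job name following the e2e-{canary-name}-{checksum}
// pattern described above. It performs no validation of its inputs.
func jobName(canaryName, checksum string) string {
	return fmt.Sprintf("e2e-%s-%s", canaryName, checksum)
}

func main() {
	fmt.Println(jobName("my-app", "85d557f47b")) // prints "e2e-my-app-85d557f47b"
}
```

Because the name is derived only from the canary name and checksum, repeated webhook calls for the same canary revision resolve to the same job.
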
## Configuration

The operator supports the following environment variables:

| Variable | Description | Default |
|----------|-------------|---------|
| `USE_IN_CLUSTER` | Use in-cluster Kubernetes config | `true` |
| `PORT` | HTTP server port | `8080` |

Standard server configuration is handled via the reMarkable `cloud/lib/v7/server` package.

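
For illustration only, the two variables and their documented defaults could be read with the standard library as follows. The `config` type and `loadConfig` function are hypothetical and do not reflect the `cloud/lib/v7/server` package; accepting any value understood by `strconv.ParseBool` is also an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// config holds the two settings from the table above.
type config struct {
	UseInCluster bool
	Port         int
}

// loadConfig reads USE_IN_CLUSTER and PORT through getenv, falling back to
// the documented defaults when a variable is unset or empty. Passing the
// lookup function in makes the parsing testable without touching the
// process environment.
func loadConfig(getenv func(string) string) (config, error) {
	cfg := config{UseInCluster: true, Port: 8080}
	if v := getenv("USE_IN_CLUSTER"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return cfg, fmt.Errorf("USE_IN_CLUSTER: %w", err)
		}
		cfg.UseInCluster = b
	}
	if v := getenv("PORT"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil {
			return cfg, fmt.Errorf("PORT: %w", err)
		}
		cfg.Port = n
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```
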
## RBAC Requirements

The operator requires the following permissions:

**For the operator:**

- `remarkable.com/e2especs`: get, list, watch
- `batch/jobs`: get, list, watch, create, update, patch, delete
- `batch/jobs/status`: get

**For test jobs (optional):**

Configure based on what your e2e tests need to access in the cluster. See the example `e2e-test-runner` ServiceAccount in [examples/deployment.yaml](examples/deployment.yaml).

## Examples

### Simple smoke test

```yaml
apiVersion: remarkable.com/v1
kind: E2ESpec
metadata:
  name: simple-test
spec:
  image: alpine:latest
  command: ["/bin/sh"]
  args: ["-c", "echo 'Test passed' && exit 0"]
```

### Comprehensive e2e test with secrets and resources

```yaml
apiVersion: remarkable.com/v1
kind: E2ESpec
metadata:
  name: my-app
spec:
  image: gcr.io/my-project/my-app-e2e:v1.0.0
  imagePullPolicy: Always
  env:
    - name: TEST_ENVIRONMENT
      value: "staging"
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: db-credentials
          key: password
  envFrom:
    - secretRef:
        name: test-secrets
    - configMapRef:
        name: test-config
  secrets:
    - name: service-account-key
      mountPath: /var/secrets/google
  backoffLimit: 2
  activeDeadlineSeconds: 600
  ttlSecondsAfterFinished: 3600
  serviceAccountName: e2e-test-runner
  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "1Gi"
      cpu: "1000m"
```

## Development

### Building

```bash
# From repository root
go build -o bin/k8s-e2e-verifier ./cmd/k8s-e2e-verifier
```

### Testing locally

To test locally without deploying to a cluster:

```bash
# Set environment variables
export USE_IN_CLUSTER=false
export PORT=8080
# Run the server
./bin/k8s-e2e-verifier
```

Then send a test webhook:

```bash
curl -X POST http://localhost:8080/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-app",
    "namespace": "default",
    "phase": "Progressing",
    "checksum": "test123",
    "metadata": {
      "eventMessage": "Test",
      "eventType": "Normal",
      "timestamp": "1640000000000"
    }
  }'
```

## Troubleshooting

### Check operator logs

```bash
kubectl logs -l app=e2e-verifier -n default --tail=100 -f
```

### Check job status

```bash
# List all e2e jobs
kubectl get jobs -l remarkable.com/managed-by=e2e-verifier
# Check specific job
kubectl describe job e2e-my-app-85d557f47b
# View job logs
kubectl logs job/e2e-my-app-85d557f47b
```

### Common issues

**E2ESpec not found**

- Ensure the E2ESpec name matches your canary name
- Verify it's in the correct namespace
- Check CRD is installed: `kubectl get crd e2especs.remarkable.com`

**Job stuck in pending**

- Check resource availability
- Verify ServiceAccount exists
- Check image pull secrets

**Job fails immediately**

- Check job logs: `kubectl logs job/e2e-{name}-{checksum}`
- Verify secrets/configmaps exist
- Check resource limits

## License

See repository root for license information.