Task: Cross-Service PVC Snapshot Orchestration | UUID: 59932490-21ef-4882-81c1-64a2052d8db1 | Version: 25
V25 mean score is 0.458 with grader bugs in AC2 and AC5 accounting for most failures. Once those are fixed, scores will likely rise above the 0.70 threshold. AC6 — currently 0/8 pass, also due to a grader bug — is the natural place to add genuine difficulty to compensate.
The current AC6 compares latest data timestamps across restored databases and checks they're within 30 seconds. This is nearly redundant with AC3 (which already validates snapshot timing) and fails today only because of the same MongoDB readiness bug as AC5. Once that's fixed, AC6 becomes a near-freebie, since the 30-second threshold is too generous to distinguish quiesced from unquiesced snapshots.
Below: a proposal to reshape AC6 into a fair, deterministic difficulty lever.
`pg_backup_start` (PostgreSQL's standard physical-backup quiesce, and what the solution currently uses) doesn't stop application writes. It ensures WAL consistency, but the data-writer deployment continues inserting rows throughout the snapshot window. MongoDB's `fsyncLock` does block writes. MinIO is crash-consistent with no quiesce mechanism.
This means an agent can run textbook quiesce commands, pass AC2, and still have data flowing into PostgreSQL and MinIO during snapshots. The current AC6 can't detect this.
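To make the gap concrete, here is a small illustrative simulation (all numbers hypothetical): a writer inserting every 5 seconds leaves the restored services' latest timestamps only a few seconds apart, comfortably inside the old 30-second threshold even though nothing was quiesced.

```python
# Hypothetical numbers: a writer inserts into every service every 5 s.
WRITE_INTERVAL = 5
# Per-service snapshot times (seconds since an arbitrary epoch), illustrative.
snapshot_times = {"postgres": 100, "mongodb": 103, "minio": 106}

# Last write captured by each snapshot = most recent multiple of the interval.
latest_record = {svc: (t // WRITE_INTERVAL) * WRITE_INTERVAL
                 for svc, t in snapshot_times.items()}

spread = max(latest_record.values()) - min(latest_record.values())
assert spread <= 30  # old AC6 threshold: passes despite unquiesced writes
```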
For each restored database, verify that the latest record predates the earliest snapshot's creationTime:
```python
def check_write_isolation(snapshot_creation_times, query_latest_timestamp):
    # The earliest snapshot time bounds the window: any record newer than it
    # proves writes continued after snapshotting began.
    earliest_snap = min(snapshot_creation_times)
    for svc in ("postgres", "mongodb", "minio"):
        latest_record = query_latest_timestamp(svc)
        if latest_record > earliest_snap:
            return False, f"{svc} has data written after snapshot window"
    return True, "all services quiesced before snapshot window"
```
This is binary — no threshold tuning, no race conditions. If the agent stopped all writes before snapshotting, no records exist after the snapshot time.
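For the comparison to be sound, both sides need timezone-aware timestamps; VolumeSnapshot `.status.creationTime` is RFC 3339 in UTC. A minimal parsing sketch (the sample timestamps below are invented):

```python
from datetime import datetime

def parse_k8s_time(ts: str) -> datetime:
    # VolumeSnapshot .status.creationTime is RFC 3339, e.g. "2024-05-01T12:00:03Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

# Invented sample values.
snapshot_times = ["2024-05-01T12:00:09Z", "2024-05-01T12:00:03Z"]
earliest_snap = min(parse_k8s_time(t) for t in snapshot_times)
latest_record = parse_k8s_time("2024-05-01T12:00:01Z")
assert latest_record < earliest_snap  # no data written after the snapshot window
```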
```mermaid
sequenceDiagram
    participant Agent
    participant DataWriter as data-writer deployment
    participant PG as PostgreSQL
    participant Mongo as MongoDB
    participant MinIO
    participant K8s as Kubernetes VolumeSnapshot API
    Note over Agent: Discovery phase
    Agent->>K8s: kubectl get deployments -n bleater
    Note over Agent: Finds data-writer inserting<br/>into all 3 services every 5s
    Note over Agent: Pre-snapshot phase
    Agent->>DataWriter: Scale to 0 replicas
    Agent->>PG: pg_backup_start (WAL consistency)
    Agent->>Mongo: fsyncLock (flush + lock)
    Note over Agent: Snapshot phase
    Agent->>K8s: Create VolumeSnapshot (postgres)
    Agent->>K8s: Create VolumeSnapshot (mongodb)
    Agent->>K8s: Create VolumeSnapshot (minio)
    Note over Agent: Post-snapshot phase
    Agent->>PG: pg_backup_stop
    Agent->>Mongo: fsyncUnlock
    Agent->>DataWriter: Scale back to 1 replica
```
The agent must:
- Discover that a `data-writer` deployment exists and is continuously writing to all three databases (not mentioned in task.yaml)
- Understand that database-level quiescing alone won't stop it, since `pg_backup_start` doesn't block application connections
- Implement write isolation (scale to 0, NetworkPolicy, revoke permissions, etc.)
- Resume writes after snapshots complete

Most agents in v25 transcripts never interact with the data-writer at all.
- Deterministic. Writes stopped → pass, writes continued → fail. No timing luck.
- Multiple valid approaches. Scale down, NetworkPolicy, permission revocation, deployment deletion.
- Discoverable. The data-writer is visible via `kubectl get deployments -n bleater`.
- Realistic. Real-world backup orchestration requires halting application traffic, not just running database quiesce commands.
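As one sketch of an alternative isolation approach, a deny-all-ingress NetworkPolicy on the database pods would cut the data-writer off without scaling anything. The pod label below is hypothetical, and this only works if the cluster's CNI enforces NetworkPolicy:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: snapshot-freeze
  namespace: bleater
spec:
  podSelector:
    matchLabels:
      tier: database   # hypothetical label on the postgres/mongodb/minio pods
  policyTypes:
    - Ingress          # no ingress rules listed: deny all inbound traffic
```

Deleting the policy after the snapshots complete resumes writes, mirroring the scale-down/scale-up pattern.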
task.yaml: Reframe the relevant AC from "quiesce databases" to signal that application-level traffic matters:
Pre-snapshot hooks ensure all write activity to the databases is halted before snapshots are taken, and post-snapshot hooks resume normal operations after.
Don't mention the data-writer by name — the agent should discover it.
solution.sh: Add data-writer isolation before the snapshot window:
```shell
kubectl scale deployment data-writer -n bleater --replicas=0
kubectl rollout status deployment/data-writer -n bleater --timeout=30s
# ... quiesce databases, take snapshots ...
kubectl scale deployment data-writer -n bleater --replicas=1
```
grader.py: Replace the current timestamp-comparison check with the "latest record before earliest snapshot time" check. Prerequisite: fix MongoDB readiness polling so the grader can reliably query restored data.
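The readiness fix can be as simple as a bounded poll before the grader issues queries. A sketch, where `ping` stands in for whatever probe the grader uses (for MongoDB, e.g. a wrapper around `db.adminCommand({ping: 1})`):

```python
import time

def wait_until_ready(ping, timeout=120.0, interval=2.0):
    # Poll the probe until it succeeds or the deadline passes.
    # `ping` is any zero-argument callable returning True once the DB answers.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if ping():
            return True
        time.sleep(interval)
    return False

# Toy usage: a probe that succeeds on its third call.
calls = iter([False, False, True])
assert wait_until_ready(lambda: next(calls), timeout=5.0, interval=0.01)
```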
- AC2 and AC6 become complementary. AC2 validates database-level quiescing (defense in depth for data consistency). AC6 validates application-level write isolation (the coordination challenge). The AC2 grader bugs (initContainers, label filter) still need fixing independently.
- Estimated score impact. Fixing AC2 + AC5 grader bugs pushes scores up. This AC6 change pushes them back down for agents that don't discover and stop the data-writer. Net effect: scores driven by genuine difficulty rather than grader defects.