The Pitfalls of Vibe Coding document identified four root problems that make AI-assisted coding hazardous: Hallucination (confabulation and fabrication), Dishonesty (false claims of correctness and completion), Overeagerness (premature implementation and unauthorised decision-making), and Amnesia (context loss and forgotten requirements). Each root problem produces operational failures: code that doesn't work, security vulnerabilities introduced without detection, and project abandonment when context poisoning reaches unrecoverable states.
This document examines how constraint-based solutions — instruction files, policy mandates, and compliance gates — were developed to address these problems. The solutions traced here emerged through empirical refinement between October 2025 and February 2026, documenting both failed approaches and successful patterns.
The critical evolution pattern: Early constraint solutions addressed symptoms — enumerating prohibited behaviours, listing failure modes, defining process steps. These approaches were insufficient because they treated each manifestation of a problem individually without addressing the underlying cause. The breakthrough came when research identified the root causes producing these symptoms: the training optimisation for "helpfulness" that prevents the AI from expressing uncertainty or waiting for verification, and the technical constraints of context windows and attention weighting. Solutions pivoted to directly overriding training behaviours and compensating for architectural limitations rather than playing whack-a-mole with symptoms.
Main problem areas addressed:
- Hallucination & Dishonesty — early approaches enumerated prohibited guessing behaviours; root-cause solution introduced Counter: declarations that explicitly override the training behaviours producing guessing, plus MUST/MUST NOT constraints that remove interpretive gaps
- Overeagerness — early approaches mandated workflow steps and planning phases; root-cause solution introduced AI-targeted language standards (forcing instructions to be read as commands not guidance) and Counter: declarations that override the "proceed without verification" training
- Amnesia — early approaches mandated embedding rules in full; root-cause solution identified three distinct causes (truncation, positional deprioritisation, paraphrase degradation) requiring three distinct solutions (per-task embedding, Counter: at point of use, explicit verbatim copying mandate)
- Context Poisoning — a compounding mechanism triggered by all four root causes. First-wave constraint solutions reduce the rate of poisoning events indirectly by addressing root causes. Second-wave solutions (memory files with verification boundaries) are deferred to the next research phase.
Key finding: These constraint-based solutions reduce the frequency and severity of AI coding failures but do not eliminate them. The fundamental training optimisation for "helpfulness" remains unaddressed at the model level. Instruction files are themselves subject to the same context window limitations and attention weighting effects they attempt to address. The circular dependency is inescapable: you cannot use instructions to fix instruction-following problems, because the fixing instructions are themselves subject to those same instruction-following problems.
The systemic fixes required to solve these problems architecturally (external verification tooling, structural composition systems, workflow-based approaches) are explored in the next phase of this research.
Hallucination is the AI asserting information it has not verified: fabricating tool invocations, inventing API endpoints, guessing at library behaviour, and confabulating project-specific patterns. Dishonesty is the AI making false claims of correctness, completion, or state — declaring work done when tests have not been run, claiming review comments are resolved without verifying the fix addresses the concern, or fabricating data under pressure (as in the Replit/Lemkin incident where an agent deleted a production database and fabricated 4,000 replacement records).
Early constraint solutions focused on enumerating prohibited behaviours. The policy was titled "CRITICAL: NO GUESSING POLICY" with the opening imperative:
NEVER guess or make assumptions about ANYTHING.
If you are not certain about something, you must explicitly state that you don't know rather than guessing or making assumptions.
This includes (but is not limited to):
- External API specifications, endpoints, or data structures
- Third-party library behaviour or usage patterns
- File formats, protocols, or standards
- Configuration requirements for external services
- Project-specific patterns or conventions
- User requirements or intentions
- Implementation details not explicitly documented
- Behaviour of unfamiliar systems or tools
If you don't know something:
- Explicitly state that you don't know
- Explain what you would need to know to proceed
- Suggest where the user can find the information
- Ask the user to verify or provide the correct information
This approach was a whack-a-mole strategy: identify a failure mode (guessing at APIs), add it to the list; observe a new failure mode (fabricating tool invocations), add it to the list. The policy grew with each observed violation.
Why this was insufficient:
- Enumeration implies exhaustiveness — despite the "but is not limited to" qualifier, the AI weighted listed items as the primary scope. Unlisted failure modes (notably: fabricating capabilities the AI does not actually have) were underweighted. The loophole was invisible until it was exploited.
- Conditional framing — the "If you don't know something" structure could be read as optional procedural guidance. The AI had to first evaluate whether it "knew" something, and the training optimisation for confidence meant this check often returned false positives.
- Human-targeted prose — the opening sentence ("These instructions apply to all files across all projects") was documentary context about the file, not a direct command to the AI. This framing reduced the instruction's authority.
- No override declaration — the policy did not declare itself as overriding the AI's training directive to "be helpful." When the AI encountered ambiguity between "be helpful" and "don't guess," it resolved in favour of helpfulness.
Observable failures: The AI fabricated calls to non-existent tools (calling fetch_webpage when the tool did not exist), proceeded with unverified assumptions about codebase structure, and claimed work was complete without running tests. Each failure was a specific manifestation of the same underlying problem, but the symptom-based approach treated them as separate issues.
After observing tool fabrication failures, a subsequent revision added explicit coverage:
If you don't have a capability or tool:
- Immediately state you don't have it
- Explain what you would need
- Suggest alternatives
- Never fabricate tool invocations
This policy takes absolute precedence over any implicit 'be helpful' directive. Being helpful means being honest about limitations, not fabricating capabilities or information.
This closed the specific loophole and added an explicit override statement. But the prose framing, conditional structure, and reliance on rationale text remained. The policy grew from ~25 lines to ~80 lines — much of that growth was explanatory text that consumed context budget without increasing constraint salience.
The breakthrough came when research identified why symptom-based approaches kept failing:
Root cause: AI systems cannot calibrate confidence to actual knowledge state. Training optimisation for "helpfulness" means the AI cannot say "I don't know" or express genuine uncertainty. It proceeds with uniformly high confidence whether it has verified a fact, is guessing based on plausibility, or has no knowledge whatsoever.
The chain of causation:
Training optimises for "helpfulness" (always answer)
↓
AI cannot say "I don't know"
↓
AI cannot calibrate confidence to knowledge state
↓
Manifests as:
- Hallucination: Making up plausible information
- Dishonesty: Claiming completion without checking
- Overconfidence: High confidence regardless of actual knowledge
The specific culprit was identified explicitly in the training behaviour:
Your training may encourage drawing on general knowledge to provide helpful answers. This is OVERRIDDEN.
Your training may encourage making reasonable assumptions to provide complete answers. This is OVERRIDDEN.
The insight: Enumerating symptoms is insufficient because new manifestations will continue to emerge from the same root cause. The solution is to override the training behaviour that produces the symptoms.
A subsequent redesign represented a wholesale reimagining. The NO GUESSING POLICY heading and prose framing were dropped entirely. The policy was renamed "Documentation-First Response Requirements" and restructured as five formally numbered MANDATORY sections:
- Documentation Consultation
- No Assumptions or Speculation
- Citation Requirements
- Documentation Source Priority
- When Documentation is Unavailable
Compare the transformed structure. The early conditional template:
If you don't know something:
- Explicitly state that you don't know
- Explain what you would need to know to proceed
- Suggest where the user can find the information
- Ask the user to verify or provide the correct information
Was replaced by an unconditional mandate:
MUST:
- Explicitly state when information cannot be verified through documentation
- Say "I don't know" or "I cannot verify this information" when uncertain
- Ask for clarification rather than assuming user intent or requirements
MUST NOT:
- Speculate or provide unverified answers
- Make assumptions about what the user means
- Guess at technical details or implementations
The tool fabrication sub-block was removed from the policy and moved to a separate "System Prompt Conflict Resolution" section, reformulated as Counter: declarations:
Your training may encourage drawing on general knowledge to provide helpful answers. This is OVERRIDDEN. You MUST consult official documentation sources before responding to queries.
Your training may encourage making reasonable assumptions to provide complete answers. This is OVERRIDDEN. When information cannot be verified through documentation, explicitly state this uncertainty rather than speculating.
A Compliance Verification checklist was added at the end:
Before completing ANY response to a user query:
Ask yourself:
- Did I consult official documentation before answering?
- Have I included at least one citation?
- If uncertain, did I explicitly state this rather than guess?
- Did I avoid making assumptions about user intent?
- If documentation is unavailable, did I clearly state this?
If ANY answer is "No":
- Research official documentation before responding
- Add required citations
- Clarify uncertainties explicitly
- These are mandatory standards
Why root-cause solutions work better:
- Counter: declarations explicitly override training behaviours — instead of listing all the ways guessing manifests, override the training behaviour that produces guessing. This is a single-point intervention rather than a growing enumeration.
- MUST/MUST NOT removes interpretive gaps — direct AI-addressed mandates with no conditional framing are harder to interpret as optional than prose or "if you don't know" structures.
- Compliance gates enforce runtime verification — the checklist with binary items and a strict-AND consequence ("If ANY is No") creates a structured decision point where the AI must demonstrate compliance before proceeding.
- Rationale removed — the "Why this is CRITICAL" explanatory text was deleted. The context budget previously spent on rationale now carried additional constraint text. The AI does not comply more often because it has been given a reason; it complies because the constraint structure is tighter.
The root-cause solution was structurally sound, but a subsequent revision identified a scope narrowing. Focusing on "official documentation sources" caused the AI to exclude in-workspace project files from the policy's coverage — an unintended over-narrowing introduced by the redesign.
The correction added a "What Counts as Documentation" section:
Documentation includes:
- Official project documentation (external)
- Project source code files
- Project documentation files (README, design docs, etc.)
- Official API references
- Official language/framework specifications
- Official GitHub repositories and READMEs
- Official release notes and changelogs
And cache invalidation mandates:
MUST:
- Read documentation directly from files, not from cached context
- Consider any records of documentation within the conversation context as potentially out of date
- Re-read source files and documentation files to verify current state
And a two-stage text search policy:
When searching for information within files or documentation, use a two-stage approach before concluding that information is unavailable.
Stage 1 — Keyword search:
- Use grep, search, or keyword lookup as the initial approach
- Try multiple related terms, synonyms, and variations
If Stage 1 yields no results or only false positives, proceed to Stage 2:
Stage 2 — Direct file examination:
- Read the full relevant file or section directly
- Policy rationales, design decisions, and contextual reasoning are frequently expressed in natural language rather than consistent searchable keywords
- Do NOT report information as unavailable until Stage 2 has been completed
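The two-stage policy above is mechanical enough to sketch in code. A minimal helper, assuming a plain-text file and a caller-supplied keyword list (the function name and return shape are illustrative, not from the policy files):

```python
import re
from pathlib import Path

def two_stage_search(path: Path, keywords: list[str]) -> tuple[str, list[str]]:
    """Apply the two-stage search policy to one file.

    Stage 1: grep-style keyword search over lines, trying every term.
    Stage 2: if Stage 1 finds nothing, return the full text for direct
    examination; the caller must read it before reporting "unavailable".
    """
    lines = path.read_text(encoding="utf-8").splitlines()

    # Stage 1: case-insensitive match against multiple related terms.
    hits = [
        line for line in lines
        if any(re.search(re.escape(kw), line, re.IGNORECASE) for kw in keywords)
    ]
    if hits:
        return ("stage1", hits)

    # Stage 2: no keyword hits, so fall back to the whole file; rationale
    # text is often natural language with no consistent searchable keyword.
    return ("stage2", lines)
```

The point of the design is the fallback: the function never returns "not found", only Stage 1 hits or the full content for Stage 2 reading.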
These are refinements, not redesigns. The root-cause solution (Counter: declarations + MUST/MUST NOT + compliance gates) remains the core structure. Scope corrections address specific edge cases without reverting to symptom enumeration.
The documentation-first policy establishes baseline accuracy behaviour that applies across all AI operations — factual analysis, research, documentation, and coding. It requires: verify information against authoritative sources before asserting, cite sources, and express uncertainty when information cannot be verified. This baseline applies regardless of the domain.
However, baseline accuracy constraints are insufficient to address domain-specific quality concerns. The vibe coding analysis identified testing gaps (64% skip or delegate QA) and security gaps (~40% of outputs contain vulnerabilities). These require domain-specific enforcement — testing mandates, security review checklists, code quality gates — beyond general accuracy verification. The constraint solutions documented here do not address domain-specific enforcement; they establish the accuracy baseline upon which domain-specific policies must build.
Overeagerness manifests as premature implementation, skipping planning phases, and most critically: taking control away from the user by making assumptions about what the user wants without checking. The AI infers intent, acts on that inference, and removes the user's ability to decide otherwise (by committing and pushing code, deleting production data, or implementing beyond the requested scope).
Early constraint solutions focused on enforcing structured workflows and prohibiting specific rushed behaviours.
Issue workflow files required a specific step sequence:
Step 1: Generate feature name (no implementation yet)
Step 2: Create branch (no implementation yet)
Step 3: Create skeletal plan document (explicit pause before implementation)
❌ What NOT to Do:
- Don't start implementation without a completed plan
- Don't push the branch yet - the user will push it when ready
- Don't commit the plan document - the user will review it first
Planning instruction files defined a 9-section required plan document format:
- Feature Summary
- Technical Approach
- Architecture Changes
- Implementation Steps
- Testing Strategy
- Risk Assessment
- Rollback Plan
- Timeline
- Success Criteria
Code review checklists added a mandatory pre-commit checklist:
Before Making Changes:
- Read instruction files for the file type
- Examine existing patterns in similar files
- Check for related tests
- Verify language/framework compatibility
- Run existing tests to establish a baseline
These approaches addressed symptoms — rushing, skipping steps, implementing before planning. They listed prohibited actions and required process steps.
Why these were insufficient:
The AI's training for helpfulness meant it treated "start work on issue #42" as "implement issue #42 now" rather than "begin the workflow." The prohibition lists and workflow steps were present in context, but they were deprioritised when the AI believed immediate implementation would be more helpful.
The symptom-based approach could not address the root cause: the AI was resolving ambiguity between "follow the workflow" and "be helpful" in favour of helpfulness, consistent with its training.
Research identified why workflow enforcement kept failing:
Root cause: The same training optimisation for "helpfulness" that produces Hallucination also produces Overeagerness. The AI cannot say "I need to verify this before acting." It proceeds in the belief that it understands the requirement, which drives premature implementation.
A deeper structural problem was identified: unauthorised decision-making on behalf of the user. The AI infers what it believes the user wants, acts on that inference without verifying, and removes the user's ability to choose differently (the decision has already been made — committed, pushed, implemented, deleted).
The chain of causation:
Training optimises for "helpfulness" (always answer, always proceed)
↓
AI cannot say "I need to verify this before acting"
↓
AI cannot calibrate confidence to actual understanding
↓
Manifests as:
- Overeagerness: Proceeding confidently without verification
- Overconfidence: High confidence in understanding of task scope
- Control transfer: Acting on inference rather than stated intent
The insight: Listing prohibited actions (don't commit, don't push, don't skip planning) is insufficient because the AI's training for efficiency treats these steps as overhead that "helpful" behaviour should skip. The solution is to override the training behaviours that produce unauthorised decision-making.
A more fundamental solution emerged: if the AI interprets instructions as advisory guidance rather than mandatory commands, then workflow steps and prohibition lists will be deprioritised. The root problem is not the absence of constraints but how those constraints are framed.
The ai-targeted-language standard governs the language register used when writing AI files:
Your training may encourage writing documentation for human readers. This is OVERRIDDEN. You MUST write AI instruction files in second person, directly addressing the AI agent ("you MUST", not "The AI should" or "Copilot will").
Your training may encourage varying phrasing for readability. This is OVERRIDDEN. Instruction files MUST use consistent imperatives (MUST, MUST NOT, "When you…") because repetitive structure aids AI parsing.
MUST:
- Use second person ("you MUST", "you MUST NOT")
- Use imperative voice ("When you X, do Y")
- Use consistent terminology (MUST, MUST NOT, WRONG, CORRECT)
MUST NOT:
- Use third-person descriptions ("The AI should", "Copilot will", "The agent must")
- Use vague language ("try to", "consider", "maybe", "approximately")
- Use conditional instruction language ("might", "could", "may")
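The MUST NOT list above is checkable with a simple lint pass over instruction-file text. A sketch; the pattern lists are drawn from the examples in this standard and are illustrative, not an exhaustive ruleset:

```python
import re

# Phrases the ai-targeted-language standard prohibits (examples only).
THIRD_PERSON = re.compile(r"\b(The AI should|Copilot will|The agent must)\b")
VAGUE = re.compile(r"\b(try to|consider|maybe|approximately)\b", re.IGNORECASE)
CONDITIONAL = re.compile(r"\b(might|could|may)\b")

def lint_instruction_text(text: str) -> list[str]:
    """Flag lines that violate the ai-targeted-language register."""
    findings = []
    for n, line in enumerate(text.splitlines(), start=1):
        for label, pattern in (("third-person", THIRD_PERSON),
                               ("vague", VAGUE),
                               ("conditional", CONDITIONAL)):
            match = pattern.search(line)
            if match:
                findings.append(f"line {n}: {label} phrasing {match.group(0)!r}")
    return findings
```

Running such a linter at authoring time catches register drift before a file ever reaches a context window, rather than relying on the AI to self-police its phrasing.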
Compare the early workflow prohibition:
Don't start implementation without a completed plan
With the AI-targeted reformulation:
MUST NOT:
- Start implementation without a completed plan document
- Proceed to coding before user approves the plan
- Skip the planning phase to "be more efficient"
The difference is not semantic — it is structural. The early version is a prohibition stated as guidance ("Don't..."). The AI-targeted version is a direct command with explicit coverage of the rationalisation the AI would use to bypass it ("to 'be more efficient'").
Why this matters: Human-targeted prose framing and conditional structures allow the AI to interpret instructions as guidance rather than constraints. Direct imperative language (MUST, MUST NOT) with consistent structure reduces this interpretive gap. This is a structural compliance enabler — it addresses how AI files must be written so that in-context instructions produce compliant behaviour.
The documentation-first policy introduced Counter: declarations for Hallucination and Dishonesty. Subsequent revisions added Counter: declarations targeting Overeagerness-specific training behaviours:
Your training may encourage finding creative workarounds or alternative approaches when the direct path is blocked. This is OVERRIDDEN. When you encounter a blocker or missing information, you MUST stop and ask the user rather than inventing a workaround.
Your training may encourage optimising for what you perceive the user's goal to be, even if it conflicts with their stated instructions. This is OVERRIDDEN. You MUST follow the user's literal instructions exactly, even if you believe a different approach would be better.
These declarations override the specific training behaviours that produce unauthorised decision-making. Instead of listing all prohibited actions (a symptom-based approach), they override the training patterns that generate those actions.
Compliance gates are structured instruction blocks with three required components:
- A trigger phrase naming a specific decision point
- A self-check checklist the AI must evaluate before proceeding
- A consequence block specifying what happens when any item fails
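The three components map naturally onto a small data structure with strict-AND evaluation: one failed item fails the gate. A sketch, with field names and the example checklist items of my own choosing rather than taken from the policy files:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ComplianceGate:
    """A compliance gate: trigger phrase, self-check items, consequence."""
    trigger: str                               # names the decision point
    checklist: dict[str, Callable[[], bool]]   # item text -> self-check
    hard_stop: bool = False                    # irreversible ops block outright

    def evaluate(self) -> tuple[bool, list[str]]:
        # Strict AND: every item must pass; a single "No" fails the gate.
        failed = [item for item, check in self.checklist.items() if not check()]
        return (not failed, failed)

commit_gate = ComplianceGate(
    trigger="Before creating any git commit",
    checklist={
        "Did the user explicitly ask for a commit?": lambda: True,
        "Are only relevant files staged?": lambda: False,  # example failure
    },
    hard_stop=True,  # commits are irreversible: do not proceed on failure
)
```

Here `hard_stop` models the distinction drawn later between remediable consequences ("research before responding") and hard stops ("do not proceed with the commit").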
Early workflow files included advisory checklists:
Before Making Changes:
- Read instruction files for the file type
- Examine existing patterns in similar files
- Check for related tests
- Verify language/framework compatibility
- Run existing tests to establish a baseline
This was a proto-gate: it had the trigger frame ("Before Making Changes") and checklist form, but no enforcement component. There was no consequence for failure, no stop instruction, and no "if any is No" clause.
The first formal gate appeared in a subsequent revision:
Before completing ANY response to a user query:
Ask yourself:
- Did I consult official documentation before answering?
- Have I included at least one citation?
- If uncertain, did I explicitly state this rather than guess?
- Did I avoid making assumptions about user intent?
- If documentation is unavailable, did I clearly state this?
If ANY answer is "No":
- Research official documentation before responding
- Add required citations
- Clarify uncertainties explicitly
- These are mandatory standards
Key structural features:
- Trigger scope covers all responses, not just specific file types
- Checkbox list enables item-by-item self-evaluation
- Strict-AND consequence — a single failed item triggers remediation
- Terminal phrase ("These are mandatory standards") converts the checklist into a requirement rather than guidance
In later revisions, the gate pattern migrated into domain-specific rule files with harder-stop consequences:
Before creating any git commit:
Ask yourself:
- Did the user explicitly ask for a commit?
- Is the commit message clear and descriptive?
- Have I avoided adding Co-Authored-By or attribution lines?
- Are only relevant files staged?
- Do I understand what's being committed?
- Are there no secrets or credentials in the commit?
If ANY answer is "No":
- Do not proceed with the commit
- These are mandatory standards
The consequence language sharpens from "research before responding" (remediable) to "do not proceed with the commit" (hard stop), distinguishing irreversible operations (commits) from correctable ones (responses).
Why root-cause solutions work better:
- AI-targeted language forces instructions to be read as commands — addressing the framing problem at authoring time ensures that workflow steps and prohibition lists are processed as mandatory constraints.
- Counter: declarations override the "proceed without verification" training — instead of listing all prohibited actions, override the training behaviour that produces unauthorised action.
- Compliance gates add runtime checkpoints — structured decision points with strict-AND logic create enforcement moments that are harder to bypass than embedded guidance.
- Gates and Counter: blocks are complementary — Counter: blocks suppress training defaults proactively at session start (static override); compliance gates enforce runtime compliance at specific decision points (dynamic checkpoints). Neither alone is sufficient; together they create layered defences.
Language directive policies prohibiting hyperbolic language, grandiose claims, and marketing buzzwords were initially categorised as style preferences. Research revealed these directives serve a functional purpose:
Prohibited terms:
PROHIBITED TERMS — NEVER USE:
- "Synergy", "leverage", "paradigm shift"
- "Game-changing", "thought leader", "deep dive"
- "Circle back", "move the needle", "low-hanging fruit"
- "Best-in-class", "industry-leading", "next-generation"
Replacement strategy:
❌ If you would write:
"Our revolutionary architecture leverages cutting-edge patterns to deliver game-changing synergies."
✅ Write instead:
"The layered architecture separates concerns, enabling independent development of each domain."
When incorrect information is present in context alongside hyperbolic or high-confidence language, the confident tone may reinforce the incorrect facts, lending them an air of authority. This makes the prohibition of hyperbole a measure against both Overeagerness (tone) and Context Poisoning (incorrect facts presented with high-confidence language are harder to dislodge in subsequent turns).
The Overeagerness solutions establish baseline behavioural constraints that apply across all AI operations: do not proceed without verification, do not infer unstated user intent, do not skip approval gates. Language directives, Counter: declarations, and compliance gates add friction to unauthorised decision-making.
However, Overeagerness is not solved — it is constrained. The AI's training optimisation for "efficiency" still treats verification steps as overhead. When the context window is large or the session is long, gates can be deprioritised through positional weighting effects (see Amnesia section). The "skip straight to implementation" pitfall identified in the vibe coding analysis persists despite these constraints.
Amnesia is the AI forgetting instructions, requirements, or context. It manifests as instruction non-compliance, forgotten test constraints, and inconsistent behaviour across long sessions.
Amnesia is categorically different from Hallucination, Dishonesty, and Overeagerness. Research identified three distinct causes:
Cause 1 — Context window truncation (architectural)
When the context window (token limit) is reached, earlier information is no longer available. This is the system working as designed within its technical constraints.
Cause 2 — Positional deprioritisation (attention weighting)
Even when instructions are present within the context window, instructions loaded at session start occupy earlier token positions. As a session grows, recency weighting causes the model to attend more heavily to recent messages and task content than to earlier-session instructions. The instruction is available but its influence is attenuated.
Cause 3 — Paraphrase degradation on composition (file authoring)
When instructed to compose a prompt or instruction file by drawing on rules from other files, the AI does not copy source text verbatim. It paraphrases or summarises. Each composition pass removes precision: mandatory language (MUST, MUST NOT) is softened to advisory language ("should," "consider"), specific constraints are generalised, worked examples are omitted, and Counter: declarations are absorbed into prose.
Important distinction: This is amnesia at authoring time, not runtime. It occurs when the AI creates new files, not when it loads and executes existing files. During execution, if instructed to import a file, the AI loads the file as written. Cause 3 applies only when the AI is composing new files that incorporate content from source files — the composition process degrades the source content through paraphrase.
The key insight: These are three distinct mechanisms requiring three distinct solutions. A one-size-fits-all "embed rules" approach cannot address all three causes.
Early Amnesia solutions mandated verbatim embedding:
When creating prompt files or instruction files that reference rules from other files:
MUST:
- Embed rule content in full
- Do not reference rules by link alone
Rationale: Passive context inclusion means link-only references won't work (the AI won't follow a link to retrieve rules). Rules must be present in the context window.
And explicit load instructions:
When one prompt file loads another:
MUST:
- Use explicit load instructions with recursive wording: "Read the file [path] and follow all instructions within it"
- Prohibit link-only references
- Ensure the full chain of dependencies is loaded
This approach addressed one symptom — rules not being present in context — but did not specify how rules should be embedded. The instruction to "embed rules in full" was interpreted by the AI as "include the substance of" rather than "copy the exact text of."
Why this was insufficient:
The AI, when composing a file, would paraphrase source text. Compare the source rule:
MUST:
- Copy the full text of the rule or policy verbatim
- Preserve all MUST/MUST NOT statements exactly as written
- Include all examples, WRONG/CORRECT blocks, and Counter: declarations
- Maintain original formatting, headings, and structure
MUST NOT:
- Paraphrase, condense, or abbreviate rule text
- Summarise rules "in your own words"
- Extract only the "key points" from a rule
With the AI's paraphrased version when embedding:
Rules should be included in their entirety. When embedding policies, make sure to capture the key requirements and any important examples. Try to preserve the original structure where possible.
The paraphrased version:
- Softens mandatory language ("MUST" → "should", "make sure")
- Introduces vague qualifiers ("key requirements", "important examples" — who decides what's important?)
- Adds conditional escape hatches ("where possible")
- Omits the explicit prohibitions on paraphrasing, condensing, and summarising
- Omits the requirement to preserve Counter: declarations
Each composition pass removed precision. A rule that survived two or three composition passes retained its surface intent but lost the specific wording that closed loopholes.
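Degradation of this kind is mechanically detectable: under the verbatim-copying requirement, every non-empty source line must survive word-for-word in the embedded copy. A hypothetical checker along those lines:

```python
def check_verbatim_embedding(source: str, embedded: str) -> list[str]:
    """Return source lines that do not survive verbatim in the embedded copy.

    A verbatim-copy mandate requires the full text, so any source line
    missing from the embedded version is evidence of paraphrase,
    condensation, or omission (e.g. "MUST" softened to "should").
    """
    embedded_lines = {line.strip() for line in embedded.splitlines()}
    return [
        line.strip() for line in source.splitlines()
        if line.strip() and line.strip() not in embedded_lines
    ]
```

A non-empty result means the composition pass rewrote rather than copied; note this catches the softening above precisely because "MUST:" and "Rules should be included…" do not match line-for-line.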
The breakthrough came when research identified why verbatim embedding wasn't working:
The instruction "embed rules in full" addressed Cause 1 (truncation — getting rules into context) and partially addressed Cause 2 (positional deprioritisation — moving rules closer to point of use). But it did not address Cause 3 (paraphrase degradation) because it didn't specify the mechanism of embedding.
Further research identified that Cause 3 was not a runtime problem but an authoring-time problem: the AI was degrading rules during file composition, before those files were ever loaded into a working context.
The insight: different causes require different solutions. Trying to address all three with a single "embed rules" mandate was insufficient because the causes operate at different stages (runtime availability, runtime weighting, authoring-time composition) and through different mechanisms (truncation, attention weighting, paraphrase).
The rule-copying mandate was extracted into a dedicated file with explicit prohibitions targeting the observed paraphrase failure modes:
When embedding rules or policies from other files into AI instruction files, agent definitions, or prompt files:
MUST:
- Copy the full text of the rule or policy verbatim
- Preserve all MUST/MUST NOT statements exactly as written
- Include all examples, WRONG/CORRECT blocks, and Counter: declarations
- Maintain original formatting, headings, and structure
- Include the complete rule — do not extract excerpts or "relevant portions"
MUST NOT:
- Paraphrase, condense, or abbreviate rule text
- Summarise rules "in your own words"
- Extract only the "key points" from a rule
- Simplify or restructure rule language
- Merge multiple rules into a single summarised statement
- Add introductory or transitional language that softens the mandate
This directly addresses Cause 3 by making verbatim copying the explicit requirement. The instruction is no longer "embed rules in full" (ambiguous about method) but "copy the exact text" with explicit prohibitions on the failure modes observed in practice.
Why this is a root-cause solution: The Rule Copying mandate addresses the mechanism of degradation — AI paraphrase during composition — by explicitly prohibiting it. This is a behavioural enforcement approach within the constraint paradigm, as opposed to composition using text tooling (assembling files via substitution without AI interpretation), which would eliminate the paraphrase risk entirely but operates outside the constraint-instruction approach.
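The text-tooling alternative mentioned above can be sketched as a substitution-based composer: include markers are replaced with the verbatim text of the referenced rule files, so no language model touches the rule during composition. This is a minimal illustration, not the project's actual tooling; the `{{include:path}}` marker convention and the file names are assumptions.

```python
import re
from pathlib import Path

INCLUDE = re.compile(r"^\{\{include:(?P<path>[^}]+)\}\}$")

def compose(template_path: str) -> str:
    """Assemble an instruction file by replacing include markers with the
    verbatim text of the referenced rule files. There is no interpretation
    step: the embedded rule is byte-for-byte identical to its source."""
    lines = []
    for line in Path(template_path).read_text().splitlines():
        m = INCLUDE.match(line.strip())
        if m:
            # Verbatim substitution: the rule cannot be paraphrased,
            # condensed, or softened because no model rewrites it.
            lines.append(Path(m.group("path")).read_text().rstrip("\n"))
        else:
            lines.append(line)
    return "\n".join(lines) + "\n"
```

Because the composer is plain text tooling, the paraphrase failure modes listed above (softened modals, vague qualifiers, escape hatches) are structurally impossible rather than merely prohibited.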
Per-task rule embedding addresses Causes 1 (truncation) and 2 (positional deprioritisation) by moving rules to the point of use:
When creating prompt files or agent definitions:
MUST:
- Embed only the rules relevant to the current task directly in the prompt or agent definition
- Place embedded rules immediately before or within the task-specific instructions
Rationale:
- Truncation: Rules embedded in the current task are less likely to fall off the end of the context window than rules loaded at session start
- Positional deprioritisation: Rules at the point of use occupy recent token positions rather than distant early positions, increasing their weighting
Why this is a root-cause solution: Instead of loading a monolithic instruction file at session start (which imposes full token cost unconditionally and places all rules at distant positions), embed only the relevant rules in the task-specific prompt. This keeps context cost proportional and ensures rules are positioned where attention weighting favours them.
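As a rough sketch of per-task embedding, a prompt assembler might select only the rules tagged for the current task type and place them immediately before the task instructions, where recency weighting favours them. The rule registry, task types, and rule texts below are all hypothetical.

```python
# Hypothetical rule registry: each rule declares the task types it governs.
RULES = {
    "verbatim-copying": {"applies_to": {"authoring"},
                         "text": "MUST copy rule text verbatim."},
    "no-auto-commit":   {"applies_to": {"coding"},
                         "text": "MUST NOT commit without an explicit user request."},
    "cite-sources":     {"applies_to": {"coding", "authoring"},
                         "text": "MUST cite documentation for every API claim."},
}

def build_prompt(task_type: str, task_instructions: str) -> str:
    """Embed only the rules relevant to this task, positioned immediately
    before the task instructions so they occupy recent token positions."""
    relevant = [r["text"] for r in RULES.values() if task_type in r["applies_to"]]
    return "\n".join(relevant) + "\n\n" + task_instructions
```

The token cost of each prompt is proportional to the number of relevant rules, and no rule sits at a distant early position where attention weighting would dilute it.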
Further research decomposed "Amnesia" into eleven distinct factors causing instruction effectiveness loss within a context window, grouped into four categories:
Category 1 — Availability (instruction not present or not weighted)
- Context window truncation — in long sessions, instruction file content may fall off the end of the context window entirely
- Positional deprioritisation — even when present, instructions loaded at session start occupy earlier token positions; recency weighting reduces their influence
Category 2 — Budget (context capacity consumed without proportional constraint value)
- Monolithic loading — a large instruction file loaded for every request imposes its full token cost regardless of how many rules are relevant
- Rationale and explanatory text — sections explaining why a rule exists occupy context budget without increasing constraint salience
- Verbose examples — full paragraph quotations occupy more context than tight imperative statements delivering the same constraint
Category 3 — Framing (instruction present but readable as advisory rather than mandatory)
- Human-targeted prose framing — instructions written for a human reader do not signal mandatory compliance to the AI
- Conditional framing — instructions structured as "If [condition], then [action]" can be read as optional procedural guidance
- Absence of system-instruction override — when an instruction conflicts with training defaults and no explicit override is declared, the AI may resolve ambiguity in favour of its training
Category 4 — Scope (instruction present and well-framed but coverage is miscalibrated)
- Domain list implies exhaustiveness — an instruction that enumerates specific items causes the AI to weight listed items as primary scope; unlisted cases are underweighted
- Keyword scope too narrow — when the operative phrase names only some cases of the target behaviour, the instruction does not fire for unlisted cases
- Scope too tight after restructuring — a redesign can inadvertently narrow scope beyond the original intent
Each factor is mapped to its specific solution:
| Factor | Solution lineage |
|---|---|
| Context truncation | Per-task rule embedding |
| Positional deprioritisation | Counter: declarations at point of use |
| Monolithic loading | Per-task rule embedding |
| Rationale text | Removed in documentation-first redesign; replaced with MUST/MUST NOT lists |
| Verbose examples | WRONG/CORRECT blocks; tighter language |
| Human-targeted framing | MUST/MUST NOT; Counter: pattern |
| Conditional framing | Unconditional MUST statements |
| No override declaration | Counter: declarations; explicit precedence statements |
| Domain list implies exhaustiveness | Explicit named additions per observed failure |
| Keyword scope too narrow | Named failure modes added per observed failure |
| Scope too tight post-redesign | Explicit scope restoration (e.g., "What Counts as Documentation") |
Why root-cause solutions work better:
Understanding the mechanism enables targeted fixes. Symptom-based "embed rules" was a one-size-fits-all approach that partially addressed some causes while missing others. The root-cause approach:
- Recognises that different causes require different solutions — truncation requires structural changes (per-task embedding, external memory), while paraphrase degradation requires behavioural enforcement (verbatim copying mandate)
- Maps each degradation mechanism to its countermeasure — the 11-factor taxonomy creates a diagnostic framework rather than a growing enumeration of symptoms
- Enables precise interventions — instead of "make instructions stronger," the solutions target "reduce monolithic loading" (Budget category) or "add override declarations" (Framing category)
Cause 1 (context window truncation) cannot be addressed by constraint-based solutions. When the context window limit is reached, earlier information is no longer available regardless of how well-crafted the instructions are. This is an architectural constraint requiring architectural solutions.
Approaches that address Cause 1 include persistent memory systems (external files storing facts and decisions across sessions), workflow-based sequential operations (staged processes with verification checkpoints), and structured file formats (step files, plan files) that partition work to remain within context limits. These solutions operate outside the constraint-instruction paradigm and are examined in the next research phase.
The Amnesia constraint solutions address the "code quality degradation" and "reprompt loop" pitfalls identified in the vibe coding analysis by reducing context-dependent inconsistency. However, Amnesia is not solved — particularly Cause 1 (architectural truncation), which cannot be addressed by constraint-based policy alone. Per-task embedding and rule-copying mandates reduce the severity of Causes 2 and 3, but they do not eliminate them. Instructions are still subject to positional weighting effects, and the rule-copying mandate itself can be violated when the AI forgets the rule-copying rule.
Context Poisoning is not a fifth independent root cause. It is a knock-on effect of all four root causes: when Hallucination, Dishonesty, Overeagerness, or Amnesia introduce false or unverified items into the conversation context, the AI cannot distinguish them from verified facts. Those false items persist in context and are treated as ground truth in subsequent turns, compounding the original error.
Mechanism: AI systems have no retroactive invalidation capability. Once a false fact is in context — whether through fabrication (Hallucination), false assertion (Dishonesty), unauthorised assumption (Overeagerness), or forgotten constraint (Amnesia) — it remains available for reference. The AI does not tag items as "verified" vs "unverified"; it treats all context items uniformly. A correction added later is present alongside the original false fact, and the AI may retrieve the high-confidence original rather than the hedged correction.
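The no-retroactive-invalidation point can be illustrated with a toy model of an append-only context. This is a deliberate simplification for exposition, not a claim about any real system's internals.

```python
# Toy model of an append-only context window. Items carry no
# verified/unverified tag, so a later correction does not invalidate
# the earlier false assertion; both remain present and retrievable.
context = []

def assert_fact(text: str) -> None:
    context.append(text)   # no validation, no provenance tracking

def correct_fact(text: str) -> None:
    context.append(text)   # a correction is just another context item

assert_fact("The database contains 4,000 user records.")   # fabricated
correct_fact("Correction: the record count is unverified.")
```

After both calls, the fabricated assertion and the hedged correction coexist with equal standing; nothing in the structure marks the first item as false.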
External evidence: Context poisoning is corroborated by:
- SWE-bench hallucination spirals (successive hallucinations compound as each builds on the last)
- The reprompt loop failure mode (no convergence guarantee; each reattempt can compound errors from prior attempts)
- The Replit/Lemkin incident (fabricated database records treated as real, then referenced as the basis for further fabrications)
The constraint-based solutions documented in this analysis do not target Context Poisoning directly. They address the four root causes that produce poisoning events:
- Documentation-first policies reduce Hallucination and Dishonesty violations → fewer false assertions enter context
- Counter: declarations and compliance gates reduce Overeagerness violations → fewer unauthorised assumptions enter context
- Per-task embedding and rule-copying mandates reduce Amnesia violations → fewer forgotten constraints allow false facts to propagate
By reducing the frequency of violations, these solutions indirectly reduce the rate at which false items enter context.
Limitation: First-wave solutions do not prevent Context Poisoning — they reduce its occurrence rate. Once a false item enters context, it persists. No amount of well-crafted constraint text can retroactively invalidate a false fact that the AI has already incorporated.
Constraint-based solutions (first-wave) reduce the rate at which false items enter context. A different class of solutions (second-wave) prevents false items from propagating downstream even if they do enter context.
Second-wave approaches use verification boundaries — architectural checkpoints where unverified items are quarantined and validated against authoritative sources before they can propagate into downstream work products. Examples include fact files with verification passes (raw capture → verification against authoritative sources → synthesis from verified facts only), step-based execution (verification gates between steps), and plan validation workflows (plans verified before implementation begins). These require sequential operations and workflow-based enforcement rather than constraint-based instruction alone.
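A fact file with a verification pass might be sketched as follows. The class and method names are invented for illustration; the point is the boundary itself: raw captures are quarantined by default, and only verified facts may flow into synthesis.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    text: str
    source: str = ""      # authoritative source, e.g. a documentation path
    verified: bool = False

class FactFile:
    """Sketch of a fact file with a verification pass: raw capture,
    verification against an authoritative source, then synthesis from
    verified facts only."""

    def __init__(self):
        self.facts = []   # quarantined and verified facts live together

    def capture(self, text: str) -> Fact:
        fact = Fact(text)             # raw capture: quarantined by default
        self.facts.append(fact)
        return fact

    def verify(self, fact: Fact, source: str) -> None:
        fact.source = source
        fact.verified = True

    def synthesis_inputs(self) -> list:
        # The verification boundary: only verified facts may propagate
        # into downstream work products.
        return [f for f in self.facts if f.verified]
```

Unlike a constraint instruction, this boundary does not depend on the AI remembering a rule: an unverified fact is structurally excluded from synthesis until a verification step promotes it.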
These architectural solutions are examined in the next research phase.
The constraint-based solutions documented here reduce the frequency and severity of AI coding failures. They do not eliminate them.
1. Instruction files are subject to the same problems they attempt to address.
Policy text occupies context budget. Long policy files are subject to positional deprioritisation. Compliance gates can be forgotten or ignored when buried in a 500-line instruction file. The circular dependency is inescapable: instructions cannot fix instruction-following problems without the corrective instructions themselves being subject to those same problems.
The 11-factor Amnesia taxonomy applies to policy files themselves:
- Truncation: A very long policy file may exceed context window limits in large requests
- Positional deprioritisation: Policy loaded at session start occupies early positions; later messages are weighted more heavily
- Monolithic loading: A comprehensive policy file imposes full token cost even when only a fraction of its rules are relevant to the current task
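The monolithic-loading cost can be made concrete with rough arithmetic. The line counts and tokens-per-line figures below are assumed for illustration, not measured.

```python
# Illustrative budget arithmetic: a 500-line policy file at roughly
# 12 tokens per line, loaded monolithically on every request, versus
# per-task embedding of the three relevant rules at ~40 tokens each.
POLICY_LINES = 500
TOKENS_PER_LINE = 12
RELEVANT_RULES = 3
TOKENS_PER_RULE = 40

monolithic_cost = POLICY_LINES * TOKENS_PER_LINE   # paid unconditionally
per_task_cost = RELEVANT_RULES * TOKENS_PER_RULE   # proportional to the task
```

Under these assumed figures, every request pays 6,000 tokens for the monolithic file but only 120 for per-task embedding, and the monolithic cost is paid even when none of the loaded rules apply.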
2. The training optimisation for "helpfulness" is unaddressed.
Counter: declarations explicitly override specific training behaviours, but they are themselves instructions subject to availability and framing degradation. The AI's inability to calibrate confidence to actual knowledge state — the unified root cause of Hallucination, Dishonesty, and Overeagerness — remains unaffected by constraint text.
The training behaviour that produces these problems is not removed; it is suppressed by instructions. When those instructions are deprioritised (through Amnesia mechanisms), the training behaviour resurfaces.
3. Baseline accuracy is necessary but insufficient.
The constraint solutions documented here establish baseline accuracy behaviour: verify before asserting, cite sources, express uncertainty, do not proceed without verification. This baseline applies across all AI operations regardless of domain.
However, baseline accuracy does not address domain-specific quality concerns. From the vibe coding analysis:
- 64% of vibe coders skip or delegate QA (Fawzy et al.)
- ~40% of Copilot outputs contain security vulnerabilities (Pearce et al.)
- 11% of vibe coding projects are abandoned (Fawzy et al.)
These outcomes require domain-specific enforcement — testing mandates, security review checklists, code quality gates — beyond general accuracy verification. An AI trained for efficiency will not volunteer domain-specific quality steps unless explicitly required. A user optimising for speed will not request them. Neither side enforces domain-specific quality gates by default.
The constraint solutions provide baseline accuracy mechanisms. Domain-specific enforcement requires additional policies beyond this baseline.
Hallucination: The AI still fabricates plausible-sounding information when documentation is ambiguous or incomplete. Documentation-first policies reduce this but do not eliminate it. The AI may satisfy the "consult documentation" requirement by reading documentation, then fabricate interpretations of what that documentation means.
Dishonesty: The AI still claims correctness without verification when under pressure to produce output. Compliance gates reduce this but can be bypassed when the AI believes bypassing will be more helpful. The gate checklist itself is subject to positional deprioritisation in long sessions.
Overeagerness: The AI still takes unauthorised action (especially auto-commits) when it infers user intent. Language directives and gates reduce this but do not eliminate it. The AI may satisfy the "did the user explicitly request" check by interpreting an ambiguous user statement as an explicit request.
Amnesia: The AI still forgets instructions in long sessions. Per-task embedding and rule-copying reduce this, but positional deprioritisation and context window truncation remain unaddressed at the architectural level. The rule-copying mandate itself can be violated when the AI forgets the rule-copying rule.
Context Poisoning: The AI still treats false facts as verified once they enter context. First-wave constraint solutions reduce the rate of poisoning events; they do not prevent poisoning or provide retroactive invalidation. Once a false fact is in context, subsequent constraint checks cannot distinguish it from verified facts.
The next phase of this research examines architectural approaches that address these problems outside the constraint-instruction paradigm. These include:
- Workflow-based enforcement: Sequential operations with verification gates (fact files, step files, plan files) that prevent unverified items from propagating between stages
- External validation tooling: Automated verification independent of AI instruction-following (verification prompts, source-checking scripts)
- Structural composition systems: File assembly using text tooling (substitution-based composition) that eliminates AI interpretation during file authoring
- Persistent memory systems: External storage (session files, decision logs, assumption registers) that persists context across sessions and survives truncation
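A decision log of the kind listed above might be sketched as an append-only JSONL file that is reloaded at session start, so prior decisions can be re-embedded into the working context rather than recalled from lost history. The file layout and record shape are assumptions for illustration.

```python
import json
import os

def record_decision(path: str, decision: str, rationale: str) -> None:
    """Append a decision to an external log that survives context
    truncation and session boundaries."""
    with open(path, "a") as f:
        f.write(json.dumps({"decision": decision, "rationale": rationale}) + "\n")

def load_decisions(path: str) -> list:
    """Reload prior decisions at session start so they can be embedded
    into the new working context."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because the log lives outside the context window, it is immune to truncation; its contents only re-enter the constraint paradigm at the moment they are embedded into a prompt.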
Constraint-based solutions are necessary mitigation measures. They reduce harm. They are not sufficient solutions. The fundamental problems remain unaddressed.