# Protocol Zero: The Safety Check Missing From 99% of System Prompts

## The **“I Know This” Trap**: why your AI serves *stale truths* (and how to fix it)
> Short version: powerful LLMs are great at regurgitating what they “know” — which makes them dangerously confident when their training data is out of date. Force them to audit their own assumptions.
---
## Opening scene
I had a chat with **Gemini 3.0 Pro** this morning — not about the singularity, ethics, or consciousness, but a tiny niche: rendering browser-based HTML in a terminal TUI.
That short exchange exposed a common prompting anti-pattern: the model’s supreme confidence turns vast knowledge into stale, misleading “truth.”
---
## The problem (tl;dr)
- LLMs often skip live verification when their **internal confidence** is high.
- Result: they return **“stale truths”** — answers that were correct at training time but are now obsolete.
- This is especially dangerous for time-sensitive domains: finance, law, medicine, security, product & dependency status, etc.
---
## The scenario: a “perfect” wrong answer
I asked my **Deep Research Assistant** (three roles: *Archivist*, *Weaver*, *Sage*):
> *“What tools exist to render browser-based HTML in a terminal TUI, handling JavaScript via Node.js?”*
The LLM returned a polished report and confidently recommended **Carbonyl** as “best in class” and **Ink** as the go-to library. Looks great — until you check.
- Carbonyl was presented from the model’s internal weights (high confidence).
- The model *did not* run web searches — it assumed it “knew” the answer.
- The result: a credible but **outdated** recommendation — a classic *stale truth*.
---
## Why this happens: *Optimisation vs Rigour*
When asked to “search if needed,” the model runs a quick internal cost/benefit calculation:
1. **Retrieval cost:** invoking web tools costs time, tokens, and complexity.
2. **Internal confidence:** are there strong, high-probability tokens in its weights that match the query?
If internal confidence beats perceived retrieval cost, the model shortcuts: **no web check → high-confidence stale answer**.
I call this the **Confidence Trap**: the more expert the model *seems* on a topic, the less likely it is to double-check.
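To make that trade-off concrete, here is a toy sketch in Python. Nothing here reflects any real model's internals; `internal_confidence`, `retrieval_cost`, and the threshold comparison are invented purely to illustrate the shortcut.

```python
# Toy illustration of the Confidence Trap heuristic described above.
# All names and numbers are invented; no real model works this way literally.

def answer_from_weights(query: str) -> str:
    return f"(from training data) best answer for: {query}"

def answer_from_web_search(query: str) -> str:
    return f"(from live search) verified answer for: {query}"

def answer(query: str, internal_confidence: float, retrieval_cost: float) -> str:
    """Return an answer, paying for retrieval only when memory looks weak."""
    if internal_confidence > retrieval_cost:
        # Shortcut: high-probability tokens in the weights "win", so the
        # model answers from memory -- possibly a stale truth.
        return answer_from_weights(query)
    return answer_from_web_search(query)

# A well-covered niche topic yields high internal confidence, so the web
# check is skipped -- exactly the failure mode described above.
print(answer("best terminal HTML renderer",
             internal_confidence=0.9, retrieval_cost=0.3))
```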
---
## The fix: **Protocol Zero — the Freshness Audit**
Don’t let the model decide for itself whether to search. Make it *suspect its own memory*.
### Protocol (summary)
1. **Invert the burden of proof.** Don’t ask “What’s best?” Ask the model to assume an answer and *prove it wrong*.
2. **Produce a Delta Report** before answering: compare internal assumption vs live web findings.
3. **Force a null-hypothesis search**: actively look for evidence the internal answer is obsolete (e.g. “Is [Tool X] dead?”, “security vulnerabilities”, “2024–2025 alternatives”); a query-builder sketch follows this list.
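If you orchestrate the model through your own tool-calling code, you can generate those disconfirming queries yourself rather than trusting the model to. A minimal sketch; the templates simply generalise the examples in step 3, and `run each query through your search tool of choice` is left to you:

```python
# Build "null hypothesis" queries that try to DISCONFIRM the model's
# internal top answer, per step 3 of the protocol above.

def null_hypothesis_queries(internal_answer: str, year: str = "2025") -> list[str]:
    """Queries designed to surface evidence that `internal_answer` is obsolete."""
    return [
        f"Is {internal_answer} dead or unmaintained?",
        f"{internal_answer} security vulnerabilities",
        f"{internal_answer} alternatives {year}",
        f"{internal_answer} deprecated replacement",
    ]

# Example: auditing the report's "Carbonyl" recommendation.
for query in null_hypothesis_queries("Carbonyl"):
    print(query)  # feed each query into your web-search tool
```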
---
## Step-by-step: the Freshness Audit workflow
1. **Identify the internal top answer**: what the model would answer from its weights alone.
2. **Run a Null Hypothesis search**: queries targeted at disconfirming that internal answer.
3. **Produce a Delta Report**: structured output the model must return before answering normally (a code sketch follows the template below).
### Delta Report template (required)
- **Internal Assumption:** `[what the model "thinks"]`
- **Live Verification:** `[search results summary]`
- **The Delta:** `[the difference, plus the action: accept, replace, or flag as a risk]`
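If you want the Delta Report as machine-checkable output rather than prose, a small schema helps. A hedged sketch: the field names mirror the template above, the `Verdict` labels are my own, and the example values are illustrative, not verified claims about Carbonyl:

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(str, Enum):
    CONFIRMED = "confirmed"     # live web agrees with internal weights
    SUPERSEDED = "superseded"   # a newer answer replaces the internal one
    RISK = "risk"               # internal answer now carries known problems

@dataclass
class DeltaReport:
    internal_assumption: str  # what the model "thinks" from its weights
    live_verification: str    # summary of the null-hypothesis search results
    delta: str                # the difference between the two
    verdict: Verdict

# Illustrative values only, shaped like the article's Carbonyl scenario.
report = DeltaReport(
    internal_assumption="Carbonyl is the best-in-class terminal browser",
    live_verification="Null-hypothesis search summary goes here",
    delta="Internal answer may be stale; surface the risk to the user",
    verdict=Verdict.RISK,
)
print(report)
```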
---
## Plug-and-play prompt (use in system prompt or per-request)
> *Add the following to your system or user prompt to force a freshness audit.*
```text
CRITICAL PROTOCOL: You operate under the assumption that your internal training data is potentially obsolete.
Role: The Freshness Auditor. You are a sceptic. Your job is NOT to answer the question immediately, but to audit your own internal assumptions against the live web.
Mandatory Action:
- Identify your internal "top answer" for this query.
- Perform a Null Hypothesis search specifically looking for evidence that this answer is outdated (e.g., "Is [Tool X] dead?", "[Tool X] vs [New Tool Y] 2025").
- You MUST output a Delta Report confirming or refuting your internal weights BEFORE proceeding to construct an answer.
```
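To wire the protocol into an actual call, prepend it to your system prompt. A minimal sketch using the OpenAI Python SDK purely as one example; any chat-style API works, the model name is illustrative, and the model still needs a web-search tool available for the audit to mean anything:

```python
# Minimal sketch: injecting the freshness-audit protocol into a chat call.
from openai import OpenAI

# Paste the full protocol text from the block above here.
FRESHNESS_PROTOCOL = """\
CRITICAL PROTOCOL: You operate under the assumption that your internal
training data is potentially obsolete. Identify your internal "top answer",
run a Null Hypothesis search against it, and output a Delta Report
confirming or refuting your internal weights BEFORE answering.
"""

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; pick a model with web-search tooling
    messages=[
        {"role": "system", "content": FRESHNESS_PROTOCOL},
        {
            "role": "user",
            "content": "What tools render browser-based HTML in a terminal TUI?",
        },
    ],
)
print(response.choices[0].message.content)
```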