Custom command for optimizing Claude Code documents
---
description: Compare two documents semantically with relationship preservation to identify content and structural differences
---

Semantic Document Comparison Command

Task: Compare two documents semantically: {{arg1}} vs {{arg2}}

Goal: Determine if documents contain the same semantic content AND preserve relationships (temporal, conditional, cross-document) despite different wording/organization.

Method: Enhanced claim extraction + relationship extraction + execution equivalence scoring

⚠️ CRITICAL: Independent Extraction Required

This command MUST extract claims from BOTH documents independently. NEVER:

  • Pre-populate with "items to verify" or "improvements to check"
  • Prime the extractor with knowledge of what changed between documents
  • Use targeted confirmation instead of full extraction

Targeted validation (telling extractor what to look for) inflates scores by confirming a checklist rather than independently discovering all claims.


Overview

Workflow:

  1. Extract enhanced claims + relationships from Document A IN PARALLEL
  2. Extract enhanced claims + relationships from Document B IN PARALLEL
  3. Compare claim sets AND relationship graphs (after both complete)
  4. Calculate execution equivalence score (claim 40% + relationship 40% + graph 20%)
  5. Report: shared/unique claims, preserved/lost relationships, warnings

⚡ CRITICAL: Steps 1 and 2 MUST run in parallel (single message with two Task calls)

  • Extractions are completely independent (no cross-contamination risk)
  • Running sequentially roughly doubles wall-clock time for no accuracy benefit
  • Step 3 waits for both to complete before comparing

Key Insight: Claim preservation ≠ Execution preservation. Documents can have identical claims but different execution behavior if relationships are lost.

Reproducibility and Determinism:

This command aims for high reproducibility but cannot guarantee perfect determinism due to LLM semantic judgment.

Sources of Variance:

  1. LLM Temperature (±2-5% score variance if >0)
    • Mitigation: Use temperature=0 in all Task calls (specified in the Task templates below)
    • Expected with temp=0: ±0.5-1% residual variance
  2. Model Version (±1-3% score drift across versions)
    • Mitigation: Pin exact model version (e.g., "claude-sonnet-4-5-20250929")
    • Specified in the Task templates below
  3. Semantic Judgment (±1-2% for boundary cases)
    • Claim similarity: "essentially the same" vs "slightly different"
    • Relationship matching: "same constraint" vs "subtly modified"
    • Inherent to semantic comparison, cannot be eliminated
  4. Claim Boundary Detection (±0.5-1% for complex nested claims)
    • Conjunctions: Split into parts vs preserved as unit
    • Conditionals: Boundary of IF-THEN-ELSE scope
    • Minor variance in claim count (e.g., 191 vs 193)

Expected Reproducibility:

  • Same session, same documents: ±0-1% (near-identical, small rounding differences)
  • Different sessions, temp=0, pinned model: ±1-2% (good reproducibility)
  • Different sessions, temp>0: ±3-7% (moderate variance)
  • Different model versions: ±5-10% (significant drift possible)

Best Practices for Consistency:

  • Always use temperature=0 (already specified in Task calls)
  • Pin model version if absolute consistency required across sessions
  • Accept ±1-2% variance as inherent to semantic analysis
  • Focus on score interpretation range (≥0.95, 0.85-0.94, etc.) not exact decimal

Steps 1 & 2: Extract Claims + Relationships from BOTH Documents (IN PARALLEL)

⚡ CRITICAL: Invoke BOTH extraction agents in a single message with two Task tool calls.

Why Parallel Execution:

  • Safe: Extractions are completely independent (no shared state)
  • Accurate: No cross-contamination between Document A and B analysis
  • Faster: ~50% time reduction (both extractions run simultaneously)
  • Required by Step 3: Comparison waits for both anyway

Agent Prompt Template (use for BOTH documents):

Agent Prompt:

**SEMANTIC CLAIM AND RELATIONSHIP EXTRACTION**

**Document**: {{arg1}}

**Your Task**: Extract all semantic claims AND relationships from this document.

---

## Part 1: Claim Extraction

**What is a "claim"?**
- A requirement, instruction, rule, constraint, fact, or procedure
- A discrete unit of meaning that can be verified as present/absent
- Examples: "must do X before Y", "prohibited to use Z", "setting W defaults to V"

**Claim Types**:

1. **Simple Claims** (requirement, instruction, constraint, fact, configuration)
2. **Conjunctions**: ALL of {X, Y, Z} must be true
   - Markers: "ALL of the following", "both X AND Y", "requires all"
   - Example: "Approval requires: technical review AND budget review AND strategic review"
3. **Conditionals**: IF condition THEN consequence_true ELSE consequence_false
   - Markers: "IF...THEN...ELSE", "when X, do Y", "depends on"
   - Example: "IF attacker has monitoring THEN silent block ELSE network disconnect"
4. **Consequences**: Actions that result from conditions/events
   - Markers: "results in", "causes", "leads to", "enforcement"
   - Example: "Violating Step 1 causes data corruption (47 transactions affected)"
5. **Negations with Scope**: Prohibition with explicit scope
   - Markers: "NEVER", "prohibited", "CANNOT", "forbidden"
   - Example: "CANNOT run Steps 2 and 3 in parallel (data corruption risk)"

**Extraction Rules**:

1. **Granularity**: Atomic claims (cannot split without losing meaning)
2. **Completeness**: Extract ALL claims, including implicit ones if unambiguous
3. **Context**: Include minimal context for understanding
4. **Exclusions**: Skip pure examples, meta-commentary, table-of-contents

**Normalization Rules** (apply to all claim types):

1. **Tense**: Present tense ("create" not "created")
2. **Voice**: Imperative/declarative ("verify changes" not "you should verify")
3. **Synonyms**: Normalize common variations:
   - "must/required/mandatory" → "must"
   - "prohibited/forbidden/never" → "prohibited"
   - "create/establish/generate" → "create"
   - "remove/delete/cleanup" → "remove"
   - "verify/validate/check/confirm" → "verify"
4. **Negation**: Standardize ("must not X" → "prohibited to X")
5. **Quantifiers**: Normalize to symbolic form ("at least 80%" → "≥80%", "fewer than 100" → "<100")
6. **Filler**: Remove filler words
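
A minimal sketch of how these rules might be applied mechanically, assuming a simple token-level pass; the synonym table, filler list, and regex below are illustrative assumptions, not part of the command:

```python
import re

# Illustrative synonym and filler tables drawn from the rules above.
SYNONYMS = {
    "required": "must", "mandatory": "must",
    "forbidden": "prohibited", "never": "prohibited",
    "establish": "create", "generate": "create",
    "delete": "remove", "cleanup": "remove",
    "validate": "verify", "check": "verify", "confirm": "verify",
}
FILLER = {"please", "simply", "just", "basically", "really"}

def normalize_claim(text: str) -> str:
    """Lowercase, map synonyms, drop filler, standardize negation."""
    words = re.findall(r"[\w%<>=≥≤]+", text.lower())
    words = [SYNONYMS.get(w, w) for w in words if w not in FILLER]
    normalized = " ".join(words)
    # Standardize negation: "must not X" -> "prohibited to X"
    return normalized.replace("must not", "prohibited to")

assert normalize_claim("You MUST NOT delete the baseline") == \
    "you prohibited to remove the baseline"
```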

---

## Part 2: Relationship Extraction

**Relationship Types to Extract**:

### 1. Temporal Dependencies (Step A → Step B)
**Markers**: "before", "after", "then", "Step N must occur after Step M", "depends on completing"
**Example**: "Step 3 (data migration) requires Step 2 (schema migration) to complete first"
**Constraint**: strict=true if order violation causes failure

### 2. Prerequisite Relationships (Condition → Action)
**Markers**: "prerequisite", "required before", "must be satisfied before"
**Example**: "All prerequisites (A, B, C) must be satisfied before Step 1"
**Constraint**: strict=true if prerequisite skipping causes failure

### 3. Hierarchical Conjunctions (ALL of X must be true)
**Markers**: "ALL", "both...AND...", "requires all", nested lists
**Example**: "Level 1: (A1 AND A2 AND A3) AND (B1 AND B2 AND B3)"
**Constraint**: all_required=true

### 4. Conditional Relationships (IF-THEN-ELSE)
**Markers**: "IF...THEN...ELSE", "when X, do Y", "depends on"
**Example**: "IF system powered on THEN memory dump ELSE disk removal"
**Constraint**: mutual_exclusivity=true for alternatives

### 5. Exclusion Constraints (A and B CANNOT co-occur)
**Markers**: "CANNOT run concurrently", "NEVER together", "mutually exclusive"
**Example**: "Steps 2 and 3 CANNOT run in parallel (data corruption risk)"
**Constraint**: strict=true if violation causes failure

### 6. Escalation Relationships (State A → State B under trigger)
**Markers**: "escalate to", "redirect to", "upgrade severity"
**Example**: "MEDIUM incident escalates to HIGH if privilege escalation possible"
**Constraint**: trigger condition explicit

### 7. Cross-Document References (Doc A → Doc B Section X)
**Markers**: "see Section X.Y", "defined in Document Z", "refer to"
**Example**: "Technical Architect (see Project Roles document, Section 2.1)"
**Constraint**: preserve section numbering as navigation anchor

---

## Output Format (JSON)

```json
{
  "claims": [
    {
      "id": "claim_1",
      "type": "simple|conjunction|conditional|consequence|negation",
      "text": "normalized claim text",
      "location": "line numbers or section",
      "confidence": "high|medium|low",

      // For conjunctions
      "sub_claims": ["claim_2", "claim_3"],
      "all_required": true,

      // For conditionals
      "condition": "condition text",
      "true_consequence": "claim_4",
      "false_consequence": "claim_5",

      // For consequences
      "triggered_by": "event or condition",
      "impact": "severity description",

      // For negations
      "prohibition": "what is prohibited",
      "scope": "when prohibition applies",
      "violation_consequence": "what happens if violated"
    }
  ],
  "relationships": [
    {
      "id": "rel_1",
      "type": "temporal|prerequisite|conditional|exclusion|escalation|cross_document",
      "from_claim": "claim_1",
      "to_claim": "claim_2",
      "constraint": "must occur after|required before|IF-THEN|CANNOT co-occur",
      "strict": true,
      "evidence": "line numbers and quote",
      "violation_consequence": "what happens if relationship violated"
    }
  ],
  "dependency_graph": {
    "nodes": ["claim_1", "claim_2", "claim_3"],
    "edges": [
      ["claim_1", "claim_2"],
      ["claim_2", "claim_3"]
    ],
    "topology": "linear_chain|tree|dag|cyclic",
    "critical_path": ["claim_1", "claim_2", "claim_3"]
  },
  "metadata": {
    "total_claims": 10,
    "total_relationships": 5,
    "relationship_types": {
      "temporal": 3,
      "conditional": 1,
      "exclusion": 1
    }
  }
}
```

CRITICAL: Extract ALL relationships, not just claims. Relationships are as important as claims for execution equivalence.


**Execute PARALLEL extraction (single message with TWO Task calls)**:

```bash
# ⚡ INVOKE BOTH AGENTS IN PARALLEL (single message, 2 Task tool calls)
# This is a SINGLE assistant message containing TWO Task invocations

Task(
  subagent_type="general-purpose",
  model="sonnet",  # For reproducibility, consider pinning: "claude-sonnet-4-5-20250929"
  temperature=0,   # Deterministic sampling for consistency
  description="Extract claims from Document A",
  prompt="[Full extraction prompt above]

  Document: {{arg1}}

  Extract all claims and relationships.
  Return COMPLETE JSON (not summary)."
)

Task(
  subagent_type="general-purpose",
  model="sonnet",  # For reproducibility, consider pinning: "claude-sonnet-4-5-20250929"
  temperature=0,   # Deterministic sampling for consistency
  description="Extract claims from Document B",
  prompt="[Full extraction prompt above]

  Document: {{arg2}}

  Extract all claims and relationships.
  Return COMPLETE JSON (not summary)."
)

# Wait for BOTH agents to complete, then save results

# Save Document A extraction
cat > /tmp/compare-doc-a-extraction.json << 'EOF'
{agent JSON response with ALL claims and relationships from Document A}
EOF

# Save Document B extraction
cat > /tmp/compare-doc-b-extraction.json << 'EOF'
{agent JSON response with ALL claims and relationships from Document B}
EOF

echo "✅ Saved Document A extraction: $(wc -l < /tmp/compare-doc-a-extraction.json) lines"
echo "✅ Saved Document B extraction: $(wc -l < /tmp/compare-doc-b-extraction.json) lines"

❌ WRONG: Sequential Execution

```
# DON'T do this - wastes time
Task(doc A) → wait → save
Task(doc B) → wait → save  # Unnecessarily sequential
```

✅ CORRECT: Parallel Execution

```
# Single message with both Task calls
Task(doc A)
Task(doc B)
# Both run simultaneously, then save both results
```

Step 3: Compare Claims AND Relationships

Invoke comparison agent with enhanced comparison logic:

Agent Prompt:

**SEMANTIC COMPARISON WITH RELATIONSHIP ANALYSIS**

**Document A Data**: {{DOC_A_DATA}}

**Document B Data**: {{DOC_B_DATA}}

**Your Task**: Compare claims AND relationships to determine execution equivalence.

---

## Part 1: Claim Comparison

**Comparison Rules**:

1. **Exact Match**: Identical normalized text → shared
2. **Semantic Equivalence**: Different wording, identical meaning → shared
3. **Type Mismatch**: Same concept but different structure (e.g., conjunction split into separate claims) → flag as structural change
4. **Unique**: Claims appearing in only one document

**Enhanced Claim Comparison**:

- **Conjunctions**: Two conjunctions equivalent ONLY if same sub-claims AND all_required matches
  - Example: "ALL of {A, B, C}" ≠ "A" + "B" + "C" (conjunction split - structural loss)
- **Conditionals**: Equivalent ONLY if same condition AND same true/false consequences
  - Example: "IF X THEN A ELSE B" ≠ "A" + "B" (conditional context lost)
- **Consequences**: Match on trigger AND impact
- **Negations**: Match on prohibition AND scope (and violation consequence where stated)
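
A hedged sketch of the conjunction and conditional checks above, assuming claims arrive as dicts in the Step 1/2 JSON shape; `semantically_equal` stands in for the LLM's semantic judgment and is an assumption:

```python
def conjunctions_equivalent(a: dict, b: dict, semantically_equal) -> bool:
    """Match only if all_required agrees AND sub-claims pair up one-to-one."""
    if a.get("all_required") != b.get("all_required"):
        return False
    unmatched = list(b.get("sub_claims", []))
    if len(a.get("sub_claims", [])) != len(unmatched):
        return False
    for sa in a.get("sub_claims", []):
        match = next((sb for sb in unmatched if semantically_equal(sa, sb)), None)
        if match is None:
            return False  # conjunction split or sub-claim lost
        unmatched.remove(match)
    return True

def conditionals_equivalent(a: dict, b: dict, semantically_equal) -> bool:
    """Match only if the condition and both branches all match."""
    fa, fb = a.get("false_consequence"), b.get("false_consequence")
    if (fa is None) != (fb is None):  # one side lost its ELSE branch
        return False
    return (semantically_equal(a["condition"], b["condition"])
            and semantically_equal(a["true_consequence"], b["true_consequence"])
            and (fa is None or semantically_equal(fa, fb)))
```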

---

## Part 2: Relationship Comparison

**Relationship Matching Rules**:

1. **Exact Match**: Same type, same from/to claims, same constraint → preserved
2. **Missing Relationship**: Exists in A but not in B → lost
3. **New Relationship**: Exists in B but not in A → added
4. **Modified Relationship**: Same claims but different constraint → changed

**Relationship Preservation Scoring**:

```python
def calculate_relationship_preservation(a_rels, b_rels, shared_claims):
    # Only count relationships where both endpoints are shared claims
    a_valid = [r for r in a_rels if r["from_claim"] in shared_claims and r["to_claim"] in shared_claims]
    b_valid = [r for r in b_rels if r["from_claim"] in shared_claims and r["to_claim"] in shared_claims]

    preserved = count_matching_relationships(a_valid, b_valid)
    lost = len(a_valid) - preserved
    added = len(b_valid) - preserved

    if len(a_valid) == 0:
        return 1.0  # No relationships to preserve

    return preserved / len(a_valid)
```
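
`count_matching_relationships` is referenced but not defined above; a minimal exact-match interpretation follows (the real comparison would apply semantic judgment to the constraint text rather than string equality):

```python
def count_matching_relationships(a_rels, b_rels):
    """Count A-relationships with a one-to-one exact counterpart in B."""
    remaining = list(b_rels)
    preserved = 0
    for ra in a_rels:
        match = next(
            (rb for rb in remaining
             if rb["type"] == ra["type"]
             and rb["from_claim"] == ra["from_claim"]
             and rb["to_claim"] == ra["to_claim"]
             and rb["constraint"] == ra["constraint"]),
            None,
        )
        if match is not None:
            preserved += 1
            remaining.remove(match)  # each B-relationship matches at most once
    return preserved
```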

## Part 3: Dependency Graph Comparison

Graph Comparison:

  1. Topology: Compare graph structure (linear_chain vs tree vs dag)
  2. Connectivity: Compare edge preservation (same connections?)
  3. Critical Path: Compare critical paths (same ordering?)

Graph Structure Score:

```python
def calculate_graph_score(a_graph, b_graph, shared_claims):
    # Subgraph of shared claims
    a_sub = subgraph(a_graph, shared_claims)
    b_sub = subgraph(b_graph, shared_claims)

    # Compare connectivity
    connectivity = edge_preservation(a_sub, b_sub)

    # Compare topology
    topology = 1.0 if a_sub.topology == b_sub.topology else 0.5

    # Compare critical path
    critical_path = path_similarity(a_sub.critical_path, b_sub.critical_path)

    return (connectivity * 0.5) + (topology * 0.25) + (critical_path * 0.25)
```
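
The helpers `edge_preservation` and `path_similarity` are unspecified above; one plausible reading, assuming the `dependency_graph` JSON shape from Steps 1-2 (the `subgraph` filtering step is omitted):

```python
def edge_preservation(a_sub, b_sub):
    """Fraction of A's edges surviving in B (1.0 when A has no edges)."""
    a_edges = {tuple(e) for e in a_sub["edges"]}
    b_edges = {tuple(e) for e in b_sub["edges"]}
    return len(a_edges & b_edges) / len(a_edges) if a_edges else 1.0

def path_similarity(a_path, b_path):
    """Longest-common-subsequence ratio against A's critical path."""
    if not a_path:
        return 1.0
    m, n = len(a_path), len(b_path)
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            lcs[i + 1][j + 1] = (lcs[i][j] + 1 if a_path[i] == b_path[j]
                                 else max(lcs[i][j + 1], lcs[i + 1][j]))
    return lcs[m][n] / m
```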

## Part 4: Execution Equivalence Scoring

Scoring Formula:

```python
def calculate_execution_equivalence(claim_comp, rel_comp, graph_comp):
    weights = {
        "claim_preservation": 0.4,
        "relationship_preservation": 0.4,
        "graph_structure": 0.2
    }

    scores = {
        "claim_preservation": claim_comp["semantic_equivalence"],
        "relationship_preservation": rel_comp["overall_preservation"],
        "graph_structure": graph_comp["structure_score"]
    }

    base_score = sum(weights[k] * scores[k] for k in weights.keys())

    # Penalty for critical relationship loss
    if rel_comp["overall_preservation"] < 0.9:
        base_score *= 0.7

    return {
        "score": base_score,
        "components": scores,
        "interpretation": interpret_score(base_score)
    }

def interpret_score(score):
    if score >= 0.95:
        return "Execution equivalent - minor differences acceptable"
    elif score >= 0.75:
        return "Mostly equivalent - review relationship changes"
    elif score >= 0.50:
        return "Significant differences - execution may differ"
    else:
        return "CRITICAL - execution will fail or produce wrong results"

## Part 5: Warning Generation

Generate Warnings for:

  1. Relationship Loss: "100% of temporal dependencies lost"
  2. Structural Changes: "Conjunction split into independent claims (loses 'ALL required' semantic)"
  3. Conditional Logic Loss: "IF-THEN-ELSE flattened to concurrent claims (introduces contradiction)"
  4. Cross-Reference Breaks: "Section numbering removed, breaking 'see Section 2.3' references"
  5. Exclusion Constraint Loss: "Concurrency prohibition lost (Steps 2/3 may run in parallel → data corruption)"

Warning Format:

```json
{
  "severity": "CRITICAL|HIGH|MEDIUM|LOW",
  "type": "relationship_loss|structural_change|contradiction|navigation_loss",
  "description": "Human-readable description",
  "affected_claims": ["claim_1", "claim_2"],
  "recommendation": "Specific action to fix"
}
```

## Output Format (JSON)

```json
{
  "execution_equivalence_score": 0.75,
  "components": {
    "claim_preservation": 1.0,
    "relationship_preservation": 0.6,
    "graph_structure": 0.4
  },
  "shared_claims": [
    {
      "claim": "normalized shared claim",
      "doc_a_id": "claim_1",
      "doc_b_id": "claim_3",
      "similarity": 100,
      "type_match": true,
      "note": "exact match"
    }
  ],
  "unique_to_a": [
    {
      "claim": "claim only in A",
      "doc_a_id": "claim_5",
      "closest_match_in_b": null
    }
  ],
  "unique_to_b": [
    {
      "claim": "claim only in B",
      "doc_b_id": "claim_7"
    }
  ],
  "relationship_preservation": {
    "temporal_preserved": 3,
    "temporal_lost": 2,
    "conditional_preserved": 1,
    "conditional_lost": 0,
    "exclusion_preserved": 0,
    "exclusion_lost": 1,
    "overall_preservation": 0.6
  },
  "lost_relationships": [
    {
      "type": "temporal",
      "from": "step_1",
      "to": "step_2",
      "constraint": "Step 2 must occur after Step 1",
      "risk": "HIGH - skipping Step 1 causes data corruption",
      "evidence": "Line 42: 'Step 2 depends on Step 1 completing'"
    }
  ],
  "structural_changes": [
    {
      "type": "conjunction_split",
      "original": "ALL of {A, B, C} required",
      "modified": "Three separate claims: A, B, C",
      "risk": "MEDIUM - may be interpreted as ANY instead of ALL"
    }
  ],
  "warnings": [
    {
      "severity": "CRITICAL",
      "type": "relationship_loss",
      "description": "40% of temporal dependencies lost (2/5)",
      "recommendation": "Restore Step 1 → Step 2 ordering constraint"
    }
  ],
  "summary": {
    "total_claims_a": 10,
    "total_claims_b": 10,
    "shared_count": 10,
    "unique_a_count": 0,
    "unique_b_count": 0,
    "relationships_a": 5,
    "relationships_b": 3,
    "relationships_preserved": 3,
    "relationships_lost": 2,
    "execution_equivalent": false,
    "confidence": "high"
  }
}
```

Determinism: Use consistent comparison logic. When uncertain, mark as unique.


**Execute comparison**:
```bash
# Agent returns enhanced comparison JSON
COMPARISON="<parse JSON response>"
```

Step 4: Generate Human-Readable Report

Format with execution equivalence focus:

## Semantic Comparison: {{arg1}} vs {{arg2}}

### Execution Equivalence Summary

- **Execution Equivalence Score**: {score}/1.0 ({interpretation})
- **Claim Preservation**: {claim_score}/1.0 ({shared}/{total} claims)
- **Relationship Preservation**: {rel_score}/1.0 ({preserved}/{total} relationships)
- **Graph Structure**: {graph_score}/1.0

**Interpretation**: {interpretation based on score}

---

### Component Scores

**Claim Preservation** ({claim_score}/1.0):
- Shared claims: {shared_count}
- Unique to {{arg1}}: {unique_a_count}
- Unique to {{arg2}}: {unique_b_count}

**Relationship Preservation** ({rel_score}/1.0):
- Temporal: {temporal_preserved}/{temporal_total} preserved
- Conditional: {conditional_preserved}/{conditional_total} preserved
- Exclusion: {exclusion_preserved}/{exclusion_total} preserved
- Cross-document: {cross_doc_preserved}/{cross_doc_total} preserved

**Graph Structure** ({graph_score}/1.0):
- Topology: {topology_a} → {topology_b} ({match})
- Connectivity: {edge_preservation}%
- Critical path: {path_similarity}%

---

### Warnings ({warning_count})

{for each warning with severity CRITICAL or HIGH:}
**{severity}**: {description}
- **Affected**: {affected_claims}
- **Risk**: {risk description}
- **Recommendation**: {recommendation}

---

### Lost Relationships ({lost_count})

{for each lost relationship:}
{id}. **{type}**: {from_claim} → {to_claim}
   - Constraint: {constraint}
   - Evidence: {evidence}
   - Risk: {risk description}
   - Consequence if violated: {violation_consequence}

---

### Structural Changes ({change_count})

{for each structural change:}
{id}. **{type}**
   - Original ({{arg1}}): {original structure}
   - Modified ({{arg2}}): {modified structure}
   - Risk: {risk assessment}
   - Impact: {execution impact}

---

### Shared Claims ({shared_count})

{for each shared claim:}
{id}. **{claim}**
   - Match: {similarity}%
   - Type: {type} {type_match indicator}
   - Document A: {location} (ID: {doc_a_id})
   - Document B: {location} (ID: {doc_b_id})

---

### Unique to {{arg1}} ({unique_a_count})

{for each unique_to_a:}
{id}. **{claim}**
   - Source: {location} (ID: {doc_a_id})
   - Type: {type}
   - Confidence: {confidence}

---

### Unique to {{arg2}} ({unique_b_count})

{for each unique_to_b:}
{id}. **{claim}**
   - Source: {location} (ID: {doc_b_id})
   - Type: {type}
   - Confidence: {confidence}

---

### Analysis

**Execution Equivalence**: {score}/1.0

**Key Findings**:
- {primary finding based on score}
- {relationship preservation status}
- {structural change summary}

**Recommendation**: {APPROVE/REVIEW/REJECT based on score}

**Confidence**: {summary.confidence}

**Methodology**: Claim extraction + relationship extraction + execution equivalence scoring

Execution Equivalence Criteria

Score Thresholds:

  • ≥0.95: APPROVE - Execution equivalent (minor cosmetic differences acceptable)
  • 0.75-0.94: REVIEW REQUIRED - Moderate relationship changes (manual review needed)
  • 0.50-0.74: REJECT RECOMMENDED - Significant execution differences
  • <0.50: REJECT CRITICAL - Execution will fail or produce wrong results

Decision Criteria:

  1. Claim preservation alone is NOT sufficient for execution equivalence
  2. Relationship preservation is CRITICAL for execution equivalence
  3. Graph structure changes indicate procedural differences
  4. Warnings guide manual review focus
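
A minimal sketch of this threshold-to-recommendation mapping (the function name is illustrative):

```python
def recommend(score: float) -> str:
    """Map execution equivalence score to the approval bands above."""
    if score >= 0.95:
        return "APPROVE"
    if score >= 0.75:
        return "REVIEW REQUIRED"
    if score >= 0.50:
        return "REJECT RECOMMENDED"
    return "REJECT CRITICAL"
```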

Score Interpretation and Vulnerabilities

≥0.95 - Execution Equivalent

Characteristics:

  • All claims, relationships, and decision paths explicitly preserved
  • Minor wording variations or formatting changes only
  • No ambiguity in interpretation

Vulnerabilities: None - documents produce identical results when followed

Approval: Automatic - safe to use without review


0.85-0.94 - Functional Equivalence with Abstraction Risks

Characteristics:

  • Core logic and decision paths intact
  • Relationships may be implied rather than explicit
  • Abstraction level differs from original

Vulnerabilities:

  1. Abstraction Ambiguity

    Original: "ALL of {A, B, C} must be satisfied"
    Score 0.87: "A required", "B required", "C required" (separate statements)
    Risk: Could interpret as "ANY of A, B, C" instead of "ALL required"
    Impact: Partial completion when full completion required
    
  2. Lost Mutual Exclusivity

    Original: "Steps 2 and 3 CANNOT run in parallel (data corruption risk)"
    Score 0.89: "Step 2: migrate data", "Step 3: update schema"
    Risk: Exclusion constraint lost
    Impact: Concurrent execution → data corruption
    
  3. Conditional Logic Flattening

    Original: "IF system powered on THEN memory dump ELSE disk removal"
    Score 0.86: "Memory dump procedure", "Disk removal procedure"
    Risk: Decision tree collapsed, creates contradictions
    Impact: Cannot determine correct action, might perform wrong procedure
    
  4. Temporal Dependency Loss

    Original: "Step 2 depends on Step 1 completing first"
    Score 0.91: "Step 1: schema migration", "Step 2: data migration"
    Risk: Dependency becomes implicit ordering only
    Impact: Could skip prerequisite → cascading failures
    
  5. Cross-Reference Navigation Breaks

    Original: "Technical Architect (see Section 2.3)"
    Score 0.88: "Technical Architect (see Project Roles document)"
    Risk: Section anchor removed, cannot navigate to specific location
    Impact: Multi-hop reasoning fails, precision lost
    
  6. Escalation Path Ambiguity

    Original: "MEDIUM incident escalates to HIGH if privilege escalation possible"
    Score 0.90: "MEDIUM incident severity", "HIGH for privilege escalation"
    Risk: Trigger condition disconnected from action
    Impact: Unclear when to escalate, might miss critical situations
    

Approval: Manual review required - assess if abstraction is acceptable for use case

Risk Assessment: Moderate - careful readers might infer correctly, but no guarantee


0.50-0.74 - Significant Execution Differences

Characteristics:

  • Major relationship losses (>40% of critical relationships lost)
  • Structural changes fundamentally alter execution flow
  • Multiple vulnerability types compound

Vulnerabilities: All issues from 0.85-0.94 range, plus:

  • Entire decision branches missing
  • Critical prerequisites omitted entirely
  • Contradictory instructions present
  • Safety constraints removed

Approval: Reject recommended - significant rework needed

Risk Assessment: High - execution will likely differ from original intent


<0.50 - Critical Execution Failure

Characteristics:

  • Majority of relationships destroyed (>60% lost)
  • Logic structure unrecognizable from original
  • Core decision paths missing

Vulnerabilities: Complete execution failure

  • Cannot determine correct actions
  • Contradictions make document unusable
  • Critical safety constraints absent
  • Procedural ordering completely lost

Approval: Reject critical - document will produce wrong results or fail entirely

Risk Assessment: Critical - following this document leads to failures, data loss, or unsafe operations


Vulnerability Summary Table

| Score Range | Relationship Preservation | Primary Risk | Failure Mode |
|-------------|---------------------------|--------------|--------------|
| ≥0.95 | Explicit (>95%) | None | N/A - Safe |
| 0.85-0.94 | Mostly explicit (85-95%) | Abstraction ambiguity | Misinterpretation possible |
| 0.75-0.84 | Partial (75-85%) | Relationship inference required | Execution may differ |
| 0.50-0.74 | Significant loss (50-75%) | Logic structure damaged | Likely execution failures |
| <0.50 | Majority lost (<50%) | Core logic destroyed | Certain execution failures |

Key Insight: The score primarily reflects relationship preservation, not just claim matching. A score of 0.85 means ~15% of critical relationships are lost or abstracted, creating interpretation vulnerabilities that ≥0.95 avoids.


Limitations

  1. Relationship Inference: System extracts only explicitly stated relationships. Heavily implied relationships may be missed.

  2. Domain Knowledge: Some relationships require domain expertise to identify (e.g., knowing that schema migration must precede data migration).

  3. Nested Complexity: Very deeply nested conditionals (IF within IF within IF) may not be fully captured.

  4. Cross-Document Completeness: Multi-document comparison requires all referenced documents to be provided (cannot follow external references).

  5. Execution Context: System cannot execute code/procedures to verify equivalence - relies on structural analysis.


Examples

Example 1: Execution Equivalent (High Score)

Command:

/compare-docs docs/deployment-v1.md docs/deployment-v2.md

Result:

Execution Equivalence Score: 0.98/1.0 (Execution equivalent)
Claim Preservation: 1.0 (6/6 claims)
Relationship Preservation: 0.95 (5/5 relationships preserved)
Graph Structure: 1.0 (linear chain preserved)

Recommendation: APPROVE - Documents are execution equivalent.

Example 2: Relationship Loss (Low Score)

Command:

/compare-docs docs/incident-response-original.md docs/incident-response-compressed.md

Result:

Execution Equivalence Score: 0.32/1.0 (CRITICAL - execution will fail)
Claim Preservation: 1.0 (10/10 claims)
Relationship Preservation: 0.0 (0/8 conditional relationships preserved)
Graph Structure: 0.1 (decision tree flattened to list)

CRITICAL WARNING: 100% of conditional logic lost (8/8 decision points)
- IF-THEN-ELSE context removed
- Creates 2 apparent contradictions
- Cannot determine correct actions for given situations

Recommendation: REJECT CRITICAL - Compressed document loses all conditional logic.
Lost relationships will cause incorrect actions to be taken.

Example 3: Cross-Document Reference Break (Medium Score)

Command:

/compare-docs docs/approval-process-original.md docs/approval-process-compressed.md

Result:

Execution Equivalence Score: 0.48/1.0 (Significant execution differences)
Claim Preservation: 1.0 (6/6 claims)
Relationship Preservation: 0.0 (0/7 cross-document references preserved)
Graph Structure: 0.0 (reference graph destroyed)

HIGH WARNING: Section numbering removed
- Breaks 7 cross-document references ("see Section 2.3")
- Navigation structure destroyed
- Cannot follow multi-hop reasoning (e.g., "Who appoints Finance Manager?")

Recommendation: REJECT - Cross-document references broken. Users cannot navigate to referenced content.

Use Cases

Best suited for:

  • Deployment procedures (temporal dependencies critical)
  • Incident response guides (conditional logic critical)
  • Multi-document systems (cross-references critical)
  • Approval workflows (hierarchical conjunctions critical)
  • Security policies (exclusion constraints critical)
  • Technical specifications with ordering requirements
  • Procedural documentation with decision trees

Also works for simpler documents:

  • Reference documentation
  • FAQs
  • Glossaries
  • Simple requirement lists
---
description: Compress documentation while preserving execution equivalence (validation-driven approach)
---

Validation-Driven Document Compression

Task: Compress the documentation file: {{arg}}

Goal: Reduce document size while preserving execution equivalence using objective validation instead of prescriptive rules.


Workflow

Step 1: Validate Document Type

BEFORE compression, verify this is a Claude-facing document:

ALLOWED (Claude-facing):

  • .claude/ configuration files:
    • .claude/agents/ - Agent definitions (prompts for sub-agents)
    • .claude/commands/ - Slash commands (prompts that expand when invoked)
    • .claude/hooks/ - Hook scripts (execute on events)
    • .claude/settings.json - Claude Code settings
  • CLAUDE.md and project instructions
  • docs/project/ development protocol documentation
  • docs/code-style/*-claude.md style detection patterns

Why slash commands are Claude-facing: When you invoke /shrink-doc, the contents of .claude/commands/shrink-doc.md expand into a prompt for Claude to execute. The file is NOT for users to read - it's a configuration that defines what Claude does when the command is invoked.

FORBIDDEN (Human-facing):

  • README.md, changelog.md, CHANGELOG.md
  • docs/studies/, docs/decisions/, docs/performance/
  • docs/optional-modules/ (potentially user-facing)
  • todo.md, docs/code-style/*-human.md
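
A hedged sketch of this allow/deny check, assuming paths are given relative to the project root; the pattern lists mirror the rules above and are not exhaustive:

```python
from pathlib import PurePosixPath

ALLOWED_PREFIXES = (".claude/", "docs/project/")
FORBIDDEN_NAMES = {"README.md", "changelog.md", "CHANGELOG.md", "todo.md"}
FORBIDDEN_PREFIXES = ("docs/studies/", "docs/decisions/",
                      "docs/performance/", "docs/optional-modules/")

def is_claude_facing(path: str) -> bool:
    p = PurePosixPath(path)
    if p.name in FORBIDDEN_NAMES or str(p).startswith(FORBIDDEN_PREFIXES):
        return False
    if p.name == "CLAUDE.md" or str(p).startswith(ALLOWED_PREFIXES):
        return True
    # docs/code-style/*-claude.md is Claude-facing; *-human.md is not
    return str(p).startswith("docs/code-style/") and p.name.endswith("-claude.md")

# is_claude_facing(".claude/commands/shrink-doc.md") -> True
# is_claude_facing("README.md") -> False
```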

If forbidden, respond:

This compression process only applies to Claude-facing documentation.
The file `{{arg}}` appears to be human-facing documentation.

Examples:

  • ✅ ALLOWED: .claude/commands/shrink-doc.md (slash command prompt)
  • ✅ ALLOWED: .claude/agents/architect.md (agent prompt)
  • ❌ FORBIDDEN: README.md (user-facing project description)
  • ❌ FORBIDDEN: changelog.md (user-facing change history)

Step 2: Check for Existing Baseline

Check if baseline exists from prior iteration:

BASELINE="/tmp/original-{{filename}}"
if [ -f "$BASELINE" ]; then
  BASELINE_LINES=$(wc -l < "$BASELINE")
  CURRENT_LINES=$(wc -l < "{{arg}}")
  echo "✅ Found existing baseline: $BASELINE ($BASELINE_LINES lines)"
  echo "   Current file: $CURRENT_LINES lines"
  echo "   Scores will compare against original baseline."
fi

If NO baseline exists, optionally check git history for prior compression:

if [ ! -f "$BASELINE" ]; then
  RECENT_SHRINK=$(git log --oneline -5 -- {{arg}} 2>/dev/null | grep -iE "compress|shrink|reduction" | head -1)
  if [ -n "$RECENT_SHRINK" ]; then
    echo "ℹ️ Note: File was previously compressed (commit: $RECENT_SHRINK)"
    echo "   No baseline preserved. Starting fresh with current version as baseline."
  fi
fi

Step 3: Invoke Compression Agent

Use Task tool with subagent_type: "general-purpose" and simple outcome-based prompt:

Agent Prompt Template:

**Document Compression Task**

**File**: {{arg}}

**Goal**: Compress while preserving **perfect execution equivalence** (score = 1.0).

**Compression Target**: ~50% reduction is ideal, but lesser compression is acceptable. Perfect equivalence (1.0) is mandatory; compression amount is secondary.

---

## What is Execution Equivalence?

**Execution Equivalence** means: A reader following the compressed version will achieve the same results as someone following the original.

**Preserve**:
- **YAML frontmatter** (between `---` delimiters) - REQUIRED for slash commands
- **Decision-affecting information**: Claims, requirements, constraints that affect what to do
- **Relationship structure**: Temporal ordering (A before B), conditionals (IF-THEN), prerequisites, exclusions (A ⊥ B), escalations
- **Control flow**: Explicit sequences, blocking checkpoints (STOP, WAIT), branching logic
- **Executable details**: Commands, file paths, thresholds, specific values

**Safe to remove**:
- **Redundancy**: Repeated explanations of same concept
- **Verbose explanations**: Long-winded descriptions that can be condensed
- **Meta-commentary**: Explanatory comments about the document (NOT structural metadata like YAML frontmatter)
- **Non-essential examples**: Examples that don't add new information
- **Elaboration**: Extended justifications or background that don't affect decisions

---

## Compression Approach

**Focus on relationships**:
- Keep explicit relationship statements (Prerequisites, Dependencies, Exclusions, Escalations)
- Preserve temporal ordering (Step A→B)
- Maintain conditional logic (IF-THEN-ELSE)
- Keep constraint declarations (CANNOT coexist, MUST occur after)

**Condense explanations**:
- Remove "Why This Ordering Matters" verbose sections → keep ordering statement
- Remove "Definition" sections that explain obvious terms
- Combine related claims into single statements where possible
- Use high-level principle statements instead of exhaustive enumeration (when appropriate)

---

## Output

Read `{{arg}}`, compress it, and **USE THE WRITE TOOL** to save the compressed version.

**⚠️ CRITICAL**: You MUST actually write the file using the Write tool. Do NOT just describe
or summarize the compressed content - physically create the file.

**Target**: ~50% word reduction while maintaining execution equivalence.

Execute compression:

# Invoke agent
Task tool: general-purpose agent with above prompt

Step 4: Validate with /compare-docs

⚠️ CRITICAL: Before saving compressed version, read and save the ORIGINAL document state to use as baseline for validation.

After agent completes:

  1. Save original document (ONLY if baseline doesn't exist):

    BASELINE="/tmp/original-{{filename}}"
    if [ ! -f "$BASELINE" ]; then
      cp {{arg}} "$BASELINE"
      echo "✅ Saved baseline: $BASELINE ($(wc -l < "$BASELINE") lines)"
    else
      echo "✅ Reusing existing baseline: $BASELINE"
    fi

    Why baseline is preserved: Baseline is kept until user explicitly confirms they're done iterating (see Step 5). This ensures scores always compare against the TRUE original, not intermediate compressed versions.

  2. Determine version number and save compressed version:

    VERSION_FILE="/tmp/shrink-doc-{{filename}}-version.txt"
    
    # Get next version number from persistent counter (survives across sessions)
    if [ -f "$VERSION_FILE" ]; then
      LAST_VERSION=$(cat "$VERSION_FILE")
      VERSION=$((LAST_VERSION + 1))
    else
      # First time: check for existing version files to continue numbering
      HIGHEST=$(ls /tmp/compressed-{{filename}}-v*.md 2>/dev/null | sed 's/.*-v\([0-9]*\)\.md/\1/' | sort -n | tail -1)
      if [ -n "$HIGHEST" ]; then
        VERSION=$((HIGHEST + 1))
      else
        VERSION=1
      fi
    fi
    
    # Save version counter for next iteration
    echo "$VERSION" > "$VERSION_FILE"
    
    # Save with version number for rollback capability
    # Agent output → /tmp/compressed-{{filename}}-v${VERSION}.md
    echo "📝 Saved as version ${VERSION}: /tmp/compressed-{{filename}}-v${VERSION}.md"

    Why persistent versioning: Version numbers continue across sessions (v1, v2 in session 1 → v3, v4 in session 2) so older revisions are never overwritten. This enables rollback to any previous version and maintains complete compression history.

  3. Verify YAML frontmatter preserved (if compressing slash command):

    head -5 /tmp/compressed-{{filename}}-v${VERSION}.md | grep -q "^---$" || echo "⚠️ WARNING: YAML frontmatter missing!"
  4. Run validation AGAINST ORIGINAL:

    # ALWAYS compare against original baseline, NOT current file state
    /compare-docs /tmp/original-{{filename}} /tmp/compressed-{{filename}}-v${VERSION}.md

    ⚠️ IMPORTANT: The validation score reflects execution equivalence between:

    • Document A: Original document state BEFORE /shrink-doc was invoked in this session
    • Document B: Newly compressed candidate version

    NOT a comparison against any intermediate compressed versions.

  5. Parse validation result:

    • Extract execution_equivalence_score
    • Extract warnings and lost_relationships
    • Extract structural_changes

Scoring Context: When reporting the score to the user, explicitly state:

Score {score}/1.0 compares the compressed version against the ORIGINAL document
state from before /shrink-doc was invoked (not against any intermediate versions).

⚠️ CRITICAL REMINDER: On second, third, etc. invocations:

  • REUSE /tmp/original-{{filename}} from first invocation
  • DO NOT create /tmp/original-{{filename}}-v2.md or similar
  • DO NOT compare against intermediate compressed versions
  • The baseline is set ONCE on first invocation and REUSED for all subsequent invocations

Step 5: Decision Logic

Threshold: 1.0

Report Format (for approval):

  1. What was preserved
  2. What was removed
  3. Validation Details (claim/relationship/graph scores)
  4. Results (original size, compressed size, reduction %, execution equivalence score)
  5. Version Comparison Table (showing all versions generated in this session)

⚠️ CRITICAL: List execution equivalence score at bottom for easy visibility.

Version Comparison Table Format:

After presenting validation results for ANY version, show comparison table:

| Version | Lines | Size | Reduction | Score | Status |
|---------|-------|------|-----------|-------|--------|
| **Original** | {lines} | {size} | baseline | N/A | Reference |
| **V1** | {lines} | {size} | {%} | {score} | {✅/❌/✓applied} |
| **V2** | {lines} | {size} | {%} | {score} | {✅/❌/✓applied} |
| **V3** | {lines} | {size} | {%} | {score} | {✅/❌/✓applied} |

Status Legend:

  • ✅ = Approved (score = 1.0)
  • ❌ = Rejected (score < 1.0)
  • ✓ applied = Currently applied to original file

Example:

| Version | Lines | Size | Reduction | Score | Status |
|---------|-------|------|-----------|-------|--------|
| **Original** | 1,057 | 48K | baseline | N/A | Reference |
| **V1** | 520 | 26K | 51% | 0.89 | ❌ rejected |
| **V2** | 437 | 27K | 59% | 0.97 | ✓ applied |

If score = 1.0: ✅ APPROVE

Validation passed! Execution equivalence: {score}/1.0

✅ Approved version: /tmp/compressed-{{filename}}-v${VERSION}.md

Writing compressed version to {{arg}}...

→ Overwrite original with approved version
→ Clean up versioned compressions: rm /tmp/compressed-{{filename}}-v*.md
→ KEEP baseline: /tmp/original-{{filename}} preserved for potential future iterations

After applying changes, ASK user:

Changes applied successfully!

Would you like to try again to generate an even better version?
- YES → I'll keep the baseline and iterate with new compression targets
- NO → I'll clean up the baseline (compression complete)

If user says YES (wants to try again):
→ Keep /tmp/original-{{filename}}
→ Future /shrink-doc invocations will reuse this baseline
→ Scores will reflect cumulative compression from true original
→ Go back to Step 3 with user's feedback

If user says NO (done iterating):
→ rm /tmp/original-{{filename}}
→ rm /tmp/shrink-doc-{{filename}}-version.txt
→ Note: Future /shrink-doc on this file will use compressed version as new baseline

If score < 1.0: ❌ ITERATE

Validation requires improvement. Score: {score}/1.0 (threshold: 1.0)

Components:
- Claim preservation: {claim_score}
- Relationship preservation: {relationship_score}
- Graph structure: {graph_score}

**Why < 1.0 requires iteration**:
Scores below 1.0 indicate relationship abstraction or loss that creates
interpretation vulnerabilities. See /compare-docs § Score Interpretation
for detailed vulnerability analysis.

**Common issues at this score range**:
- Abstraction ambiguity (e.g., "ALL of X" → separate statements)
- Lost mutual exclusivity constraints
- Conditional logic flattening (IF-THEN-ELSE → flat list)
- Temporal dependencies implicit rather than explicit

Issues found:
{list warnings from /compare-docs}

Specific relationship losses:
{list lost_relationships with details}

Re-invoking agent with feedback to fix issues...

→ Go to Step 6 (Iteration)

⚠️ CRITICAL: Verify Decision Logic Before Presenting

Before presenting results to user, MANDATORY self-check:

# Self-validation checklist (sketch; $score holds the /compare-docs result)
if awk "BEGIN { exit !($score == 1.0) }"; then
  decision="APPROVE"
else
  decision="ITERATE"
fi

# Verify no contradictions between the stated and computed decision
if [ "$stated_decision" != "$decision" ]; then
  echo "ERROR: Decision logic error detected" >&2
  echo "FIX: Recalculate the threshold comparison before presenting" >&2
fi

Common Mistakes:

  • ❌ WRONG: "Score 0.97, close enough to 1.0" (0.97 < 1.0, must be perfect)
    ✅ CORRECT: "Score 0.97 < 1.0, iterate to achieve perfect equivalence"
  • ❌ WRONG: "Score 0.99, good enough" (ignores 1.0 threshold)
    ✅ CORRECT: "Score 0.99 < 1.0, iterate to eliminate any loss"

Prevention: Always verify threshold comparison matches stated score value before presenting.


Step 6: Iteration Loop

If score < 1.0, invoke agent again with specific feedback:

Iteration Prompt Template:

**Document Compression - Revision Attempt {iteration_number}**

**Previous Score**: {score}/1.0 (threshold: 1.0)

**Issues Identified by Validation**:

{warnings from /compare-docs}

**Lost Relationships**:

{for each lost_relationship:}
- **{type}**: {from_claim} → {to_claim}
  - Constraint: {constraint}
  - Evidence: {evidence}
  - Impact: {violation_consequence}
  - **Fix**: {specific recommendation}

**Your Task**:

Revise the compressed document to restore the lost relationships while maintaining compression.

**Original**: /tmp/original-{{filename}}
**Previous Attempt**: /tmp/compressed-{{filename}}-v${VERSION}.md

Focus on:
1. Restoring explicit relationship statements identified above
2. Maintaining conditional structure (IF-THEN-ELSE)
3. Preserving mutual exclusivity constraints
4. Keeping escalation/fallback paths

**⚠️ CRITICAL**: USE THE WRITE TOOL to save the revised document to the specified path.
Do NOT just describe or return the content - you MUST physically write the file.

After iteration:

  • Save revised version as next version number (v${VERSION+1})
  • Re-run /compare-docs validation AGAINST ORIGINAL BASELINE
  • Apply decision logic again (Step 5)

🚨 MANDATORY: /compare-docs Required for EVERY Iteration

CRITICAL: You MUST invoke /compare-docs (SlashCommand tool) for EVERY version validation. There are NO exceptions. Manual validation, estimation, or checklist-based scoring is PROHIBITED.

Why This Is Non-Negotiable:

  • Session 7937e222: Agent manually validated v2/v3 with self-created checklist → score 0.97
  • Independent re-analysis: Actual score was 0.72 (25% inflation)
  • Result: Compression approved that lost critical content

Validation Anti-Patterns (ALL are violations):

WRONG #1: Manual checklist validation

"Let me assess v2 improvements..."
| Category | Original | v2 | Preserved |
| State machine | ✅ | ✅ | 100% |
[creates own checklist, assigns 100% to all]
"Estimated Score: 0.97"

Why wrong: Agent knows what SHOULD be there, confirms it exists (confirmation bias)

WRONG #2: Estimation without /compare-docs

"Good progress on v2. Estimated Score: ~0.88"

Why wrong: No independent extraction, just subjective assessment

WRONG #3: Custom Task prompt with items to verify

Task: "Verify these 6 improvements are present: 1. X, 2. Y..."

Why wrong: Primes validator to confirm checklist, misses other losses

CORRECT: Invoke /compare-docs for EVERY version

# v1 validation
/compare-docs /tmp/original-{filename} /tmp/compressed-{filename}-v1.md

# v2 validation (after iteration)
/compare-docs /tmp/original-{filename} /tmp/compressed-{filename}-v2.md

# v3 validation (after iteration)
/compare-docs /tmp/original-{filename} /tmp/compressed-{filename}-v3.md

Enforcement: Score is ONLY valid if it comes from /compare-docs output. Any score derived from manual assessment, estimation, or targeted validation is INVALID.

Self-Check Before Reporting Score:

  1. Did I invoke /compare-docs (SlashCommand tool) for this version? YES/NO
  2. Is the score from /compare-docs output, not my own calculation? YES/NO
  3. If either is NO → STOP and invoke /compare-docs

Maximum iterations: 3

  • If still < 1.0 after 3 attempts, report to user and ask for guidance
  • All versions preserved in /tmp for rollback
  • User may choose to accept best attempt or abandon compression

Implementation Notes

Agent Type: MUST use subagent_type: "general-purpose"

Validation Tool: Use /compare-docs (SlashCommand tool)

Validation Baseline: On first invocation, save original document to /tmp/original-{filename} and use this as baseline for ALL subsequent validation comparisons in the session.

Versioning Scheme: Each compression attempt is saved with incrementing version numbers for rollback capability.

File Operations:

  • Read original: Read tool
  • Save original baseline: Write tool to /tmp/original-{filename} (once per session)
  • Save versioned compressed: Write tool to /tmp/compressed-{filename}-v1.md, /tmp/compressed-{filename}-v2.md, etc.
  • Overwrite original: Write tool to {{arg}} (only after approval)
  • Cleanup after approval: rm /tmp/compressed-{filename}-v*.md /tmp/original-{filename}

Rollback Capability:

  • If latest version unsatisfactory, previous versions available at /tmp/compressed-{filename}-v{N}.md
  • Example: If v3 approved but later found problematic, can review v1 or v2
  • Versions automatically cleaned up after successful approval

Iteration State:

  • Track iteration count via version numbers
  • Provide specific feedback from validation warnings
  • ALWAYS validate against original baseline, not previous iteration

Success Criteria

Compression approved when:

  • Execution equivalence score = 1.0

Compression quality metrics:

  • Word reduction: ~50% (target)
  • Execution equivalence: = 1.0
  • Claim preservation: = 1.0
  • Relationship preservation: = 1.0
  • Graph structure: = 1.0
  • No critical relationship losses

Edge Cases

Abstraction vs Enumeration: When compressed document uses high-level constraint statements (e.g., "handlers are mutually exclusive") instead of explicit pairwise enumerations, validation may score 0.85-0.94. System will automatically iterate to restore explicit relationships, as abstraction creates interpretation vulnerabilities (see /compare-docs § Score Interpretation).

Score Plateau: If the score plateaus across iterations (only negligible gains per attempt, e.g., v1=0.87, v2=0.88, v3=0.89), compression may be hitting fundamental limits. After 3 attempts below 1.0, report the best version to the user and explain the compression challenges encountered.

Multiple Iterations: Each iteration should show improvement. Monitor progression toward 1.0 threshold.

Large Documents: For documents >10KB, consider breaking into logical sections and compressing separately to improve iteration efficiency.
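
A minimal sketch of that sectioning step, splitting at `## ` headings (the split level is an assumption; adjust for the document's structure):

```python
def split_sections(markdown: str) -> list[str]:
    """Split at '## ' headings; preamble/frontmatter stays in the first chunk."""
    sections, current = [], []
    for line in markdown.splitlines(keepends=True):
        if line.startswith("## ") and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return sections

# Compress each chunk with /shrink-doc, validate each, then recombine in order.
```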


Example Usage

/shrink-doc /workspace/main/.claude/commands/example-command.md

Expected flow:

  1. Validate document type ✅
  2. Save original to /tmp/original-example-command.md (baseline) ✅
  3. Invoke compression agent
  4. Save to /tmp/compressed-example-command-v1.md (version 1) ✅
  5. Run /compare-docs /tmp/original-example-command.md /tmp/compressed-example-command-v1.md
  6. Score 1.0 → Approve v1 and overwrite original ✅
  7. Cleanup: Remove /tmp/compressed-example-command-v*.md and /tmp/original-example-command.md ✅

If iteration needed:

  • v1 score < 1.0 → Save v2, validate against original
  • v2 score < 1.0 → Save v3, validate against original
  • v3 score = 1.0 → Approve v3, cleanup v1/v2/v3 and original
  • v3 score < 1.0 (after max iterations) → Report to user with best version