@yongkangc
Created September 2, 2025 08:56
Ethereum State Root Debugging: Nibble-by-Nibble Trie Walking Guide

A comprehensive guide for debugging state root mismatches in Ethereum clients by systematically walking through tries to identify the exact source of discrepancies.

Overview

State root mismatches in Ethereum clients can stem from two primary sources:

  1. EVM Execution Bugs - Wrong transaction processing leading to incorrect bundle state
  2. State Root Computation Bugs - Correct state data but wrong trie construction

This guide provides a systematic approach to identify which category your bug falls into and pinpoint the exact location of the discrepancy.

Problem Analysis Framework

Root Cause Categories

graph TD
    SR[State Root Mismatch<br/>Expected: 0x9859bd0e...<br/>Actual: 0xba7c973f...]
    
    SR --> EC[EVM Execution Issue]
    SR --> TC[Trie Computation Issue]
    
    EC --> TXB[Transaction Processing Bug]
    EC --> GB[Gas Calculation Bug]
    EC --> EB[EIP Implementation Bug]
    EC --> PB[Precompile Bug]
    
    TC --> PLB[PrefixSetLoader Bug]
    TC --> HBB[HashBuilder Bug]
    TC --> DBB[Database Bug]
    TC --> CSB[Changeset Bug]
    
    TXB --> BS[Wrong Bundle State]
    GB --> BS
    EB --> BS
    PB --> BS
    
    PLB --> WTS[Wrong Trie Structure]
    HBB --> WTS
    DBB --> WTS
    CSB --> WTS
    
    BS --> WC[Wrong Changesets in DB]
    WC --> WTS
    
    classDef problem fill:#ffcccb
    classDef execution fill:#ffd700
    classDef trie fill:#87ceeb
    classDef result fill:#98fb98
    
    class SR problem
    class EC,TXB,GB,EB,PB execution
    class TC,PLB,HBB,DBB,CSB trie
    class BS,WTS,WC result

Decision Tree

flowchart TD
    START[State Root Mismatch Detected]
    
    START --> STEP1[Step 1: Get Block Data]
    STEP1 --> STEP2[Step 2: Extract All Addresses]
    STEP2 --> STEP3[Step 3: Generate Account Proofs]
    STEP3 --> CHECK[Compare Account Proofs]
    
    CHECK --> MATCH{All Accounts<br/>Match?}
    MATCH -->|Yes| MYSTERY[Mystery: Same accounts<br/>different state root<br/>Likely HashBuilder bug]
    MATCH -->|No| FOUND[Found Mismatched Account]
    
    FOUND --> ANALYZE[Step 4: Analyze Account Fields]
    ANALYZE --> FIELD_CHECK{Which Field<br/>Differs?}
    
    FIELD_CHECK -->|Balance/Nonce| EVM_BUG[EVM Execution Bug<br/>Wrong transaction processing]
    FIELD_CHECK -->|CodeHash| CODE_BUG[Code deployment bug<br/>Wrong contract creation]
    FIELD_CHECK -->|StorageRoot| STORAGE_BUG[Storage trie issue<br/>Walk storage trie]
    
    STORAGE_BUG --> STEP5[Step 5: Walk Storage Trie]
    STEP5 --> SLOT_CHECK{Storage Slot<br/>Values Differ?}
    
    SLOT_CHECK -->|Yes| EVM_STORAGE[EVM Storage Bug<br/>Wrong SSTORE execution]
    SLOT_CHECK -->|No| TRIE_STORAGE[Storage Trie Bug<br/>Correct values, wrong root]
    
    EVM_BUG --> STEP6[Step 6: Trace Transaction]
    CODE_BUG --> STEP6
    EVM_STORAGE --> STEP6
    TRIE_STORAGE --> STEP7[Step 7: Debug Trie Construction]
    MYSTERY --> STEP7
    
    classDef start fill:#ff9999
    classDef step fill:#99ccff
    classDef decision fill:#ffff99
    classDef result fill:#99ff99
    
    class START start
    class STEP1,STEP2,STEP3,ANALYZE,STEP5,STEP6,STEP7 step
    class MATCH,FIELD_CHECK,SLOT_CHECK decision
    class EVM_BUG,CODE_BUG,STORAGE_BUG,EVM_STORAGE,TRIE_STORAGE,MYSTERY result

Debugging Workflow

Phase 1: Block Data Collection

# Get complete block data
BLOCK_NUM=23272425
cast block $BLOCK_NUM --full --json --rpc-url http://localhost:8545 > block_${BLOCK_NUM}.json

# Extract state roots from different clients (--json so jq can parse the output)
RETH_ROOT=$(cast block $BLOCK_NUM --json --rpc-url http://localhost:8546 | jq -r '.stateRoot')
GETH_ROOT=$(cast block $BLOCK_NUM --json --rpc-url http://localhost:8545 | jq -r '.stateRoot')

echo "Expected (Geth): $GETH_ROOT"
echo "Actual (Reth):   $RETH_ROOT"
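If `cast` is not available, the same comparison can be made over raw JSON-RPC. A minimal sketch using only the Python standard library, assuming both clients serve `eth_getBlockByNumber` on the endpoints used above; `block_request` and `state_root` are illustrative names, not part of any tool.

```python
import json
import urllib.request


def block_request(block_num: int) -> dict:
    """Build an eth_getBlockByNumber payload; the block number is sent as a hex quantity."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBlockByNumber",
        "params": [hex(block_num), False],  # False: return transaction hashes only
        "id": 1,
    }


def state_root(rpc_url: str, block_num: int, timeout: float = 30.0) -> str:
    """Return the header's stateRoot as served by one client."""
    request = urllib.request.Request(
        rpc_url,
        data=json.dumps(block_request(block_num)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    if body.get("error"):
        raise RuntimeError(f"{rpc_url}: {body['error']}")
    if body.get("result") is None:
        raise LookupError(f"{rpc_url} does not know block {block_num}")
    return body["result"]["stateRoot"]


# Usage (requires both nodes to be running):
# print("Expected (Geth):", state_root("http://localhost:8545", 23272425))
# print("Actual (Reth):  ", state_root("http://localhost:8546", 23272425))
```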

Phase 2: Address Enumeration

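# Extract sender and recipient addresses (a null "to" is a contract creation)
jq -r '.transactions[] | .from, .to' block_${BLOCK_NUM}.json | grep -v null > addresses.txt

# The fee recipient and withdrawal recipients also change state
jq -r '.miner, (.withdrawals[]?.address)' block_${BLOCK_NUM}.json >> addresses.txt

# Logs and created contracts live in receipts, not in the block body
jq -r '.transactions[].hash' block_${BLOCK_NUM}.json | while read -r tx; do
    cast receipt $tx --json --rpc-url http://localhost:8545 | jq -r '(.contractAddress // empty), .logs[].address'
done >> addresses.txt

# Get unique addresses (hex casing varies between sources, so lowercase first)
tr '[:upper:]' '[:lower:]' < addresses.txt | sort -u > unique_addresses.txt

echo "Found $(wc -l < unique_addresses.txt) unique addresses to check"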
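Senders, recipients, and log emitters still miss accounts that are changed only through internal calls. One way to close that gap, assuming the node supports `debug_traceBlockByNumber` with Geth's built-in `prestateTracer` in `diffMode`, is to collect every address that appears in the per-transaction `pre`/`post` maps, which list accounts whose state the transaction changed. The sketch below only parses that output; fetching it is left to the caller, for example with `cast rpc debug_traceBlockByNumber $(printf '0x%x' $BLOCK_NUM) '{"tracer":"prestateTracer","tracerConfig":{"diffMode":true}}'`. The function name is illustrative.

```python
from typing import Iterable, List, Set


def touched_addresses(block_trace: Iterable[dict]) -> List[str]:
    """Collect every address that appears in a prestateTracer diffMode block trace.

    Each element is expected to look like
    {"txHash": ..., "result": {"pre": {...}, "post": {...}}}; some clients omit
    "txHash", so only "result" is relied on.
    """
    seen: Set[str] = set()
    for entry in block_trace:
        result = entry.get("result", entry)
        # "pre" holds changed accounts before the tx, "post" the fields that changed
        for side in ("pre", "post"):
            for address in result.get(side, {}):
                seen.add(address.lower())  # normalize EIP-55 checksum casing
    return sorted(seen)
```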

Phase 3: Account State Comparison

# Compare account states between clients
mkdir -p proofs/{reth,geth}

while read -r address; do
    echo "Checking address: $address"
    
    # Get proofs from both clients
    cast proof $address --block $BLOCK_NUM --rpc-url http://localhost:8546 > proofs/reth/${address}.json
    cast proof $address --block $BLOCK_NUM --rpc-url http://localhost:8545 > proofs/geth/${address}.json
    
    # Compare account fields only: accountProof[0] is the root node, which differs
    # for every address whenever the state roots differ, so a whole-file cmp would flag them all
    FIELDS='{balance, nonce, codeHash, storageHash}'
    if [ "$(jq -S "$FIELDS" proofs/reth/${address}.json)" != "$(jq -S "$FIELDS" proofs/geth/${address}.json)" ]; then
        echo "MISMATCH: $address"
        echo $address >> mismatched_addresses.txt
    fi
done < unique_addresses.txt
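The same field check in Python, for when the proofs are already parsed: the first `accountProof` node is the trie root, whose keccak256 hash is the state root, so once the roots differ that node differs in every proof and only the account fields say anything about this particular account. A sketch assuming two decoded `eth_getProof` responses; values are compared as integers so `0x0` and `0x00` count as equal.

```python
from typing import Dict

ACCOUNT_FIELDS = ("balance", "nonce", "storageHash", "codeHash")


def account_field_diff(reth: Dict, geth: Dict) -> Dict[str, Dict[str, str]]:
    """Return {field: {"reth": ..., "geth": ...}} for every account field that differs.

    Proof nodes are deliberately ignored: the first node is the trie root, which
    differs for every address whenever the state roots differ.
    """
    diff = {}
    for field in ACCOUNT_FIELDS:
        # Hex quantities may be formatted differently ("0x0" vs "0x00"): compare numerically
        if int(reth[field], 16) != int(geth[field], 16):
            diff[field] = {"reth": reth[field], "geth": geth[field]}
    return diff
```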

Phase 4: Deep Analysis of Mismatched Accounts

# Analyze each mismatched account
while read address; do
    echo "=== Analyzing $address ==="
    
    # Extract account fields
    RETH_BALANCE=$(jq -r '.balance' proofs/reth/${address}.json)
    GETH_BALANCE=$(jq -r '.balance' proofs/geth/${address}.json)
    RETH_NONCE=$(jq -r '.nonce' proofs/reth/${address}.json)
    GETH_NONCE=$(jq -r '.nonce' proofs/geth/${address}.json)
    RETH_STORAGE_ROOT=$(jq -r '.storageHash' proofs/reth/${address}.json)
    GETH_STORAGE_ROOT=$(jq -r '.storageHash' proofs/geth/${address}.json)
    RETH_CODE_HASH=$(jq -r '.codeHash' proofs/reth/${address}.json)
    GETH_CODE_HASH=$(jq -r '.codeHash' proofs/geth/${address}.json)
    
    # Compare fields
    if [ "$RETH_BALANCE" != "$GETH_BALANCE" ]; then
        echo "BALANCE MISMATCH: Reth=$RETH_BALANCE, Geth=$GETH_BALANCE"
        echo "→ EVM Execution Bug: Transaction processing error"
    fi
    
    if [ "$RETH_NONCE" != "$GETH_NONCE" ]; then
        echo "NONCE MISMATCH: Reth=$RETH_NONCE, Geth=$GETH_NONCE"  
        echo "→ EVM Execution Bug: Transaction count error"
    fi
    
    if [ "$RETH_STORAGE_ROOT" != "$GETH_STORAGE_ROOT" ]; then
        echo "STORAGE ROOT MISMATCH: Reth=$RETH_STORAGE_ROOT, Geth=$GETH_STORAGE_ROOT"
        echo "→ Need to walk storage trie for $address"
        echo $address >> storage_mismatches.txt
    fi
    
    if [ "$RETH_CODE_HASH" != "$GETH_CODE_HASH" ]; then
        echo "CODE HASH MISMATCH: Reth=$RETH_CODE_HASH, Geth=$GETH_CODE_HASH"
        echo "→ Contract deployment bug"
    fi
    
    echo ""
done < mismatched_addresses.txt
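The field checks above follow the decision tree one-to-one. When post-processing many accounts, the same mapping can be kept in one place; a small sketch with illustrative category strings (not the output of any tool):

```python
from typing import Dict, List

# Next step implied by each differing account field, following the decision tree
FIELD_CATEGORY = {
    "balance": "EVM execution bug: wrong transaction processing",
    "nonce": "EVM execution bug: wrong transaction count",
    "codeHash": "Contract deployment bug",
    "storageHash": "Walk storage trie",
}


def classify(diff: Dict[str, Dict[str, str]]) -> List[str]:
    """Map differing fields to next steps; no differing field points at the trie itself."""
    if not diff:
        return ["Account matches: look at trie construction or unchecked accounts"]
    order = ("balance", "nonce", "codeHash", "storageHash")
    return [FIELD_CATEGORY[field] for field in order if field in diff]
```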

Trie Walking Logic

Account Trie Walking

The account trie maps keccak256(address) to the RLP-encoded account [nonce, balance, storageRoot, codeHash]. Here's how to walk it:

graph TB
    subgraph "Account Trie Walking Process"
        ADDR[Address: 0x742d35Cc...]
        HASH[Keccak Hash: 0x8f4b7c...]
        NIBBLES["Nibbles: [8,f,4,b,7,c,...]"]
        
        ADDR --> HASH
        HASH --> NIBBLES
        
        NIBBLES --> LEVEL0["Level 0: Root Node<br/>Check child[8]"]
        LEVEL0 --> LEVEL1["Level 1: Branch Node<br/>Check child[f]"]
        LEVEL1 --> LEVEL2[Level 2: Extension Node<br/>Match key prefix 4b7c]
        LEVEL2 --> LEVEL3[Level 3: Leaf Node<br/>Contains account data]
        
        LEVEL3 --> ACCOUNT[Account Data:<br/>balance, nonce,<br/>storage_root, code_hash]
    end
    
    subgraph "Proof Comparison"
        RETH_PROOF["Reth Account Proof<br/>[node0, node1, node2, node3]"]
        GETH_PROOF["Geth Account Proof<br/>[node0, node1, node2, node3]"]
        
        RETH_PROOF --> COMPARE{Compare nodes<br/>from the leaf upward}
        GETH_PROOF --> COMPARE
        
        COMPARE -->|Match| NEXT_LEVEL[Move up one level]
        COMPARE -->|Mismatch| FOUND_ISSUE[Deepest divergence<br/>is at this level]
        
        NEXT_LEVEL --> COMPARE
    end
    
    classDef process fill:#e1f5fe
    classDef comparison fill:#f3e5f5
    classDef issue fill:#ffebee
    
    class ADDR,HASH,NIBBLES,LEVEL0,LEVEL1,LEVEL2,LEVEL3,ACCOUNT process
    class RETH_PROOF,GETH_PROOF,COMPARE,NEXT_LEVEL comparison
    class FOUND_ISSUE issue
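Each proof node commits to the node below it, so a top-down comparison always stops at the root whenever the state roots differ. Walking from the leaf upward instead finds the deepest level where the two proofs disagree. A sketch over plain lists of hex-encoded nodes (as returned in `accountProof`) and an already-hashed key; computing keccak256(address) is assumed to happen elsewhere, e.g. with `eth_hash.auto.keccak` as the Python script in this guide does. Function names are illustrative.

```python
from typing import List, Optional


def key_to_nibbles(hashed_key_hex: str) -> List[int]:
    """Split a 32-byte hashed key (hex, with or without 0x) into 64 nibbles."""
    h = hashed_key_hex[2:] if hashed_key_hex.startswith("0x") else hashed_key_hex
    if len(h) != 64:
        raise ValueError("expected a 32-byte hash")
    return [int(c, 16) for c in h]


def deepest_divergence(reth_nodes: List[str], geth_nodes: List[str]) -> Optional[int]:
    """Deepest proof level whose nodes differ, or None if the proofs are identical.

    Only levels present in both proofs are compared; if those all match but one
    proof is longer, the index just past the shorter proof is returned.
    """
    shared = min(len(reth_nodes), len(geth_nodes))
    # Scan from the leaf end upward so the first hit is the deepest differing level
    for level in range(shared - 1, -1, -1):
        if reth_nodes[level].lower() != geth_nodes[level].lower():
            return level
    return shared if len(reth_nodes) != len(geth_nodes) else None
```

If every node above level L is a branch node, that node is reached via `nibbles[:L]`; extension nodes consume several nibbles at once, so treat the path as approximate.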

Storage Trie Walking

For accounts with storage root mismatches, walk the storage trie, which maps keccak256 of each 32-byte slot key to the RLP-encoded slot value:

graph TB
    subgraph "Storage Trie Walking"
        SLOT[Storage Slot: 0x0]
        SLOT_HASH[Keccak Hash: 0x290d...]
        SLOT_NIBBLES["Nibbles: [2,9,0,d,...]"]
        
        SLOT --> SLOT_HASH
        SLOT_HASH --> SLOT_NIBBLES
        
        SLOT_NIBBLES --> S_LEVEL0["Level 0: Storage Root<br/>Check child[2]"]
        S_LEVEL0 --> S_LEVEL1["Level 1: Branch Node<br/>Check child[9]"]
        S_LEVEL1 --> S_LEVEL2[Level 2: Extension Node<br/>Match key prefix 0d...]
        S_LEVEL2 --> S_LEVEL3[Level 3: Leaf Node<br/>Contains slot value]
        
        S_LEVEL3 --> VALUE[Storage Value:<br/>0x123456...]
    end
    
    subgraph "Storage Proof Analysis"
        RETH_STORAGE[Reth Storage Proof]
        GETH_STORAGE[Geth Storage Proof]
        
        RETH_STORAGE --> VALUE_COMP{Compare<br/>Storage Values}
        GETH_STORAGE --> VALUE_COMP
        
        VALUE_COMP -->|Different Values| EVM_STORAGE_BUG[EVM Storage Bug<br/>Wrong SSTORE execution]
        VALUE_COMP -->|Same Values| STORAGE_TRIE_BUG[Storage Trie Bug<br/>Correct values, wrong root]
    end
    
    classDef storage fill:#e8f5e8
    classDef analysis fill:#fff3e0
    classDef bug fill:#ffebee
    
    class SLOT,SLOT_HASH,SLOT_NIBBLES,S_LEVEL0,S_LEVEL1,S_LEVEL2,S_LEVEL3,VALUE storage
    class RETH_STORAGE,GETH_STORAGE,VALUE_COMP analysis
    class EVM_STORAGE_BUG,STORAGE_TRIE_BUG bug
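Probing slots involves two formatting details. `eth_getProof` takes storage keys as 32-byte values (EIP-1186), and the storage trie path is keccak256 of that 32-byte key. For a Solidity mapping declared at slot `p`, the entry for a value-type key `k` lives at slot keccak256(pad32(k) ++ pad32(p)). A sketch of the key and preimage construction, assuming value-type keys (uint, address) that Solidity left-pads to 32 bytes; hash the returned bytes with a keccak256 implementation such as `eth_hash.auto.keccak`. Helper names are illustrative.

```python
def slot_key(slot: int) -> str:
    """Format a storage slot as the 32-byte hex key eth_getProof expects."""
    if not 0 <= slot < 2**256:
        raise ValueError("slot out of range")
    return "0x" + slot.to_bytes(32, "big").hex()


def mapping_slot_preimage(key: int, mapping_slot: int) -> bytes:
    """Bytes whose keccak256 is the storage slot of mapping[key] (value-type keys only).

    Solidity left-pads the key and the mapping's own slot to 32 bytes each and
    concatenates them: pad32(key) ++ pad32(slot). For an address key, pass int(address, 16).
    """
    return key.to_bytes(32, "big") + mapping_slot.to_bytes(32, "big")
```

As a sanity check, keccak256 of `bytes.fromhex(slot_key(0)[2:])` is `0x290decd9...`, the slot-0 hash shown in the diagram.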

Implementation Guide

Python Script for Automated Trie Walking

#!/usr/bin/env python3
"""
Ethereum State Root Debugger
Walks tries nibble-by-nibble to find exact mismatch locations
"""

import json
import requests
from eth_hash.auto import keccak
from typing import Dict, List, Optional, Tuple

class TrieWalker:
    def __init__(self, reth_rpc: str, geth_rpc: str):
        self.reth_rpc = reth_rpc
        self.geth_rpc = geth_rpc
    
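    def rpc_call(self, rpc_url: str, method: str, params: List) -> Dict:
        """Make a JSON-RPC call and return its result, raising on HTTP or RPC errors"""
        payload = {
            "jsonrpc": "2.0",
            "method": method,
            "params": params,
            "id": 1
        }
        response = requests.post(rpc_url, json=payload, timeout=30)
        response.raise_for_status()
        body = response.json()
        if "error" in body:
            raise RuntimeError(f"{method} failed on {rpc_url}: {body['error']}")
        return body["result"]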
    
    def get_account_proof(self, rpc_url: str, address: str, block: int) -> Dict:
        """Get account proof from client"""
        return self.rpc_call(rpc_url, "eth_getProof", [address, [], hex(block)])
    
    def get_storage_proof(self, rpc_url: str, address: str, slot: str, block: int) -> Dict:
        """Get storage proof from client"""
        return self.rpc_call(rpc_url, "eth_getProof", [address, [slot], hex(block)])
    
    def address_to_nibbles(self, address: str) -> List[str]:
        """Convert address to nibbles for trie walking"""
        # Remove 0x prefix and hash the address
        addr_bytes = bytes.fromhex(address[2:])
        hashed = keccak(addr_bytes).hex()
        # Convert to nibbles (each hex char is a nibble)
        return list(hashed)
    
    def walk_account_trie(self, address: str, block: int) -> Tuple[bool, Optional[int], Dict]:
        """
        Walk account trie for given address
        Returns: (match, mismatch_level, details)
        mismatch_level is -1 when the account fields differ, otherwise the
        deepest proof level at which the nodes differ
        """
        print(f"Walking account trie for {address}")
        
        # Get proofs from both clients
        reth_proof = self.get_account_proof(self.reth_rpc, address, block)
        geth_proof = self.get_account_proof(self.geth_rpc, address, block)
        
        # Convert address to nibbles for path analysis
        nibbles = self.address_to_nibbles(address)
        print(f"Address nibbles: {nibbles[:16]}... (truncated)")
        
        # Compare account data first: when the state roots differ, the root node
        # (level 0) differs in every proof, so node comparison alone cannot show
        # whether this account is the one that diverged
        account_fields = ["balance", "nonce", "storageHash", "codeHash"]
        mismatches = {}
        
        for field in account_fields:
            # Compare numerically so "0x0" and "0x00" count as equal
            if int(reth_proof[field], 16) != int(geth_proof[field], 16):
                mismatches[field] = {
                    "reth": reth_proof[field],
                    "geth": geth_proof[field]
                }
        
        if mismatches:
            print("Account data mismatches:")
            for field, values in mismatches.items():
                print(f"  {field}: Reth={values['reth']}, Geth={values['geth']}")
            
            return False, -1, {"account_mismatches": mismatches}
        
        # Account data matches: find the deepest level at which the proof nodes
        # differ. Nodes below it agree, so the divergence sits in a sibling subtree
        # of that node (another account, or how the trie was assembled)
        reth_nodes = reth_proof["accountProof"]
        geth_nodes = geth_proof["accountProof"]
        shared = min(len(reth_nodes), len(geth_nodes))
        differing = [level for level in range(shared) if reth_nodes[level] != geth_nodes[level]]
        
        if not differing and len(reth_nodes) == len(geth_nodes):
            print("Account trie matches perfectly")
            return True, None, {}
        
        level = differing[-1] if differing else shared
        # nibbles[:level+1] is the path to this node plus the nibble it branches on,
        # assuming only branch nodes above it (usual for the dense upper account
        # trie); extension nodes consume several nibbles at once
        print(f"TRIE NODE MISMATCH, deepest at level {level}")
        print(f"Approximate nibble path: {nibbles[:level+1]}")
        return False, level, {
            "nibble_path": nibbles[:level+1],
            "proof_lengths": [len(reth_nodes), len(geth_nodes)],
            "reth_node": reth_nodes[level] if level < len(reth_nodes) else None,
            "geth_node": geth_nodes[level] if level < len(geth_nodes) else None
        }
    
    def walk_storage_trie(self, address: str, block: int, slots: Optional[List[int]] = None) -> Dict:
        """
        Walk storage trie for account with storage root mismatch
        """
        print(f"Walking storage trie for {address}")
        
        # Ideally the slots that changed in this block come from transaction traces
        # or execution witnesses; by default we only probe slots 0-9, which covers
        # simple state variables but not mappings or dynamic arrays
        
        storage_mismatches = {}
        
        for slot_num in (slots if slots is not None else range(10)):
            # EIP-1186 storage keys are 32-byte values, so zero-pad the slot
            slot = f"0x{slot_num:064x}"
            
            try:
                reth_entry = self.get_storage_proof(self.reth_rpc, address, slot, block)["storageProof"][0]
                geth_entry = self.get_storage_proof(self.geth_rpc, address, slot, block)["storageProof"][0]
                
                reth_value = reth_entry["value"]
                geth_value = geth_entry["value"]
                
                # Compare values numerically ("0x0" and "0x00" are the same value)
                if int(reth_value, 16) != int(geth_value, 16):
                    print(f"STORAGE VALUE MISMATCH at slot {slot}")
                    print(f"  Reth: {reth_value}")
                    print(f"  Geth: {geth_value}")
                    
                    storage_mismatches[slot] = {
                        "reth_value": reth_value,
                        "geth_value": geth_value,
                        "reth_proof": reth_entry["proof"],
                        "geth_proof": geth_entry["proof"]
                    }
                
                # The storage root node (level 0) always differs when the storage roots
                # differ, so record the deepest level where the proof nodes still differ
                differing = [
                    level for level, (reth_node, geth_node)
                    in enumerate(zip(reth_entry["proof"], geth_entry["proof"]))
                    if reth_node != geth_node
                ]
                if differing:
                    print(f"STORAGE TRIE NODE MISMATCH at slot {slot}, deepest level {differing[-1]}")
                    storage_mismatches.setdefault(slot, {})["trie_mismatch_level"] = differing[-1]
                        
            except Exception as e:
                print(f"Error checking slot {slot}: {e}")
                continue
        
        return storage_mismatches
    
    def analyze_block(self, block_num: int, addresses: List[str]) -> Dict:
        """
        Analyze entire block for state root mismatches
        """
        print(f"Analyzing block {block_num}")
        
        results = {
            "block": block_num,
            "total_addresses": len(addresses),
            "account_mismatches": {},
            "storage_mismatches": {},
            "summary": {
                "evm_bugs": [],
                "trie_bugs": []
            }
        }
        
        for address in addresses:
            print(f"\n--- Checking {address} ---")
            
            # Walk account trie
            match, mismatch_level, details = self.walk_account_trie(address, block_num)
            
            if not match:
                results["account_mismatches"][address] = details
                
                # Categorize the bug type
                if "account_mismatches" in details:
                    # Account data differs - EVM-side bug
                    for field, values in details["account_mismatches"].items():
                        if field in ["balance", "nonce", "codeHash"]:
                            results["summary"]["evm_bugs"].append({
                                "type": f"account_{field}_mismatch",
                                "address": address,
                                "reth": values["reth"],
                                "geth": values["geth"]
                            })
                        elif field == "storageHash":
                            # Storage root differs - need deeper analysis
                            storage_results = self.walk_storage_trie(address, block_num)
                            results["storage_mismatches"][address] = storage_results
                            
                            # Differing slot values mean wrong SSTORE execution
                            value_slots = [s for s, d in storage_results.items() if "reth_value" in d]
                            for slot in value_slots:
                                results["summary"]["evm_bugs"].append({
                                    "type": "storage_value_mismatch",
                                    "address": address,
                                    "slot": slot,
                                    "reth": storage_results[slot]["reth_value"],
                                    "geth": storage_results[slot]["geth_value"]
                                })
                            
                            if not value_slots:
                                # Probed values agree but the storage root differs: a storage
                                # trie bug, or a changed slot outside the probed range
                                results["summary"]["trie_bugs"].append({
                                    "type": "storage_trie_structure",
                                    "address": address,
                                    "mismatch_level": None,
                                    "details": {"slots_with_node_differences": list(storage_results)}
                                })
                else:
                    # Account data matches but proof nodes differ along this path:
                    # the divergence lies in a sibling subtree (another account or trie construction)
                    results["summary"]["trie_bugs"].append({
                        "type": "account_trie_structure",
                        "address": address,
                        "mismatch_level": mismatch_level,
                        "details": details
                    })
        
        return results

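# Usage example
def main():
    import argparse  # local import keeps this entry point self-contained
    
    # Command-line flags match the invocation in the shell automation script
    parser = argparse.ArgumentParser(description="Walk account and storage tries on two clients")
    parser.add_argument("--block", type=int, default=23272425)
    parser.add_argument("--reth-rpc", default="http://localhost:8546")
    parser.add_argument("--geth-rpc", default="http://localhost:8545")
    parser.add_argument("--addresses-file", help="one address per line, e.g. mismatched_addresses.txt")
    parser.add_argument("--output", help="path for the detailed JSON results")
    args = parser.parse_args()
    
    walker = TrieWalker(reth_rpc=args.reth_rpc, geth_rpc=args.geth_rpc)
    
    # Addresses that need checking (from previous analysis)
    if args.addresses_file:
        with open(args.addresses_file) as f:
            problem_addresses = [line.strip() for line in f if line.strip()]
    else:
        problem_addresses = [
            # Placeholder only: this string has 41 hex digits, so replace it
            # with a real 20-byte address before running
            "0x742d35Cc6634C0532925a3b8D4034DfA7E69A99eA",
        ]
    
    results = walker.analyze_block(args.block, problem_addresses)
    
    # Print summary
    print("\n" + "="*50)
    print("ANALYSIS SUMMARY")
    print("="*50)
    
    if results["summary"]["evm_bugs"]:
        print("\nEVM EXECUTION BUGS FOUND:")
        for bug in results["summary"]["evm_bugs"]:
            print(f"  - {bug['type']} at {bug['address']}")
            if 'slot' in bug:
                print(f"    Slot {bug['slot']}: {bug['reth']} != {bug['geth']}")
            else:
                print(f"    {bug['reth']} != {bug['geth']}")
    
    if results["summary"]["trie_bugs"]:
        print("\nTRIE COMPUTATION BUGS FOUND:")
        for bug in results["summary"]["trie_bugs"]:
            print(f"  - {bug['type']} at {bug['address']}")
            print(f"    Mismatch at trie level {bug['mismatch_level']}")
    
    if not results["summary"]["evm_bugs"] and not results["summary"]["trie_bugs"]:
        print("\nNo mismatches found in analyzed addresses.")
        print("The state root difference might be due to:")
        print("  - Addresses not included in analysis")
        print("  - HashBuilder implementation differences")
        print("  - Database/changeset issues")
    
    # Save detailed results
    output_path = args.output or f"trie_analysis_block_{results['block']}.json"
    with open(output_path, "w") as f:
        json.dump(results, f, indent=2)
    
    print(f"\nDetailed results saved to {output_path}")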

if __name__ == "__main__":
    main()

Shell Script Automation

#!/bin/bash
# Complete automated debugging workflow

BLOCK_NUM=23272425
RETH_RPC="http://localhost:8546"
GETH_RPC="http://localhost:8545"

echo "Starting comprehensive state root debugging for block $BLOCK_NUM"

# Setup
mkdir -p debug_output/{proofs,traces,analysis}
cd debug_output || exit 1

# Phase 1: Get state roots (--json so jq can parse the output)
echo "Phase 1: Collecting state roots..."
RETH_ROOT=$(cast block $BLOCK_NUM --json --rpc-url $RETH_RPC | jq -r '.stateRoot')
GETH_ROOT=$(cast block $BLOCK_NUM --json --rpc-url $GETH_RPC | jq -r '.stateRoot')

echo "Expected: $GETH_ROOT"
echo "Actual:   $RETH_ROOT"

if [ "$RETH_ROOT" = "$GETH_ROOT" ]; then
    echo "State roots match! No debugging needed."
    exit 0
fi

# Phase 2: Collect addresses
echo "Phase 2: Extracting addresses from block..."
cast block $BLOCK_NUM --full --json --rpc-url $GETH_RPC | jq -r '
    .transactions[] | 
    [.from, .to] | 
    .[] | 
    select(. != null)
' | sort -u > addresses.txt

echo "Found $(wc -l < addresses.txt) addresses"

# Phase 3: Generate and compare proofs
echo "Phase 3: Generating account proofs..."
mkdir -p proofs/{reth,geth}

while read -r address; do
    echo "Checking $address..."
    
    # Get proofs
    cast proof $address --block $BLOCK_NUM --rpc-url $RETH_RPC > proofs/reth/${address}.json 2>/dev/null
    cast proof $address --block $BLOCK_NUM --rpc-url $GETH_RPC > proofs/geth/${address}.json 2>/dev/null
    
    # Compare account fields only; the proof's root node differs for every
    # address whenever the state roots differ
    FIELDS='{balance, nonce, codeHash, storageHash}'
    if [ "$(jq -S "$FIELDS" proofs/reth/${address}.json)" != "$(jq -S "$FIELDS" proofs/geth/${address}.json)" ]; then
        echo "MISMATCH: $address"
        echo $address >> mismatched_addresses.txt
        
        # Quick field comparison
        RETH_BALANCE=$(jq -r '.balance // "0x0"' proofs/reth/${address}.json)
        GETH_BALANCE=$(jq -r '.balance // "0x0"' proofs/geth/${address}.json)
        
        if [ "$RETH_BALANCE" != "$GETH_BALANCE" ]; then
            echo "  Balance differs: $RETH_BALANCE vs $GETH_BALANCE"
            echo "$address,balance,$RETH_BALANCE,$GETH_BALANCE" >> evm_bugs.csv
        fi
    fi
done < addresses.txt

# Phase 4: Detailed analysis
if [ -f mismatched_addresses.txt ]; then
    echo "Phase 4: Running detailed trie analysis..."
    
    # Assumes the Python walker above is saved as trie_walker.py in the directory
    # this script was started from (we are now inside debug_output)
    python3 ../trie_walker.py --block $BLOCK_NUM \
        --reth-rpc $RETH_RPC \
        --geth-rpc $GETH_RPC \
        --addresses-file mismatched_addresses.txt \
        --output analysis/detailed_results.json
else
    echo "No account mismatches found, but state root differs."
    echo "Either a changed account is missing from addresses.txt (e.g. one touched"
    echo "only by internal calls), or this is a HashBuilder/trie computation bug."
fi

# Phase 5: Generate report
echo "Phase 5: Generating final report..."
{
    echo "# State Root Debugging Report - Block $BLOCK_NUM"
    echo "Generated: $(date)"
    echo ""
    echo "## State Root Comparison"
    echo "- Expected (Geth): \`$GETH_ROOT\`"
    echo "- Actual (Reth):   \`$RETH_ROOT\`"
    echo ""
    
    if [ -f mismatched_addresses.txt ]; then
        echo "## Mismatched Accounts ($(wc -l < mismatched_addresses.txt))"
        while read address; do
            echo "- \`$address\`"
        done < mismatched_addresses.txt
    fi
    
    if [ -f evm_bugs.csv ]; then
        echo ""
        echo "## EVM Execution Bugs Found"
        echo "\`\`\`csv"
        echo "Address,Field,Reth_Value,Geth_Value"
        cat evm_bugs.csv
        echo "\`\`\`"
    fi
    
    echo ""
    echo "## Next Steps"
    if [ -f evm_bugs.csv ]; then
        echo "1. Focus on EVM execution bugs - trace the problematic transactions"
        echo "2. Check transaction processing logic for affected accounts"  
        echo "3. Verify gas calculations and state transitions"
    else
        echo "1. No account-level differences found"
        echo "2. Investigate HashBuilder and trie construction logic"
        echo "3. Check PrefixSetLoader for changeset loading issues"
    fi
} > report.md

echo "Report generated: debug_output/report.md"
echo "Debugging complete!"

Advanced Techniques

RLP Decoding for Node Analysis

When you find mismatched trie nodes, decode them to understand the structure:

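import rlp
from eth_hash.auto import keccak

def _child_ref(item):
    """Child references are 32-byte hashes, except nodes whose RLP is shorter
    than 32 bytes, which are embedded inline as nested lists"""
    if isinstance(item, list):
        return {"embedded": rlp.encode(item).hex()}
    return item.hex() if item else None

def decode_trie_node(node_data: str) -> dict:
    """Decode RLP-encoded trie node"""
    try:
        node_hex = node_data[2:] if node_data.startswith("0x") else node_data
        node_bytes = bytes.fromhex(node_hex)
        decoded = rlp.decode(node_bytes)
        # keccak256 of the encoded node is the hash its parent refers to
        # (for nodes of 32 bytes or more, and always for the root)
        node_hash = "0x" + keccak(node_bytes).hex()
        
        if len(decoded) == 2:
            # Leaf or Extension node: the key is hex-prefix encoded, and a high
            # nibble of 2 or 3 in its first byte marks a leaf
            first_byte = decoded[0][0] if decoded[0] else 0
            is_leaf = (first_byte & 0x20) != 0
            
            return {
                "type": "leaf" if is_leaf else "extension",
                "hash": node_hash,
                "key": decoded[0].hex(),
                "value": decoded[1].hex() if is_leaf else None,
                "child": _child_ref(decoded[1]) if not is_leaf else None
            }
        elif len(decoded) == 17:
            # Branch node: one child slot per nibble plus a value slot
            return {
                "type": "branch",
                "hash": node_hash,
                "children": [_child_ref(child) for child in decoded[:16]],
                "value": decoded[16].hex() if decoded[16] else None
            }
        else:
            return {"type": "unknown", "raw": node_data}
            
    except Exception as e:
        return {"error": str(e), "raw": node_data}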
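`decode_trie_node` returns leaf and extension keys still in hex-prefix (compact) encoding. To line a key up against the nibble path, decode it: the high nibble of the first byte carries the flags (value 2 marks a leaf, value 1 an odd-length key), and for odd-length keys the low nibble of that byte is the first key nibble. A standalone sketch; `decode_hex_prefix` is an illustrative name, and it can be fed `bytes.fromhex(node["key"])` from the decoder above.

```python
from typing import List, Tuple


def decode_hex_prefix(encoded: bytes) -> Tuple[List[int], bool]:
    """Decode a hex-prefix encoded trie key into (nibbles, is_leaf)."""
    if not encoded:
        raise ValueError("empty hex-prefix key")
    flags = encoded[0] >> 4
    is_leaf = bool(flags & 0x2)
    is_odd = bool(flags & 0x1)
    nibbles: List[int] = []
    if is_odd:
        # Odd length: the low nibble of the first byte is the first key nibble
        nibbles.append(encoded[0] & 0x0F)
    # Even length: the low nibble of the first byte is padding and is skipped
    for byte in encoded[1:]:
        nibbles.extend((byte >> 4, byte & 0x0F))
    return nibbles, is_leaf
```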

Transaction Trace Analysis

For EVM bugs, analyze transaction execution:

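# Get transactions that affected the problematic account
# (placeholder: this string has 41 hex digits; use the 20-byte address flagged above)
PROBLEM_ADDR="0x742d35Cc6634C0532925a3b8D4034DfA7E69A99eA"
mkdir -p traces

# Find transactions involving this address (compared case-insensitively)
cast block $BLOCK_NUM --full --json --rpc-url $GETH_RPC | jq -r --arg a "$PROBLEM_ADDR" '
    ($a | ascii_downcase) as $addr |
    .transactions[] |
    select((.from | ascii_downcase) == $addr or ((.to // "") | ascii_downcase) == $addr) |
    .hash
' | while read -r tx_hash; do
    echo "Tracing transaction: $tx_hash"
    
    # Have each client re-execute the transaction with its own EVM and report the
    # per-account state diff (prestateTracer in diff mode); note that formatting
    # differences between implementations can also show up in the diff
    TRACER='{"tracer":"prestateTracer","tracerConfig":{"diffMode":true}}'
    cast rpc debug_traceTransaction $tx_hash "$TRACER" --rpc-url $RETH_RPC | jq -S . > traces/reth_${tx_hash}.json
    cast rpc debug_traceTransaction $tx_hash "$TRACER" --rpc-url $GETH_RPC | jq -S . > traces/geth_${tx_hash}.json
    
    # Compare traces
    if ! cmp -s traces/reth_${tx_hash}.json traces/geth_${tx_hash}.json; then
        echo "TRANSACTION EXECUTION DIFFERS: $tx_hash"
        diff traces/reth_${tx_hash}.json traces/geth_${tx_hash}.json > traces/diff_${tx_hash}.txt
    fi
done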
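A raw `diff` of two traces can be noisy. A sketch that compares the `post` sections of two prestateTracer diff-mode results field by field, assuming the JSON shape `{"pre": {...}, "post": {address: {balance, nonce, code, storage}}}` returned by `debug_traceTransaction`; storage is compared slot by slot, and `diff_post_states` is an illustrative name.

```python
from typing import Any, Dict, List, Tuple


def diff_post_states(reth: Dict[str, Any], geth: Dict[str, Any]) -> List[Tuple[str, str, Any, Any]]:
    """List (address, field, reth_value, geth_value) for every post-state difference."""
    diffs = []
    reth_post = {a.lower(): v for a, v in reth.get("post", {}).items()}
    geth_post = {a.lower(): v for a, v in geth.get("post", {}).items()}
    for addr in sorted(set(reth_post) | set(geth_post)):
        r, g = reth_post.get(addr, {}), geth_post.get(addr, {})
        # Scalar fields (balance, nonce, code): a missing field shows up as None
        for field in sorted((set(r) | set(g)) - {"storage"}):
            if r.get(field) != g.get(field):
                diffs.append((addr, field, r.get(field), g.get(field)))
        # Storage is a nested {slot: value} map; report each slot separately
        r_st, g_st = r.get("storage", {}), g.get("storage", {})
        for slot in sorted(set(r_st) | set(g_st)):
            if r_st.get(slot) != g_st.get(slot):
                diffs.append((addr, f"storage[{slot}]", r_st.get(slot), g_st.get(slot)))
    return diffs
```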

Common Patterns

Pattern 1: Balance Mismatch (EVM Bug)

Symptoms: Account balance differs between clients
Root Cause: Transaction processing error (gas, transfers, fees)
Next Steps: Trace specific transactions affecting the account

Pattern 2: Storage Root Mismatch with Same Values (Trie Bug)

Symptoms: Storage values match but storage root differs
Root Cause: Storage trie construction bug
Next Steps: Debug HashBuilder for storage tries

Pattern 3: All Accounts Match but State Root Differs (HashBuilder Bug)

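Symptoms: Every checked account's fields (balance, nonce, storage root, code hash) match, yet the state root differs
Root Cause: Account trie construction bug in HashBuilder, or a changed account that was not in the checked set
Next Steps: Widen address coverage first (e.g. accounts touched only by internal calls), then debug the account trie building process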

Pattern 4: Nonce Mismatch (Transaction Ordering)

Symptoms: Account nonce differs
Root Cause: Failed-transaction handling or contract-creation nonce logic (the transaction order itself is fixed by the block)
Next Steps: Trace the transactions sent by the account and any contract creations it performed

Troubleshooting

Common Issues

  1. RPC Timeout: Use shorter block ranges or increase timeout
  2. Memory Issues: Process addresses in batches
  3. Network Issues: Add retry logic for RPC calls
  4. Large State: Focus on changed accounts only
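Items 2 and 3 can be handled with a few lines of Python. A sketch with illustrative helper names: `with_retries` wraps any zero-argument callable (for instance a lambda around `TrieWalker.rpc_call`) and backs off exponentially between attempts, and `batched` splits the address list into chunks (Python 3.12 also ships `itertools.batched`).

```python
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception after delays base_delay, 2*base_delay, ..."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```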

Performance Optimization

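# Process in parallel with GNU parallel (the function and variables must be exported)
check_address() {
    cast proof "$1" --block "$BLOCK_NUM" --rpc-url "$RETH_RPC" > "proofs/reth/$1.json"
    cast proof "$1" --block "$BLOCK_NUM" --rpc-url "$GETH_RPC" > "proofs/geth/$1.json"
}
export -f check_address
export BLOCK_NUM RETH_RPC GETH_RPC
parallel -j 4 check_address :::: addresses.txt

# Where possible, read from the node's database instead of going through RPC
# (see `reth db --help` for the inspection subcommands your version provides)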
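The same fan-out can be done from Python without GNU parallel. A sketch using `concurrent.futures.ThreadPoolExecutor` (threads suit I/O-bound RPC calls); `fetch` is any function that takes an address, for example `lambda a: walker.get_account_proof(walker.reth_rpc, a, block)` with the `TrieWalker` class from the script above, and `fetch_all` is an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple, TypeVar

R = TypeVar("R")


def fetch_all(addresses: Iterable[str], fetch: Callable[[str], R],
              workers: int = 4) -> Tuple[Dict[str, R], Dict[str, Exception]]:
    """Run fetch(address) concurrently; return (results, errors) keyed by address."""
    results: Dict[str, R] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, addr): addr for addr in addresses}
        for future in as_completed(futures):
            addr = futures[future]
            try:
                results[addr] = future.result()
            except Exception as exc:  # keep going; report failures at the end
                errors[addr] = exc
    return results, errors
```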

Validation Checks

def validate_results(results: dict):
    """Validate debugging results for consistency"""
    
    # Check that all EVM bugs have different values
    for bug in results["summary"]["evm_bugs"]:
        assert bug["reth"] != bug["geth"], f"Bug {bug} has same values"
    
    # Trie bugs are recorded only when account data matches, so their
    # details must not carry account field mismatches
    for bug in results["summary"]["trie_bugs"]:
        assert "account_mismatches" not in bug.get("details", {}), f"Bug {bug} has account data differences"
    
    print("Results validation passed")

Conclusion

This systematic approach provides a complete framework for debugging state root mismatches:

  1. Collect comprehensive block data
  2. Compare account states systematically
  3. Walk tries nibble-by-nibble to find exact mismatch locations
  4. Categorize bugs as EVM execution vs. trie computation issues
  5. Provide targeted debugging approaches for each category

The key insight is that comparing account fields first and then walking the proof nodes shows where the data diverges: differing field or slot values point to transaction execution (wrong state changes), while matching values under a differing root point to state root computation (wrong trie construction).

Following this methodology narrows a state root mismatch down to one of these two categories, so you can focus your debugging effort accordingly.
