A core part of Bitcoin’s pseudonymity is ensuring that on-chain surveillance can’t infer anything about transaction participants beyond what is already visible on chain. Wallet fingerprints violate this by leaking additional identifying details. Once an observer knows which wallet created a transaction, they can combine that knowledge with other heuristics, such as change output patterns or common input ownership, to cluster addresses and infer more about their owners. Fingerprints therefore weaken Bitcoin’s privacy guarantees and undermine existing privacy protocols.
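To make the clustering step concrete, here is a minimal sketch of the common input ownership heuristic: every address spent together as an input of the same transaction is assumed to share an owner, and a union-find structure merges those addresses into clusters. It assumes each transaction has already been reduced to the list of its input addresses, and the names (`UnionFind`, `cluster_by_common_inputs`) are hypothetical, not part of our tool. A wallet fingerprint sharpens this further: a cluster whose transactions all carry the same fingerprint is even more likely to belong to a single user.

```rust
use std::collections::HashMap;

/// Union-find over address indices. Hypothetical helper for illustration.
struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect() }
    }

    /// Return the root of `x`, compressing the path along the way.
    fn find(&mut self, x: usize) -> usize {
        let p = self.parent[x];
        if p != x {
            let root = self.find(p);
            self.parent[x] = root;
        }
        self.parent[x]
    }

    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[rb] = ra;
        }
    }
}

/// Each transaction is given only as its input addresses. All inputs of one
/// transaction are merged into one cluster (common input ownership).
/// Returns a cluster id (the union-find root) for every address seen.
fn cluster_by_common_inputs(txs: &[Vec<&str>]) -> HashMap<String, usize> {
    // Assign each distinct address a dense index.
    let mut index: HashMap<String, usize> = HashMap::new();
    for inputs in txs {
        for addr in inputs {
            let next = index.len();
            index.entry(addr.to_string()).or_insert(next);
        }
    }

    let mut uf = UnionFind::new(index.len());
    for inputs in txs {
        if let Some(first) = inputs.first() {
            let a = index[*first];
            for addr in &inputs[1..] {
                uf.union(a, index[*addr]);
            }
        }
    }

    index.into_iter().map(|(addr, i)| (addr, uf.find(i))).collect()
}
```

For example, transactions spending `[a, b]` and `[b, c]` put `a`, `b`, and `c` in one cluster, while an address only ever spent alone stays in its own cluster.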
Researchers have already identified multiple wallet fingerprints in existing wallets (0, 1), but discovering these fingerprints manually is laborious and requires deep familiarity with each wallet’s internal logic.
Our project automates the detection and storage of wallet fingerprints using a Retrieval-Augmented Generation (RAG) data pipeline.
At a high level:
- We pull down a specific version of an open-source wallet and filter out non-functional code. We use tree-sitter to parse the code into structured chunks.
- We then embed these chunks using OpenAI’s embedding model. Each embedding is a large vector that captures the semantic meaning of the code, effectively representing what the code does.
- These embeddings are stored in a vector database optimized for similarity search. This database becomes a semantic index for that wallet at that version.
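The retrieval side of the steps above can be sketched as a brute-force similarity search. This is a minimal stand-in, assuming embeddings are already computed: a real vector database uses approximate indexes and OpenAI’s embeddings have far more dimensions than the toy two-dimensional vectors in the usage example. The `Chunk` struct and `top_k` function are hypothetical names, not our pipeline’s API; cosine similarity is used as a common choice of metric.

```rust
/// A code chunk and its embedding, as it might be stored in the index.
/// Hypothetical layout for illustration.
struct Chunk {
    path: String,        // source file the chunk came from
    text: String,        // the chunk's code
    embedding: Vec<f32>, // vector produced by the embedding model
}

/// Cosine similarity between two equal-length vectors (0.0 if either is zero).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Score every chunk against the query embedding and return the `k` most
/// similar, best match first.
fn top_k<'a>(index: &'a [Chunk], query: &[f32], k: usize) -> Vec<(&'a Chunk, f32)> {
    let mut scored: Vec<(&Chunk, f32)> = index
        .iter()
        .map(|c| (c, cosine(&c.embedding, query)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    scored.truncate(k);
    scored
}
```

Because the query (for example, an embedded description such as "UTXO selection") lives in the same vector space as the code chunks, the highest-scoring chunks are the code most likely to implement that behavior.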
Although our focus is fingerprint extraction, this indexed codebase can also serve researchers and developers exploring wallet internals or auditing behavior. So far, we’ve created embeddings for:
- Electrum @ 4.5.8
- Bitcoin Core Wallet @ 0.27
- Drongo (used by Sparrow Wallet) @ 2.2.2
To identify specific fingerprints, we query the vector database with natural-language descriptions of behaviors (e.g., “transaction version” or “UTXO selection”) and retrieve the most relevant code chunks. We then prompt an LLM (GPT-4o) to reason about whether the described fingerprint exists in that code. If it does, we store the fingerprint alongside the wallet embeddings.
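The prompting and storage step might look roughly like the sketch below. It only assembles the prompt and parses a reply; the call to GPT-4o is left out. The prompt wording, the YES/NO answer convention, the `Fingerprint` record, and the file path in the usage example are all hypothetical, chosen for illustration rather than taken from our pipeline.

```rust
/// Hypothetical record for a fingerprint the LLM confirmed, stored next to
/// the wallet's embeddings.
#[derive(Debug, Clone)]
struct Fingerprint {
    wallet: String,        // e.g. "Electrum"
    version: String,       // e.g. "4.5.8"
    behavior: String,      // the natural-language query that matched
    evidence: Vec<String>, // paths of retrieved chunks the model cited
}

/// Build a prompt from a behavior description and retrieved (path, code) chunks.
fn build_prompt(behavior: &str, chunks: &[(&str, &str)]) -> String {
    let mut prompt = format!(
        "Does the following code determine this transaction behavior: \"{behavior}\"?\n\
         Answer YES or NO, then explain and cite file paths.\n\n"
    );
    for (path, code) in chunks {
        prompt.push_str(&format!("--- {path} ---\n{code}\n\n"));
    }
    prompt
}

/// Turn the model's answer into a stored fingerprint if it answered YES,
/// keeping only the retrieved paths the answer actually mentions.
fn to_fingerprint(
    answer: &str,
    wallet: &str,
    version: &str,
    behavior: &str,
    chunks: &[(&str, &str)],
) -> Option<Fingerprint> {
    if !answer.trim_start().to_uppercase().starts_with("YES") {
        return None;
    }
    let evidence = chunks
        .iter()
        .filter(|(path, _)| answer.contains(*path))
        .map(|(path, _)| path.to_string())
        .collect();
    Some(Fingerprint {
        wallet: wallet.to_string(),
        version: version.to_string(),
        behavior: behavior.to_string(),
        evidence,
    })
}
```

Recording which chunks the model cited keeps each stored fingerprint traceable back to the code that supports it, which makes manual validation easier.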
Once we have a set of fingerprints, detecting which wallet created a transaction becomes a matter of matching the transaction against a decision tree. We’re porting Ishaana’s original wallet fingerprinting tool to Rust and plan to maintain it for our own projects and the wider research community.
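Matching a transaction against a decision tree can be sketched as follows. The three features (transaction version, anti-fee-sniping `nLockTime`, and BIP 69 input/output ordering) are examples of observable transaction properties, but the tree shape and the leaf labels ("wallet A" and so on) are placeholders: which wallet sits at each leaf has to come from the extracted fingerprints, and this sketch is not the structure of our Rust port.

```rust
/// A few features an observer can read directly from a transaction.
struct TxFeatures {
    version: u32,
    anti_fee_sniping: bool, // nLockTime set close to the current block height
    bip69_ordering: bool,   // inputs and outputs sorted per BIP 69
}

/// A binary decision tree: internal nodes test one feature, leaves name a wallet.
enum Node {
    Leaf(&'static str),
    Split {
        test: fn(&TxFeatures) -> bool,
        yes: Box<Node>,
        no: Box<Node>,
    },
}

/// Walk the tree from the root until a leaf is reached.
fn classify(node: &Node, tx: &TxFeatures) -> &'static str {
    match node {
        Node::Leaf(label) => *label,
        Node::Split { test, yes, no } => {
            if test(tx) {
                classify(yes, tx)
            } else {
                classify(no, tx)
            }
        }
    }
}

/// Illustrative tree with placeholder labels, not real wallet attributions.
fn example_tree() -> Node {
    Node::Split {
        test: |tx| tx.version == 1,
        yes: Box::new(Node::Leaf("wallet A")),
        no: Box::new(Node::Split {
            test: |tx| tx.anti_fee_sniping,
            yes: Box::new(Node::Leaf("wallet B")),
            no: Box::new(Node::Split {
                test: |tx| tx.bip69_ordering,
                yes: Box::new(Node::Leaf("wallet C")),
                no: Box::new(Node::Leaf("unknown")),
            }),
        }),
    }
}
```

Storing each test as a plain function pointer keeps the tree data-driven, so new fingerprints from the RAG pipeline could be added as extra `Split` nodes without changing `classify`.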
Next steps:
- Finish porting Ishaana’s original Python code
- Detect fingerprints using results from our RAG pipeline
- Add Python bindings to support the research community
- Expand fingerprint support: ephemeral anchors, time-locked and/or unique spending conditions
- (Long-term) Explore temporal fingerprints like RBF behavior, CPFP packages, and fee-rate patterns described in (1)
- Publish the library on crates.io
- Embed more wallets and wallet libraries!
- Improve fingerprint detection using Agentic RAG and similar methods
- Open source the vector database once we’ve validated the results