@JoseSK999
Last active October 2, 2025 18:33
Witnessless Sync

Why pruned nodes can skip witness downloads for assume-valid blocks

Background

Two years ago I asked in BSE why (segregated) witness data is downloaded for assume-valid blocks in pruned mode. We don't validate these witnesses, and we delete them shortly after. Pieter Wuille explained that skipping witness downloads would require extending the set of consensus checks delegated to assume-valid. But implementing this change is relatively straightforward, and because SegWit witness data now makes up a significant share of block bytes, omitting it can cut bandwidth by ~34% over the full chain (and 48-54% in recent years).

What witnesses cost today

Updated: 2025-10-02 (snapshot at height 917,000).

We measured the share of witness bytes (size - strippedsize) across the full chain and recent windows. At this snapshot the full chain is ~690 GB, of which ~232 GB are witnesses—more than a third of all bytes (33.7%).

| Range | Witness Share |
| --- | --- |
| Last 100k blocks | 54.17% |
| Last 200k blocks | 53.22% |
| Last 300k blocks | 48.06% |
| Full chain (height 917,000) | 33.70% |
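The per-block numbers behind this table come from each block's `size` and `strippedsize` fields, as reported by Bitcoin Core's `getblock` RPC. A minimal sketch of the aggregation, with the block stats supplied as plain tuples rather than pulled from a live RPC connection:

```python
def witness_share(blocks):
    """Cumulative witness share for a list of (size, strippedsize) pairs.

    `size` is the full serialized block size and `strippedsize` is the
    size with all witness data removed (both as returned by `getblock`),
    so their difference is exactly the witness bytes.
    """
    total = sum(size for size, _ in blocks)
    witness = sum(size - stripped for size, stripped in blocks)
    return witness / total

# Toy example: one block that is half witness data, one with none.
sample = [(2_000_000, 1_000_000), (1_000_000, 1_000_000)]
print(f"{witness_share(sample):.2%}")  # 1 MB of witness out of 3 MB total
```

Running this over a range of heights (e.g. the last 100k blocks) reproduces the window figures above.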

More interestingly, over the last 4 years, witnesses account for more than half of all bytes (driven mainly by P2TR adoption and inscription traffic since early 2023). If this trend persists, when the chain reaches 1 TB the cumulative witness share will be ~40% (400 GB).
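The ~40% projection is simple arithmetic from the snapshot figures: a quick sanity check, assuming the remaining bytes up to 1 TB keep the recent ~54% witness share:

```python
chain_gb, witness_gb = 690, 232   # snapshot at height 917,000
future_gb = 1000 - chain_gb       # bytes still to come before 1 TB
future_witness_rate = 0.54        # recent ~54% witness share

projected_witness = witness_gb + future_gb * future_witness_rate
projected_share = projected_witness / 1000
print(f"{projected_witness:.0f} GB, {projected_share:.1%}")  # 399 GB, 39.9%
```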

Extending assume-valid: Required Additional Checks

Unless I'm missing something, the additional checks that assume-valid would need to cover are:

  • Witness coinbase commitment (wtxid)
  • Witness size and sigops limits, including block weight
  • No witness data allowed in legacy inputs

This doesn't seem like a big deal compared to the full script validation we already outsource to assume-valid. If assume-valid covered these checks as well, then we could directly omit witness downloads for blocks within the assumed range.
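For context on the block weight check above: under BIP 141, a block's weight is defined from the same two sizes used throughout this note, and must not exceed 4,000,000 weight units. A sketch (my own helper, not Bitcoin Core code):

```python
MAX_BLOCK_WEIGHT = 4_000_000  # consensus limit (BIP 141)

def block_weight(size, strippedsize):
    # weight = 3 * base_size + total_size: witness bytes count 1x,
    # non-witness bytes count 4x.
    return 3 * strippedsize + size

# A hypothetical 1.6 MB block whose stripped size is 0.8 MB:
w = block_weight(1_600_000, 800_000)
print(w, w <= MAX_BLOCK_WEIGHT)  # 4000000 True
```

Note that without the witnesses we only know `strippedsize`, which is why this limit must be delegated to assume-valid rather than checked locally.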

An important thing to note is that all current consensus checks automatically pass even if we don't have the witnesses, because SegWit was a backwards-compatible soft fork that only constrained the rules. We don't even need to disable the wtxid commitment check, as it's optional when a block doesn't contain witnesses.
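For reference, the wtxid commitment works as follows (BIP 141): the merkle root of all wtxids, with the coinbase's wtxid replaced by 32 zero bytes, is concatenated with a 32-byte witness reserved value and double-SHA256'd, and the result must appear in a coinbase output. A self-contained sketch:

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    # Bitcoin-style merkle tree: a level with an odd node count
    # duplicates its last node before pairing.
    level = leaves
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

def witness_commitment(wtxids: list[bytes], reserved: bytes) -> bytes:
    # The coinbase's wtxid (wtxids[0]) is defined as 32 zero bytes
    # for the purposes of this tree.
    leaves = [b"\x00" * 32] + wtxids[1:]
    return sha256d(merkle_root(leaves) + reserved)
```

As a consequence of the zeroed coinbase leaf, a block whose only transaction is the coinbase has a commitment of simply `sha256d(zeros + reserved)`, which is why the check passes trivially for witness-stripped blocks.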

This idea led to a Bitcoin Core PR, with several conceptual ACKs but also some concerns about a security reduction and the loss of data availability. Below I will try to properly reason about this change and address any concerns.

Security Scenarios Analysis

First, let's look at the bad scenarios that can, in theory, occur in both the current assume-valid mode and the witness-skipping one.

Scenario A: Invalid-chain via assume-valid

The bad scenario in the current assume-valid mode is that developers, reviewers and maintainers lie to us and make us sync to an invalid chain. But we would need a majority of hashpower mining on that invalid chain as well.

Scenario B: Witness data unavailability

A new bad scenario that can happen if we skip witness downloads for assume-valid blocks is that we sync to a chain where some witnesses are not available (so no other kind of node can sync, only ours). There are two possibilities:

  1. A case where developers, reviewers and maintainers lied to us, because the witnesses were not available when they validated the block. This likely means we also need a majority of hashpower mining blocks that lack witnesses and building on top of them.
  2. A case where this lack of availability appeared after the assume-valid hash was audited as valid (they didn't lie at the time).

Trust Assumptions in assume-valid Mode

In Bitcoin Core, it is accepted that assume-valid does not change the trust assumptions of the software. In other words, users of Bitcoin Core can, without added risk, assume that scenario A won't happen. This is based on the fact that scripts are widely available for independent audit, and developers themselves are already trusted not to do bad things that reviewers and external auditors would fail to detect.

Now, it is easy to see that attesting to the validity of all scripts implies that all scripts were available to the attester (how else could their validity be known?). In fact, missing witnesses would make the script evaluations fail, which assume-valid already covers.

Hence, scenario B.1 is not a concern, as it is an implied premise of assume-valid. In other words, we are already trusting developers, reviewers and maintainers to check witness data availability after the block was mined.

Thus, the only new concern is scenario B.2, that is, the case where witness data availability is lost after the block was mined and validated by auditors. Losing data availability means we no longer have any reachable archive node (which by definition retains all witness data). But in that case we won't be able to sync either, as we still require all legacy block data and all witnesses after the assume-valid range.

The new behavior only manifests when no archive nodes are available, but there are peers serving all legacy block data and all witnesses beyond the assume-valid block. In this scenario, our node will fully sync with the Bitcoin network, whereas other node types will stall during IBD. This is the only behavior that would change, and only under these circumstances.

Comparison with Pruned Node Behavior

One might think that in this situation our node shouldn't be able to sync either, just like the rest of the nodes. But the outcome of syncing is just reaching the same state as the whole Bitcoin network, which in this scenario consists entirely of pruned nodes and "archive nodes" that are either missing witnesses or pre-SegWit.

The synced pruned nodes will remain fully operational, unless they're told to reindex or switch to archive mode. These pruned nodes only checked data availability once, during their IBD, and they can run indefinitely without the need for re-downloading old data. If one-time availability checks are acceptable for regular pruned nodes, they're equally acceptable for a witness-skipping pruned node—both rely on the fact that those witnesses were available at an earlier point, verified locally or via the Open Source process.

Then, if we can skip the script evaluations because they are attested by developers, reviewers and maintainers, we can also skip the one-time availability check for witnesses, as it is implicitly verified by the same group of people.

If you think we must re-check witness availability during IBD because assume-valid only guarantees availability at a past moment, then by that logic pruned nodes would also have to rescan the entire chain periodically. But since regular pruned nodes don't re-check data availability, there's no reason our specific kind of pruned node should either.

And anyway, if this catastrophic availability failure really happens, then the root of the problem is the lack of archive nodes, not the proliferation of pruned ones. Moreover, periodically asking for the witness data only helps identify the problem, which would be discovered anyway as soon as anyone tried to run any other kind of full node.

Conclusion

Witnessless Sync doesn't reduce security any more than a long-running pruned node. By extending the assume-valid shortcut to cover witness commitments, size and sigops limits, and the prohibition of witness data in legacy inputs, we are outsourcing checks no more critical than the full script validation we already trust. Just as pruned nodes perform a one-time data-availability check at IBD—sufficient for the node to run indefinitely—we rely on the fact that this one-time check was already made when each block's scripts were validated by the assume-valid auditors.

In every practical scenario, then, skipping witness retrieval for assume-valid blocks cuts bandwidth without making our node or the network more vulnerable to data availability problems than it already is.

@ajtowns
ajtowns commented Jun 9, 2025

At its core this is a suggestion to move from foregoing validity checks to also foregoing availability checks.

Two options maybe worth considering:

  • during IBD, randomly select 10% of blocks and request them with witness data and check their coinbase witness commitment, dropping nodes that give bad data, and stalling IBD if nobody has the data. Reduces the bw savings from ~40% to ~36% presumably, but collectively still checks availability.

  • use a similar approach to assumeutxo, and verify witness data availability after IBD is complete and your node is operational, so that bandwidth constraints are not a bottleneck. Unlike assumeutxo this wouldn't require a duplicate chainstate, just redownloading each block after segwit activation and checking the coinbase segwit commitment. Increases bandwidth usage overall 60% rather than reducing it by 40%, but still serves as an operational speedup for nodes that are bandwidth constrained.
