@JoseSK999
Last active October 2, 2025 18:33

Witnessless Sync

Why pruned nodes can skip witness downloads for assume-valid blocks

Background

Two years ago I asked in BSE why (segregated) witness data is downloaded for assume-valid blocks in pruned mode. We don't validate these witnesses, and we delete them shortly after. Pieter Wuille explained that skipping witness downloads would require extending the set of consensus checks delegated to assume-valid. But implementing this change is relatively straightforward, and because SegWit witness data now makes up a significant share of block bytes, omitting it can cut bandwidth by ~34% over the full chain (and 48-54% in recent years).

What witnesses cost today

Updated: 2025-10-02 (snapshot at height 917,000).

We measured the share of witness bytes (size - strippedsize) across the full chain and recent windows. At this snapshot the full chain is ~690 GB, of which ~232 GB are witnesses—more than a third of all bytes (33.7%).

Range                          Witness Share
Last 100k blocks               54.17%
Last 200k blocks               53.22%
Last 300k blocks               48.06%
Full chain (height 917,000)    33.70%
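These shares can be reproduced from each block's size and strippedsize fields, as returned by Bitcoin Core's getblock RPC, since a block's witness bytes are exactly size - strippedsize. A minimal sketch, using hypothetical (size, strippedsize) pairs rather than real chain data:

```python
def witness_share(blocks):
    """Fraction of total bytes that are witness data, given an iterable
    of (size, strippedsize) pairs as reported by `getblock`."""
    blocks = list(blocks)
    total = sum(size for size, _ in blocks)
    witness = sum(size - stripped for size, stripped in blocks)
    return witness / total

# Hypothetical sample values in bytes, not real chain data.
sample = [(1_000_000, 600_000), (2_000_000, 1_200_000)]
print(f"{witness_share(sample):.2%}")  # → 40.00%
```

Running the same computation over every block in a range (e.g. via getblockhash plus getblock) yields the percentages in the table above.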

More interestingly, over the last 4 years, witnesses account for more than half of all bytes (driven mainly by P2TR adoption and inscription traffic since early 2023). If this trend persists, when the chain reaches 1 TB the cumulative witness share will be ~40% (400 GB).
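The ~40% figure follows from simple arithmetic on the snapshot numbers above, assuming new blocks keep roughly the recent 54% witness share:

```python
# Snapshot at height 917,000: ~690 GB chain, of which ~232 GB witnesses.
total_now, witness_now = 690.0, 232.0  # GB
target = 1000.0                        # GB (a 1 TB chain)
recent_share = 0.54                    # witness share of newly added blocks

# Cumulative witness bytes once the chain reaches 1 TB.
witness_then = witness_now + recent_share * (target - total_now)
share_then = witness_then / target
print(f"{witness_then:.0f} GB, {share_then:.1%}")  # → 399 GB, 39.9%
```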

Extending assume-valid: Required Additional Checks

If I'm not missing anything, the list of checks that assume-valid would now need to cover is:

  • Witness coinbase commitment (wtxid)
  • Witness size and sigops limits, including block weight
  • No witness data allowed in legacy inputs

This doesn't seem like a big deal compared to the full script validation we already outsource to assume-valid. If assume-valid covered these checks as well, then we could directly omit witness downloads for blocks within the assumed range.
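With those checks delegated, the per-block download decision reduces to a simple predicate. A hypothetical sketch (the function and parameter names are illustrative, and it compares heights where Bitcoin Core actually checks ancestry of the assume-valid block hash):

```python
def request_witnesses(height, assume_valid_height, pruned):
    """Decide whether to download witness data for a block during IBD.

    Under Witnessless Sync a pruned node skips witnesses inside the
    assume-valid range: script validity, the wtxid commitment, the
    weight/sigops limits and the no-legacy-witness rule are all covered
    by assume-valid. Archive nodes still download everything, as they
    must serve complete blocks to their peers.
    """
    if not pruned:
        return True  # archive nodes keep all data
    return height > assume_valid_height  # fully validate recent blocks

# Pruned node with the assume-valid point at height 900,000.
assert request_witnesses(850_000, 900_000, pruned=True) is False
assert request_witnesses(910_000, 900_000, pruned=True) is True
assert request_witnesses(850_000, 900_000, pruned=False) is True
```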

An important thing to note is that all current consensus checks automatically pass even if we don't have the witnesses, because SegWit was a backwards-compatible soft fork that only constrained the rules. We don't even need to disable the wtxid commitment check, as it's optional when a block doesn't contain witnesses.
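The optionality of the commitment check can be sketched as follows; this is a simplified illustration of the BIP141 rule, not Bitcoin Core's actual validation code:

```python
def check_witness_commitment(has_witnesses, coinbase_commitment, computed_root):
    """Simplified BIP141 rule: the coinbase witness commitment is only
    enforced when some transaction in the block carries witness data,
    so a witness-stripped copy of a valid block still passes."""
    if not has_witnesses:
        return True  # commitment check is skipped without witnesses
    return coinbase_commitment == computed_root

# A stripped block passes even though no commitment was computed.
assert check_witness_commitment(False, None, None)
# A block with witnesses must match its coinbase commitment.
assert check_witness_commitment(True, "c0ffee", "c0ffee")
assert not check_witness_commitment(True, "c0ffee", "deadbeef")
```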

This idea led to a Bitcoin Core PR, with several conceptual ACKs but also some concerns about a security reduction and the loss of data availability. Below I will try to properly reason about this change and address any concerns.

Security Scenarios Analysis

First of all, let's look at the bad scenarios that can, in theory, happen in both the current assume-valid mode and in the witness-skipping one.

Scenario A: Invalid-chain via assume-valid

The bad scenario in the current assume-valid mode is that developers, reviewers and maintainers lie to us and make us sync to an invalid chain. But this attack would also require a majority of hashpower mining on that invalid chain.

Scenario B: Witness data unavailability

A new bad scenario that can happen if we skip witness downloads for assume-valid blocks is that we sync to a chain where some witnesses are not available (and so no kind of node can sync except for us). We have two options:

  1. A case where developers, reviewers and maintainers lied to us, because the witnesses were not available when they validated the block. This likely means we also need a majority of hashpower mining blocks that lack witnesses and building on top of them.
  2. A case where this lack of availability appeared after the assume-valid hash was audited as valid (they didn't lie at the time).

Trust Assumptions in assume-valid Mode

In Bitcoin Core, it is accepted that assume-valid does not change the trust assumptions of the software. In other words, users of Bitcoin Core can, without added risks, assume the scenario A won't happen. This is based on the fact that scripts are widely available for independent audit, and developers themselves are already trusted to not do bad things that reviewers and external auditors don't detect.

Now, it is easy to see that asserting the validity of all scripts implies that all scripts were available to whoever validated them (otherwise, how could their validity have been established?). In fact, a missing witness makes script evaluation fail, a failure which assume-valid already covers.

Hence, the scenario B.1 is not a concern as it is an implied premise of assume-valid. In other words, we are already trusting developers, reviewers and maintainers to check witness data availability after the block was mined.

Thus, the only new concern is the scenario B.2, that is, the case where witness data availability is lost after the block was mined and validated by auditors. Losing data availability means we no longer have any reachable archive node (which by definition retains all witness data). But in that case we won't be able to sync, as we still require all legacy block data and all witnesses after the assume-valid range.

The new behavior only manifests when no archive nodes are available, but there are peers serving all legacy block data and all witnesses beyond the assume-valid block. In this scenario, our node will fully sync with the Bitcoin network, whereas other node types will stall during IBD. This is the only behavior that would change, and only under these circumstances.

Comparison with Pruned Node Behavior

We could think that in this situation our node shouldn't be able to sync, just like the rest of the nodes. But the outcome of syncing is just reaching the same state as the whole Bitcoin network, which in this scenario consists entirely of pruned nodes, plus "archive nodes" that are either missing witnesses or pre-SegWit.

The synced pruned nodes will remain fully operational, unless they're told to reindex or switch to archive mode. These pruned nodes only checked data availability once, during their IBD, and they can run indefinitely without the need for re-downloading old data. If one-time availability checks are acceptable for regular pruned nodes, they're equally acceptable for a witness-skipping pruned node—both rely on the fact that those witnesses were available at an earlier point, verified locally or via the Open Source process.

Then, if we can skip the script evaluations because they are attested by developers, reviewers and maintainers, we can also skip the one-time availability check for witnesses, as it is implicitly verified by the same group of people.

If you think we must re-check witness availability during IBD because assume-valid only guarantees availability at a past moment, then by that logic pruned nodes would also have to rescan the entire chain periodically. But since regular pruned nodes don't re-check data availability, there's no reason our specific kind of pruned node should either.

And anyway, if this catastrophic availability failure really happens, the root of the problem is the lack of archive nodes, not the proliferation of pruned ones. Moreover, asking for the witness data periodically only helps identify the problem; it will be discovered anyway as soon as anyone tries to run any other kind of full node.

Conclusion

Witnessless Sync doesn't reduce security any more than a long-running pruned node. By extending the assume-valid shortcut to cover witness commitments, size and sigops limits, and the prohibition of witness data in legacy inputs, we are outsourcing checks no more critical than the full script validation we already trust. Just as pruned nodes perform a one-time data-availability check at IBD—sufficient for the node to run indefinitely—we rely on the fact that this one-time check was already made when each block's scripts were validated by the assume-valid auditors.

In every practical scenario, then, skipping witness retrieval for assume-valid blocks cuts bandwidth without making our node or the network more vulnerable to data availability problems than it already is.

@JoseSK999 (Author) commented May 29, 2025

Thanks for the answer @RubenSomsen!

[...] As you point out (in your own words), downloading and then discarding witness data means that you know it was available at some point in time, whereas not downloading it means you don't know if it ever was available.

I don't think this is true for assume-valid.

  1. If you use assume-valid you trust that the scripts are valid.
  2. In order for the scripts to be valid, the witnesses must have been available. Missing witness data means script evaluation fails, which we assume not to be the case because of 1.
  3. Hence, you do know the witnesses were available at some point, because it is a premise of assume-valid.

That's the whole point of this post: downloading the witnesses in this case means we are not checking availability once, but twice. assume-valid already covers validity and past availability.

@RubenSomsen

I see your point, but I still don't think everyone will be comfortable treating assumevalid as a measure for availability, especially considering this is the same tradeoff as not downloading anything except the UTXO set. Why not jump straight to that? For an alternative client this would all be fine, but for Core it will be a harder sell.

It's also unclear to me who benefits from this change. Is there a user base that is on metered connections that wants to do a pruned assumevalid sync while saving some bandwidth? Or is the goal just to speed up IBD, in which case the more conservative approach of background downloading would equally suffice?

@JoseSK999 (Author)

I see your point, but I still don't think everyone will be comfortable treating assumevalid as a measure for availability, especially considering this is the same tradeoff as not downloading anything except the UTXO set. Why not jump straight to that? [...]

I think this concern is over assume-valid, not Witnessless Sync. In this proposal we simply skip the exact checks that Bitcoin Core already covers under assume-valid. This includes the one-time witness availability check, required for script validation. We only extend assume-valid slightly to cover the remaining witness-related checks (the 3 rules mentioned in the post).

Only if Core was willing to skip ALL block validation—i.e. redefine assume-valid to cover literally all block consensus checks—could you stop downloading blocks altogether. Otherwise you will need the blocks to locally verify the non-covered consensus rules. We just inherit whatever definition assume-valid has.

We could likewise say that assume-valid "is the same tradeoff as not validating anything in the block", yet that wasn't a deal-breaker for Core. So your statement that "this is the same tradeoff as not downloading anything except the UTXO set" shouldn't be a deal-breaker either, because that tradeoff can only materialize in the 0% block validation scenario, which itself wasn't a deal-breaker.

It's also unclear to me who benefits from this change. Is there a user base that is on metered connections that wants to do a pruned assumevalid sync while saving some bandwidth? Or is the goal just to speed up IBD, in which case the more conservative approach of background downloading would equally suffice?

I fully agree with you here. Post-IBD witness downloads could be a more conservative solution if people want to re-check availability. Downloading hundreds of GBs in the days after you have synced can be acceptable, unless the goal really is to reduce total bandwidth usage.

@RubenSomsen

In this proposal we simply skip the exact checks that Bitcoin Core already covers under assume-valid

You're portraying this as an insignificant change.

Only if Core was willing to skip ALL block validation—i.e. redefine assume-valid to cover literally all block consensus checks—could you stop downloading blocks altogether

You're portraying this as a substantial change.

To me it's exactly the other way around. There are validity checks and there are availability checks. Assumevalid currently only covers a subset of the validity checks. Expanding this to include all validity checks is a relatively minor change to me, because it all falls under the same risk category: if the assumevalid trust assumption is unfounded then coins could get stolen.

Redefining assumevalid to include availability checks to me is the bigger departure. It's an entirely new category of error that you're introducing. Were the coins stolen? Maybe (not), but we don't have the data to (dis)prove it.

@JoseSK999 (Author)

Redefining assumevalid to include availability checks to me is the bigger departure.

The assume-valid definition is "all script evaluations pass". We don't change the definition (w.r.t. availability) because witness availability is a premise for script validity. What we do change is whether to re-check witness availability at present time or not.

It's true that if Core lied to us about witnesses being available when they checked it, then every pruned node would catch their lie except for the Witnessless Sync ones. We would sync to an invalid chain (missing witnesses = invalid), assuming a 51% attack. However, this attack (syncing to an invalid chain) is already possible for all assume-valid nodes if Core lies about validity and there's a 51% attack.

It's an entirely new category of error that you're introducing. Were the coins stolen? Maybe (not), but we don't have the data to (dis)prove it.

I don't think the "missing witnesses" case is a different scenario/category from the "available but invalid witnesses". Both mean invalid script evaluation:

  • If assume-valid blocks have invalid witnesses (and we get a 51% attack), coins get stolen.
  • If assume-valid blocks do not have all witnesses (and we get a 51% attack), coins get stolen. A missing witness is a failure to (legitimately) spend coins, just like an invalid witness.

You say "we don't have the data to (dis)prove it", but the lack of witnesses is by itself the proof that the block is invalid, as dictated by the validation function.

@ajtowns commented Jun 9, 2025

At its core this is a suggestion to move from foregoing validity checks to also foregoing availability checks.

Two options maybe worth considering:

  • during IBD, randomly select 10% of blocks and request them with witness data and check their coinbase witness commitment, dropping nodes that give bad data, and stalling IBD if nobody has the data. Reduces the bw savings from ~40% to ~36% presumably, but collectively still checks availability.

  • use a similar approach to assumeutxo, and verify witness data availability after IBD is complete and your node is operational, so that bandwidth constraints are not a bottleneck. Unlike assumeutxo this wouldn't require a duplicate chainstate, just redownloading each block after segwit activation and checking the coinbase segwit commitment. Increases bandwidth usage overall 60% rather than reducing it by 40%, but still serves as an operational speedup for nodes that are bandwidth constrained.
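As a sanity check on the numbers in both options, assuming (per the figures above) that witnesses make up ~40% of full-chain bytes:

```python
w = 0.40        # assumed witness share of total block bytes

# Option 1: request witnesses for a random 10% of blocks during IBD.
sample = 0.10
savings_sampled = (1 - sample) * w  # still skip 90% of witness bytes

# Option 2: witnessless IBD, then redownload full blocks afterwards
# (approximating "all blocks since segwit activation" as the whole chain).
ibd = 1 - w                         # witnessless IBD downloads 60%
overhead = (ibd + 1.0) - 1.0        # vs. a normal 100% full sync

print(f"{savings_sampled:.0%} saved, {overhead:.0%} extra")  # → 36% saved, 60% extra
```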
