- Status: Pre-proposal draft
- Authors: Kai / Mitch
- Date: 2026-03-04
snarkVM's error handling is inconsistent. The codebase uses a mix of anyhow
throughout library crates, panic!-based halts caught with catch_unwind,
string-flattened error chains, and incomplete error context in logs. This leads
to:
- Lost debugging information - error chains are flattened to strings before reaching logs, discarding the structured cause chain (#3147).
- User-facing panics - Environment::halt panics the host process, and catch_unwind cannot always recover safely (leo#28992).
- Inability to handle specific failure modes - anyhow::Error erases error types, preventing downstream tooling from matching on specific failures.
- catch_unwind hacks - try_vm_runtime! conflates VM halts with implementation bugs, and AssertUnwindSafe wrappers suppress the type system's unwind-safety guarantees.
The vision: structured thiserror types for all library crates, clean
source() chains, no user-reachable panics, and downstream applications owning
error formatting.
- Clean error chains - Each error layer describes only its own context. Inner errors are accessible via source(), never duplicated in Display.
- No user-reachable panic!s - All anticipated runtime failures return Result. Any user-facing panic is treated as a VM bug.
- Structured thiserror types for all library crates - Concrete enums with exhaustive matching. anyhow is acceptable only in top-level applications and tests.
- Remove catch_unwind/try_vm_runtime! - Once halt returns a Result, panic-catching infrastructure becomes unnecessary.
- Downstream-controlled formatting - Libraries expose source() chains; applications choose presentation (single-line, multi-line, JSON, etc.).
- Descriptive structured error data - Errors carry machine-readable context (instruction index, program ID, operand values) enabling rich tooling feedback and Leo source mapping.
The Environment trait defines a halt method that panics unconditionally:
```rust
// console/network/environment/src/environment.rs:61
fn halt<S: Into<String>, T>(message: S) -> T {
    panic!("{}", message.into())
}
```

There are 262+ call sites across E::halt (~152 occurrences) and A::halt (~110 occurrences) throughout the console and circuit crates.
A convenience trait OrHalt (in console/network/environment/src/helpers/or_halt.rs)
further propagates this pattern:
```rust
fn or_halt<E: Environment>(self) -> T {
    match self {
        Ok(result) => result,
        Err(error) => E::halt(error.to_string()),
    }
}
```

catch_unwind cannot always recover from these panics safely. A panic raised while circuit thread-local state is borrowed through a RefCell can leave that state partially updated, so it is not unwind-safe. The AssertUnwindSafe wrapper silences the compiler's unwind-safety check but does not fix the underlying inconsistency. This manifests as user-facing crashes (leo#28992).
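The hazard can be reproduced with std alone. In this minimal sketch, a hypothetical enforce_pair function stands in for a circuit operation that records two related constraints in a RefCell and halts (panics) between the two writes. catch_unwind only accepts the closure because of AssertUnwindSafe, and after the panic is caught the state holds half of the update. This illustrates the failure mode; it is not snarkVM code.

```rust
use std::cell::RefCell;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical stand-in for a circuit operation: it records two related
/// constraints but "halts" (panics) between the two writes.
fn enforce_pair(state: &RefCell<Vec<u64>>, a: u64, b: u64) {
    let mut constraints = state.borrow_mut();
    constraints.push(a);
    if b == 0 {
        panic!("halt: invalid operand");
    }
    constraints.push(b);
}

/// Runs `enforce_pair` under `catch_unwind` and reports how many constraints
/// are left in the shared state afterwards.
fn constraints_after_caught_halt() -> usize {
    let state = RefCell::new(Vec::new());
    // Without `AssertUnwindSafe` this does not compile: `&RefCell<_>` is not `UnwindSafe`.
    let result = panic::catch_unwind(AssertUnwindSafe(|| enforce_pair(&state, 1, 0)));
    assert!(result.is_err(), "the halt should have been caught");
    // The `RefMut` guard was released during unwinding, so borrowing again works,
    // but the vector holds only half of the update (1 entry instead of 0 or 2).
    let len = state.borrow().len();
    len
}
```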
Two patterns break source() chains:
String interpolation flattening: Patterns like anyhow!("something failed: {err}") or bail!("something failed: {err}") flatten the inner error into a
string, destroying the source() chain. The correct approach is
.context("something failed").
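A std-only sketch of the difference. A format! string plays the role of anyhow!("...: {err}"), and the hypothetical Context struct approximates what .context(...) builds: its own message plus the cause. Only the second keeps the cause reachable via source().

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical low-level error.
#[derive(Debug)]
struct ParseError;

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("invalid token at col 5")
    }
}

impl Error for ParseError {}

/// Hypothetical stand-in for a context layer: own message plus the cause.
#[derive(Debug)]
struct Context {
    message: &'static str,
    source: Box<dyn Error + Send + Sync>,
}

impl fmt::Display for Context {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.message)
    }
}

impl Error for Context {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

fn parse() -> Result<u32, ParseError> {
    Err(ParseError)
}

/// Anti-pattern: the cause is interpolated into a new string; the chain ends here.
fn load_flattened() -> Result<u32, Box<dyn Error + Send + Sync>> {
    let value = parse().map_err(|err| format!("failed to load program: {err}"))?;
    Ok(value)
}

/// Preferred: the cause is kept and reachable through `source()`.
fn load_with_context() -> Result<u32, Box<dyn Error + Send + Sync>> {
    let value = parse().map_err(|err| Context {
        message: "failed to load program",
        source: Box::new(err),
    })?;
    Ok(value)
}
```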
#[error] format string interpolation of #[from] fields: Some
thiserror types interpolate their #[from]/#[source] fields in the
#[error("...")] format string:
```rust
use thiserror::Error;

#[derive(Debug, Error)]
enum MyError {
    #[error("operation failed: {0}")] // Duplicates the inner error message
    Inner(#[from] InnerError),
}
```

This causes message duplication when walking the chain: Display includes the inner message, and so does source(). The std Error docs explicitly advise against this:

> An error type with a child error should either [...] not mention the child error in its Display implementation.
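The duplication appears as soon as a consumer joins the chain with ": " (as anyhow's {:#} does). This std-only sketch hand-writes roughly the Display and source() impls that #[derive(Error)] would generate for an interpolating variant and for a clean one; the type names and the "disk full" message are hypothetical.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
struct InnerError;

impl fmt::Display for InnerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("disk full")
    }
}

impl Error for InnerError {}

/// Mirrors `#[error("operation failed: {0}")] Inner(#[from] InnerError)`.
#[derive(Debug)]
enum DuplicatingError {
    Inner(InnerError),
}

impl fmt::Display for DuplicatingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DuplicatingError::Inner(inner) => write!(f, "operation failed: {inner}"),
        }
    }
}

impl Error for DuplicatingError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            DuplicatingError::Inner(inner) => Some(inner),
        }
    }
}

/// Mirrors `#[error("operation failed")] Inner(#[from] InnerError)`.
#[derive(Debug)]
enum CleanError {
    Inner(InnerError),
}

impl fmt::Display for CleanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("operation failed")
    }
}

impl Error for CleanError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            CleanError::Inner(inner) => Some(inner),
        }
    }
}

/// Joins every message in the `source()` chain with ": ".
fn render_chain(err: &dyn Error) -> String {
    let mut parts = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        parts.push(cause.to_string());
        current = cause.source();
    }
    parts.join(": ")
}
```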
The correct pattern is:

```rust
#[error("operation failed")] // Describes only this layer
Inner(#[from] InnerError),
// or:
#[error(transparent)] // Delegates Display and source() to InnerError
Inner(#[from] InnerError),
```

The error chain is sometimes correctly constructed but not correctly rendered. For example:
```rust
// ledger/src/advance.rs:146
self.vm.add_next_block(block).with_context(|| "Failed to add block to VM")?;
```

This properly chains errors. However, downstream consumers (e.g. snarkOS CDN sync) log only "{err}", which renders only the top-level message. The full chain, which contains the actual failure reason, is discarded.
This is a consumer-side problem. The chain is correct; the rendering is not.
snarkVM already provides utilities for chain rendering in
utilities/src/errors.rs:
```rust
/// Converts an `anyhow::Error` into a single-line string.
pub fn flatten_error<E: Borrow<anyhow::Error>>(error: E) -> String { ... }

/// Displays an `anyhow::Error`'s main error and its error chain to stderr.
pub fn display_error<E: Borrow<anyhow::Error>>(error: E) { ... }
```

These are currently specialized to anyhow::Error. Generalizing them to &dyn std::error::Error would make them useful for the thiserror migration.
See #3147 and snarkOS#3795.
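A minimal sketch of what the generalized helpers could look like if they accepted &dyn std::error::Error. The names flatten_dyn_error and format_error_chain, the exact output layout, and the example error types are hypothetical; the multi-line helper returns a String instead of writing to stderr so the caller picks the destination. Formatting the example with "{err}" yields only "Failed to add block to VM", which is the logging problem described above.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical generalization of `flatten_error`: "outer: middle: root cause".
pub fn flatten_dyn_error(error: &dyn Error) -> String {
    let mut message = error.to_string();
    let mut source = error.source();
    while let Some(cause) = source {
        message.push_str(": ");
        message.push_str(&cause.to_string());
        source = cause.source();
    }
    message
}

/// Hypothetical multi-line variant with a numbered "Caused by:" section.
pub fn format_error_chain(error: &dyn Error) -> String {
    let mut output = error.to_string();
    let mut source = error.source();
    if source.is_some() {
        output.push_str("\n\nCaused by:");
    }
    let mut index = 0;
    while let Some(cause) = source {
        output.push_str(&format!("\n    {index}: {cause}"));
        index += 1;
        source = cause.source();
    }
    output
}

/// Illustrative two-level chain; the "invalid signature" cause is made up.
#[derive(Debug)]
struct VmError;

impl fmt::Display for VmError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("invalid signature")
    }
}

impl Error for VmError {}

#[derive(Debug)]
struct AddBlockError(VmError);

impl fmt::Display for AddBlockError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Failed to add block to VM")
    }
}

impl Error for AddBlockError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}
```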
The try_vm_runtime! macro (utilities/src/vm_error.rs) catches
halt-panics using catch_unwind:
```rust
macro_rules! try_vm_runtime {
    ($e:expr) => {{
        let previous_hook = panic::take_hook();
        panic::set_hook(Box::new(|err| { /* reformat as "VM safely halted" */ }));
        let result = panic::catch_unwind(panic::AssertUnwindSafe($e));
        panic::set_hook(previous_hook);
        result
    }};
}
```

Problems:
- Conflates VM halts with bugs - a panic from halt("invalid operand") is indistinguishable from a panic caused by an index-out-of-bounds bug.
- Global panic hook manipulation - the panic hook is process-global, so the take_hook/set_hook save-and-restore sequence is not safe under concurrency: concurrent VM executions can interfere with each other's panic hooks.
- AssertUnwindSafe - suppresses the compiler's unwind-safety analysis, masking potential state corruption.
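The first problem can be seen directly. In this sketch with hypothetical functions, catch_unwind returns the same untyped Err(Box<dyn Any + Send>) for a deliberate halt and for an off-by-one bug. Once the halt is a typed Err, it is an ordinary value that callers can match on, and only genuine bugs remain panics.

```rust
use std::panic;

/// Hypothetical typed halt, standing in for what a converted `E::halt` could return.
#[derive(Debug, PartialEq)]
pub struct Halt(pub &'static str);

/// Panic-based halt, as today.
fn halt_by_panic() -> u64 {
    panic!("invalid operand")
}

/// An implementation bug: an off-by-one index.
fn buggy_lookup() -> u64 {
    let registers: Vec<u64> = vec![1, 2, 3];
    let index = registers.len(); // bug: should be `len() - 1`
    registers[index]
}

/// Under `catch_unwind`, the deliberate halt and the bug look the same:
/// both come back as an untyped `Err(Box<dyn Any + Send>)`.
fn halt_and_bug_are_indistinguishable() -> bool {
    let halted = panic::catch_unwind(halt_by_panic);
    let bug = panic::catch_unwind(buggy_lookup);
    halted.is_err() && bug.is_err()
}

/// Result-based halt: the failure is an ordinary typed value, while a real
/// bug such as `buggy_lookup` would still panic.
fn halt_by_result() -> Result<u64, Halt> {
    Err(Halt("invalid operand"))
}
```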
Beyond try_vm_runtime!, catch_unwind(AssertUnwindSafe(...)) appears in 18
files across the codebase, including:
- circuit/program/src/data/literal/cast/mod.rs - 2 instances
- circuit/types/field/src/div.rs, div_unchecked.rs, inverse.rs - 3+ instances
- circuit/types/integers/src/neg.rs, lib.rs - 2+ instances
- circuit/types/scalar/src/helpers/from_field.rs
- Various test files
Most of these exist because circuit operations signal failure via panic (through
E::halt) rather than returning Result. Once these operations return
Result, the wrappers become unnecessary.
Some deserialization sites use std::io::Error::other(stringified_error),
losing type information. The io_error and into_io_error helper functions in
utilities/src/errors.rs encode this pattern. While io::Error boundaries are
sometimes unavoidable (e.g. Read/Write trait impls), the error chain should
be preserved where possible. See discussion in
#3056.
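Where an io::Error boundary is unavoidable, the typed error can still travel inside it. This std-only sketch (DeserializeError is hypothetical) contrasts io::Error::other(err.to_string()) with io::Error::other(err): both print the same message, but only the second lets callers recover the concrete type through get_ref and downcast_ref.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical typed deserialization error.
#[derive(Debug, PartialEq)]
pub enum DeserializeError {
    InvalidVariant(u8),
}

impl fmt::Display for DeserializeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DeserializeError::InvalidVariant(tag) => write!(f, "invalid variant tag {tag}"),
        }
    }
}

impl Error for DeserializeError {}

/// Lossy: only the message survives the `io::Error` boundary.
fn to_io_error_stringified(err: DeserializeError) -> io::Error {
    io::Error::other(err.to_string())
}

/// Lossless: the typed error is boxed inside the `io::Error`.
fn to_io_error_typed(err: DeserializeError) -> io::Error {
    io::Error::other(err)
}

/// Attempts to recover the concrete error type from an `io::Error`.
fn recover(err: &io::Error) -> Option<&DeserializeError> {
    err.get_ref()?.downcast_ref::<DeserializeError>()
}
```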
| What | PR/Issue | Status |
|---|---|---|
| Return error with failing instruction index | #3081 | Merged |
| Result for constraint enforcement and assertions | #3082 | Merged |
| Clippy fix for instruction wrappers | #3085 | Merged |
| Isolate synthesizer-error crate | #3122 | Merged |
| CheckBlockError - concrete block-check errors | #3050 | Merged |
The synthesizer-error crate (synthesizer/error/)
now provides a structured error hierarchy using thiserror:
```text
VmExecError / VmAuthError / VmDeployError
└─ ProcessExecError / ProcessAuthError / ProcessDeployError
   └─ StackExecError / StackEvalError
      └─ IndexedInstructionError<InstructionError>
         └─ InstructionEvalError / InstructionExecError
            └─ EvalError / ExecError / AssertError
```
Each level carries Anyhow(#[from] anyhow::Error) variants as temporary escape
hatches for the migration. These are explicitly marked with // NOTE: ... Remove these variants as we migrate errors to thiserror.
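A heavily simplified, std-only sketch of how two layers of such a hierarchy can nest. The field names, Display wording, eval_div helper, and the Box<dyn Error> stand-in for the Anyhow(#[from] anyhow::Error) escape hatch are assumptions for illustration, not the actual synthesizer-error definitions.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical leaf error for a single instruction.
#[derive(Debug)]
pub enum InstructionEvalError {
    DivisionByZero,
}

impl fmt::Display for InstructionEvalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            InstructionEvalError::DivisionByZero => f.write_str("division by zero"),
        }
    }
}

impl Error for InstructionEvalError {}

/// Records which instruction failed; the inner error is exposed only via `source()`.
#[derive(Debug)]
pub struct IndexedInstructionError<E> {
    pub index: usize,
    pub error: E,
}

impl<E> fmt::Display for IndexedInstructionError<E> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "instruction {} failed", self.index)
    }
}

impl<E: Error + 'static> Error for IndexedInstructionError<E> {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.error)
    }
}

/// Hypothetical stack-level error: one typed variant plus a temporary catch-all.
#[derive(Debug)]
pub enum StackEvalError {
    Instruction(IndexedInstructionError<InstructionEvalError>),
    /// Stand-in for the temporary `Anyhow(#[from] anyhow::Error)` variant,
    /// treated here as transparent (an assumption).
    Other(Box<dyn Error + Send + Sync>),
}

impl fmt::Display for StackEvalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StackEvalError::Instruction(_) => f.write_str("failed to evaluate function"),
            StackEvalError::Other(err) => fmt::Display::fmt(err, f),
        }
    }
}

impl Error for StackEvalError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            StackEvalError::Instruction(err) => Some(err),
            StackEvalError::Other(err) => err.source(),
        }
    }
}

impl From<IndexedInstructionError<InstructionEvalError>> for StackEvalError {
    fn from(err: IndexedInstructionError<InstructionEvalError>) -> Self {
        StackEvalError::Instruction(err)
    }
}

/// Hypothetical evaluation of instruction `index`: dividing by zero fails.
fn eval_div(index: usize, lhs: u64, rhs: u64) -> Result<u64, StackEvalError> {
    if rhs == 0 {
        return Err(IndexedInstructionError { index, error: InstructionEvalError::DivisionByZero }.into());
    }
    Ok(lhs / rhs)
}
```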
The AssertError type demonstrates the target pattern - structured data
(operand values) in error variants:
```rust
#[error("'assert.eq' failed: '{lhs}' is not equal to '{rhs}' (should be equal)")]
Eq { lhs: String, rhs: String },
```

PRs #3081 and #3082 together establish the migration pattern for replacing Environment::halt with Result:
- #3081 built the error propagation infrastructure: domain-specific error types per subsystem, IndexedInstructionError<E> to capture which instruction failed and at what index, and updated function signatures from Result<T> (anyhow) to Result<T, ProcessExecError> etc.
- #3082 converted the lowest-level circuit operations (E::enforce(), E::assert_eq(), E::assert_neq()) to return Result<(), ConstraintUnsatisfied> instead of panicking, and introduced AssertError, EvalError, ExecError with structured data. Unconverted boundaries use temporary .expect() calls to maintain existing behavior while the conversion progresses.
Together they demonstrate the bottom-up, subsystem-by-subsystem approach:
convert the lowest-level operations first, introduce domain-specific error
types, use .expect() at unconverted boundaries, and propagate upward through
the error hierarchy.
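A condensed sketch of that pattern for a single assertion, under simplified assumptions: ConstraintUnsatisfied is a unit struct, AssertError mirrors only the Eq variant and message quoted above, and assert_eq_values, execute_assert_eq and describe are hypothetical. The point is that the operand values travel as fields, so tooling can match on them instead of parsing text.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for the error a converted `E::assert_eq()` returns.
#[derive(Debug, PartialEq)]
pub struct ConstraintUnsatisfied;

impl fmt::Display for ConstraintUnsatisfied {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("constraint unsatisfied")
    }
}

impl Error for ConstraintUnsatisfied {}

/// Simplified mirror of `AssertError`, showing only the `Eq` variant.
#[derive(Debug, PartialEq)]
pub enum AssertError {
    Eq { lhs: String, rhs: String },
}

impl fmt::Display for AssertError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AssertError::Eq { lhs, rhs } => {
                write!(f, "'assert.eq' failed: '{lhs}' is not equal to '{rhs}' (should be equal)")
            }
        }
    }
}

impl Error for AssertError {}

/// Hypothetical converted leaf: returns `Err` instead of halting.
fn assert_eq_values(lhs: u64, rhs: u64) -> Result<(), ConstraintUnsatisfied> {
    if lhs == rhs {
        Ok(())
    } else {
        Err(ConstraintUnsatisfied)
    }
}

/// Hypothetical `assert.eq` instruction: attaches the operand values.
fn execute_assert_eq(lhs: u64, rhs: u64) -> Result<(), AssertError> {
    assert_eq_values(lhs, rhs).map_err(|_| AssertError::Eq {
        lhs: lhs.to_string(),
        rhs: rhs.to_string(),
    })
}

/// Tooling matches on fields instead of parsing the message.
fn describe(err: &AssertError) -> String {
    match err {
        AssertError::Eq { lhs, rhs } => format!("expected {lhs} == {rhs}"),
    }
}
```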
| What | PR/Issue | Status |
|---|---|---|
| Remove source interpolation from #[error] format strings | #3172 | Open (WIP) |
| Improved panic handling infrastructure | #2927 | Open |
| build.rs error chain checking (snarkOS) | snarkOS#4127 | Draft |
| Track errors in snarkOS | snarkOS#3874 | Open |
| Issue | Repo | Summary |
|---|---|---|
| #2941 | snarkVM | Remove panic potentials in validator code paths |
| #3055 | snarkVM | On halt, return error with failing instruction index |
| #3056 | snarkVM | Replace anyhow in lib crates with thiserror |
| #2787 | snarkVM | Return descriptive error on failure to execute/evaluate |
| #3147 | snarkVM | Improve logged errors (chain context lost) |
| leo#28992 | leo | Panic on assert_eq on arrays (VM halt panics host) |
| leo#29035 | leo | On VM halt, error should provide Leo source context |
| leo#29036 | leo | Preserve source mapping during codegen for tooling |
| leo#27858 | leo | Testing framework panics instead of errors |
| snarkOS#3795 | snarkOS | Logs uninformative for node operators |
Complete #3172 - remove
source-error interpolation from #[error] format strings across the codebase.
This establishes the convention: each error layer describes only its own
context. Inner errors are accessed via source(), not duplicated in Display.
Before:

```rust
#[error("failed to parse '{0}'")]
Parse(#[from] ParseError), // Display: "failed to parse 'invalid token at col 5'"
                           // source(): "invalid token at col 5" (duplicated)
```

After:

```rust
#[error("failed to parse")]
Parse(#[from] ParseError), // Display: "failed to parse"
                           // source(): "invalid token at col 5" (no duplication)
```

Continue the bottom-up approach established in #3081 and #3082. Convert one subsystem at a time:
- Identify the next subsystem - Pick a group of related E::halt/A::halt call sites (e.g. field arithmetic, integer operations, group operations, string operations, Merkle tree verification).
- Introduce domain-specific error types - Define thiserror enums for the subsystem's failure modes. These should carry structured data where useful (operand values, indices, etc.), not just string messages.
- Convert lowest-level operations first - Change the leaf functions from E::halt(msg) to return Err(SpecificError::Variant { ... }) and update their return types.
- Use .expect() at unconverted boundaries - Where a converted function is called by unconverted code that still expects infallible results, use .expect("justification") temporarily. This preserves existing behavior while making the converted boundary explicit and greppable.
- Propagate upward - As more subsystems are converted, the .expect() boundaries move upward through the call stack until they reach the public API surface (e.g. VM::execute), where the error is returned to the caller.
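A compressed, hypothetical sketch of the last three steps for one leaf operation, signed negation (which overflows for the minimum value). i8 stands in for a circuit integer type, and all names are made up for illustration.

```rust
use std::error::Error;
use std::fmt;

/// Domain-specific error for a hypothetical integer subsystem,
/// carrying the offending operand as structured data.
#[derive(Debug, PartialEq)]
pub enum IntegerError {
    NegationOverflow { value: i8 },
}

impl fmt::Display for IntegerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            IntegerError::NegationOverflow { value } => write!(f, "cannot negate {value}: overflow"),
        }
    }
}

impl Error for IntegerError {}

/// Converted leaf: returns `Err` where it previously called `E::halt(...)`.
pub fn checked_neg(value: i8) -> Result<i8, IntegerError> {
    value.checked_neg().ok_or(IntegerError::NegationOverflow { value })
}

/// Unconverted caller: keeps its infallible signature and still panics on
/// overflow, but the boundary is now explicit and greppable.
pub fn negate_all_legacy(values: &[i8]) -> Vec<i8> {
    values
        .iter()
        .map(|&v| checked_neg(v).expect("negate_all_legacy still halts on overflow until it returns Result"))
        .collect()
}

/// Converted caller: returns the error instead of `.expect()`-ing it,
/// which moves the boundary one level up the call stack.
pub fn negate_all(values: &[i8]) -> Result<Vec<i8>, IntegerError> {
    values.iter().map(|&v| checked_neg(v)).collect()
}
```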
Prioritization: Focus on subsystems where panics cause the most downstream pain first - instruction execution (done), constraint enforcement (done), then field/integer/group arithmetic, string operations, and record/plaintext serialization.
For upstream trait impls (e.g. std::ops::Div, std::ops::Add) where the
trait signature does not allow Result, consider introducing new CheckedDiv or CheckedAdd alternatives. Only keep panic! directly if these represent genuine logic errors or are only reachable through already-validated code paths.
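One possible shape for such an alternative, sketched on a toy prime field (arithmetic mod 7) rather than a snarkVM type. The CheckedDiv trait, its Result-returning signature, and FieldError are hypothetical. The infallible Div impl delegates to the checked version, so the only remaining panic site is a single explicit expect, intended for already-validated paths.

```rust
use std::fmt;
use std::ops::Div;

/// Toy prime modulus; real field elements are far larger.
const MODULUS: u64 = 7;

/// Hypothetical toy field element, interpreted mod `MODULUS`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Fp(pub u64);

#[derive(Debug, PartialEq)]
pub enum FieldError {
    DivisionByZero,
}

impl fmt::Display for FieldError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("division by zero")
    }
}

impl std::error::Error for FieldError {}

/// Hypothetical fallible counterpart to `std::ops::Div`.
pub trait CheckedDiv<Rhs = Self> {
    type Output;
    type Error;
    fn checked_div(self, rhs: Rhs) -> Result<Self::Output, Self::Error>;
}

impl Fp {
    /// Square-and-multiply exponentiation mod `MODULUS`.
    fn pow(self, mut exp: u64) -> Fp {
        let (mut base, mut acc) = (self.0 % MODULUS, 1);
        while exp > 0 {
            if exp & 1 == 1 {
                acc = acc * base % MODULUS;
            }
            base = base * base % MODULUS;
            exp >>= 1;
        }
        Fp(acc)
    }

    /// Inverse via Fermat's little theorem: x^(p-2) is x^-1 for x != 0.
    fn inverse(self) -> Result<Fp, FieldError> {
        if self.0 % MODULUS == 0 {
            return Err(FieldError::DivisionByZero);
        }
        Ok(self.pow(MODULUS - 2))
    }
}

impl CheckedDiv for Fp {
    type Output = Fp;
    type Error = FieldError;

    fn checked_div(self, rhs: Fp) -> Result<Fp, FieldError> {
        let inverse = rhs.inverse()?;
        Ok(Fp(self.0 % MODULUS * inverse.0 % MODULUS))
    }
}

/// Kept for ergonomic use on already-validated paths; panics on zero.
impl Div for Fp {
    type Output = Fp;

    fn div(self, rhs: Fp) -> Fp {
        self.checked_div(rhs).expect("division by zero on a validated path")
    }
}
```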
As subsystems are converted to return Result, the catch_unwind wrappers
around those subsystems become unnecessary. Remove them incrementally:
- Remove catch_unwind(AssertUnwindSafe(...)) from converted call sites.
- Once all halt sites in production paths return Result, remove the try_vm_runtime! macro from utilities/src/vm_error.rs.
- Downstream (leo, snarkOS) can remove their catch_unwind wrappers as the corresponding VM APIs are converted.
Validation: Ensure no remaining code path relies on panic-based error signaling for control flow.
Crate-by-crate migration, prioritized by downstream impact:
- synthesizer - already started with synthesizer-error. Remove temporary Anyhow(#[from] anyhow::Error) variants as concrete types are introduced.
- ledger - block validation, storage, transaction processing.
- console - parsing, serialization, type conversion.
- algorithms - cryptographic operations.
Each crate follows the same pattern:
- Define a thiserror enum for the crate's error domain.
- Replace anyhow::Result with Result<T, CrateError>.
- Use #[from] for conversions from sub-crate errors.
- Remove anyhow from Cargo.toml once fully migrated.
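With thiserror, #[from] generates both the From impl that lets ? convert a sub-crate error and the source() arm that keeps it in the chain. This std-only sketch writes those impls out by hand for hypothetical LedgerError and VmError types; the variant names and messages are illustrative, not snarkVM's.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error from a lower crate.
#[derive(Debug)]
pub struct VmError {
    pub reason: String,
}

impl fmt::Display for VmError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.reason)
    }
}

impl Error for VmError {}

/// Hypothetical crate-level error enum.
#[derive(Debug)]
pub enum LedgerError {
    /// Would be `#[error("failed to add block to VM")] AddBlock(#[from] VmError)`.
    AddBlock(VmError),
    /// Structured data, no child error.
    UnknownBlockHeight { height: u32 },
}

impl fmt::Display for LedgerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LedgerError::AddBlock(_) => f.write_str("failed to add block to VM"),
            LedgerError::UnknownBlockHeight { height } => write!(f, "unknown block height {height}"),
        }
    }
}

impl Error for LedgerError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            LedgerError::AddBlock(err) => Some(err),
            LedgerError::UnknownBlockHeight { .. } => None,
        }
    }
}

/// What `#[from]` generates: enables `?` on `Result<_, VmError>`.
impl From<VmError> for LedgerError {
    fn from(err: VmError) -> Self {
        LedgerError::AddBlock(err)
    }
}

fn vm_add_next_block(valid: bool) -> Result<(), VmError> {
    if valid {
        Ok(())
    } else {
        Err(VmError { reason: "invalid signature".to_string() })
    }
}

/// `Result<T, LedgerError>` replaces `anyhow::Result<T>`; `?` converts via `From`.
pub fn advance_to_next_block(valid: bool) -> Result<(), LedgerError> {
    vm_add_next_block(valid)?;
    Ok(())
}
```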
Consider #[non_exhaustive] selectively for error types in public APIs to allow
adding variants without breaking downstream.
With typed errors in place, enrich them with machine-readable context:
- Instruction index - already implemented in IndexedInstructionError.
- Program ID - which program was being executed.
- Function name - which function within the program.
- Operand values - already demonstrated in AssertError.
- Source mapping - enable Leo to map VM errors back to source locations (leo#29036).
This phase transforms error handling from a debugging concern into a tooling enabler - IDEs, testing frameworks, and block explorers can provide precise, actionable feedback.
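A sketch of how that context might be carried so a tool can read it without parsing strings. The ExecutionContext and ExecutionError types, their fields, the "program/function" Display wording, and the key=value diagnostic format are all hypothetical.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical machine-readable location of a failure.
#[derive(Debug, Clone)]
pub struct ExecutionContext {
    pub program_id: String,
    pub function: String,
    pub instruction_index: usize,
}

/// Hypothetical execution error: structured context plus the underlying cause.
#[derive(Debug)]
pub struct ExecutionError {
    pub context: ExecutionContext,
    pub cause: Box<dyn Error + Send + Sync>,
}

impl fmt::Display for ExecutionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let c = &self.context;
        write!(f, "'{}/{}' failed at instruction {}", c.program_id, c.function, c.instruction_index)
    }
}

impl Error for ExecutionError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.cause)
    }
}

/// A tool (IDE, test runner, explorer) reads the fields directly.
pub fn to_diagnostic(err: &ExecutionError) -> String {
    let c = &err.context;
    format!(
        "program={} function={} instruction={} cause={}",
        c.program_id, c.function, c.instruction_index, err.cause
    )
}
```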
- All error types use thiserror. No anyhow in public APIs.
- #[error("...")] describes only the current layer. Never interpolate #[source] or #[from] fields.
- Use #[source] or #[from] to build proper source() chains.
- panic! is reserved for genuine logic bugs (invariant violations that indicate a programming error, not a runtime condition).
- anyhow is acceptable for top-level error aggregation (e.g. bin crates).
- Tests may use anyhow::Result for convenience or to format error chains more easily in expectations.
- Libraries expose clean source() chains.
- Applications (snarkOS, leo tooling) choose the presentation. anyhow makes this easy:
  - top-level / outermost error: {}
  - whole error chain on one line: {:#}
  - multi-line error chain, with a backtrace when backtrace capture is enabled: {:?}
- The existing flatten_error and display_error utilities in utilities/src/errors.rs should be generalized from anyhow::Error to &dyn std::error::Error as the migration progresses.
Error types should not carry generic parameters like N: Network. The thiserror
derive macro requires all #[source] and #[from] fields to implement
std::error::Error, which for a type like MyError<N> can create a recursive
trait bound if the error contains itself.
This was encountered with CheckBlockError<N>, where #[source] on
Box<CheckBlockError<N>> in the InvalidPrefix variant triggers the recursive
bound.
Instead: Use concrete types for error data. If an error needs to carry data
that varies by network (e.g. a block hash), use String or a type-erased
representation rather than N::BlockHash. Keep the N: Network generic on the
functions that produce errors, but erase it before constructing the error value.
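A std-only sketch of that rule. The Network trait and TestNetwork type are hypothetical stand-ins for snarkVM's network parameterization, and BlockCheckError is an illustrative type, not the real CheckBlockError. The generic appears only on the function; the hash is converted to a String before the non-generic error is built.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for a `Network` trait.
pub trait Network {
    type BlockHash: fmt::Display + PartialEq;
}

/// Hypothetical network with string block hashes.
pub struct TestNetwork;

impl Network for TestNetwork {
    type BlockHash = String;
}

/// Non-generic error: network-specific data is stored type-erased as `String`.
#[derive(Debug, PartialEq)]
pub enum BlockCheckError {
    PreviousHashMismatch { expected: String, found: String },
}

impl fmt::Display for BlockCheckError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BlockCheckError::PreviousHashMismatch { expected, found } => {
                write!(f, "previous block hash mismatch: expected {expected}, found {found}")
            }
        }
    }
}

impl Error for BlockCheckError {}

/// The generic stays on the function; data is erased before the error is built.
pub fn check_previous_hash<N: Network>(
    expected: &N::BlockHash,
    found: &N::BlockHash,
) -> Result<(), BlockCheckError> {
    if expected == found {
        Ok(())
    } else {
        Err(BlockCheckError::PreviousHashMismatch {
            expected: expected.to_string(),
            found: found.to_string(),
        })
    }
}
```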
During migration, Anyhow(#[from] anyhow::Error) catch-all variants serve as
escape hatches (as seen in synthesizer-error). These are explicitly temporary
and should be tracked for removal.
Typed results in public APIs require semver-aware coordination. Each phase should be validated with draft PRs against:
Full migration is large. The phased approach limits blast radius:
- Phase 1 is a low-risk formatting convention change.
- Phase 2 is the largest effort, but its incremental nature means each subsystem conversion is a self-contained, reviewable PR.
- Phases 3-5 follow naturally as Phase 2 progresses.
All changes must be backwards compatible. Error handling changes should not alter consensus behavior (same inputs producing same outputs). The risk is low, since error types affect failure paths rather than success paths, but any change to which operations return Err vs panic! must be validated against the existing test suite and network behavior.
- Pro: Allows adding error variants without breaking downstream.
- Con: Prevents exhaustive matching in downstream crates, forcing _ => catch-all arms.
- Recommendation: Use selectively for error types at crate boundaries that are likely to grow. Internal error types can remain exhaustive.
During migration, the codebase will contain both anyhow and thiserror
errors. The Anyhow catch-all variant pattern (as in synthesizer-error) is a
practical bridge. The key discipline is ensuring new code uses thiserror and
that Anyhow variants are tracked and eventually removed.