amaru uses the tracing crate and the OpenTelemetry infrastructure for observability: logs, traces and metrics.
These concepts sit at the frontier between devs, users and DevOps, and require specific care so that their full potential can be realized.
While the current observability implementation in amaru is a great foundation, we identified some shortcomings. This is an attempt to improve the situation so that everyone feels more confident in their day-to-day use of these tools.
- For amaru devs: how to figure out when and how to add traces?
- For amaru users (DevOps/pool operators/devs): consistency, easy discoverability and confidence (i.e. API stability)
- For DevOps specifically: global coherency (i.e. are there holes in the global picture), is the granularity well-defined?
This means that Tracing is considered a product interface.
Internally amaru uses the tracing crate as a means to express both traces and logs.
It also uses the opentelemetry crates to:
- Define the semantics of traces and spans: they are logically connected via `span_id` and can be nested via `parent_id`.
- Configure the collection of observability data.
- Configure the export of observability data.
The semantics provided by the tracing and opentelemetry libraries are a bit different and can appear confusing at
first.
The tracing crate does not directly represent traces and logs. It generates two kinds of diagnostic information:
spans
From the documentation:
Unlike a log line that represents a moment in time, a span represents a period of time with a beginning and an end.
spans are created via the span! macro and its derivatives (e.g. info_span!).
spans are logically connected via span_id. They can be nested via parent_id.
events
From the documentation:
Events are comparable to the log records emitted by unstructured logging code, but unlike a typical log line,
an Event may occur within the context of a span.
events are created via the event! macro and its derivatives (e.g. info!).
Both spans and events are then funneled to subscribers that act as global collection points.
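As a minimal illustration (not actual amaru code), a span wraps a period of work while an event records a point in time within it:

```rust
use tracing::{info, info_span};

fn validate_block(height: u64) {
    // A span: covers the whole validation and carries structured fields.
    let span = info_span!("validate_block", block.height = height);
    let _guard = span.enter();

    // An event: a point-in-time record, attached to the surrounding span.
    info!(tx_count = 42, "block decoded");
}
```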
OpenTelemetry (OTEL for short) defines a number of concepts, including:
- log record.
- event is a specific type of log record.
- span
A trace is a logical grouping of spans based on a trace_id (see SpanContext).
Events can be attached to spans. In Rust, this is done via add_event_with_timestamp.
A log record can optionally be associated with a span (via the trace_id and span_id fields) if it is emitted while a span is active.
A tracing event emitted with the event! macro or its derivatives is collected as an OTEL log record.
A tracing span emitted with the span! macro or its derivatives is collected as an OTEL span.
NOTE: tracing events are not equivalent to OpenTelemetry span events. amaru currently doesn't create any span event.
Nonetheless, a tracing event (e.g. info!) is linked to the surrounding span at runtime and collected by the OTEL libraries as a log record connected to that span, but not as an OTEL span event.
In amaru the observability data is collected via a Registry,
which stacks several Layers offering different behaviours for
collecting and exporting observability data.
- A JSON "formatting layer" (created with
tracing::fmt::layer()) to collect spans and events when they are emitted, and format them as JSON to be written tostderr. - An OTEL layer, provided by the
tracing-opentelemetrywhich also captures spans and events but to export them as OTEL spans and log records to an OpenTelemetry Collector.
In each case the configuration of the layer decides which parts of a span's lifecycle trigger the export / printing of data, and whether specific filters need to be added.
Also note that OpenTelemetry exporters offer additional configuration options for batching and exporting data.
In general all log records are exported but spans are sampled based on custom heuristics.
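A minimal sketch of this layering, assuming `tracing-subscriber` and `tracing-opentelemetry` (illustrative only, not amaru's actual initialization code; building the OTLP tracer is elided):

```rust
use tracing_subscriber::prelude::*;

fn init_observability() {
    // Formatting layer: renders spans and events as JSON on stderr.
    let fmt_layer = tracing_subscriber::fmt::layer()
        .json()
        .with_writer(std::io::stderr);

    // OTEL layer: forwards the same spans to an OpenTelemetry Collector.
    // let otel_layer = tracing_opentelemetry::layer().with_tracer(tracer);

    tracing_subscriber::registry()
        .with(fmt_layer)
        // .with(otel_layer)
        .init();
}
```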
spans are what third parties depend on: they cannot break, and must be discoverable and documented.
events are for humans, what is typically considered logs. They are used as structured logs anyway.
Next action items would then be:
- introduce and use an alternative to the `instrument` macro, tailored to amaru's needs and strongly typed
- update the existing EDR to reflect those changes
- develop some tooling to help prevent regressions and generate schemas/documentation (i.e. a source-code parsing crate)
- integrate that tooling in amaru CI
Have a dedicated amaru-observability crate. Only this crate can depend on the tracing crate; direct use of the tracing crate is forbidden via clippy rules.
This enforces that only traces/events with the right details and leveraging Rust types can be used.
amaru-observability will define a number of macros/fns wrapping tracing. Mostly it involves reducing the API surface and introducing more typing.
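As a purely hypothetical sketch of that direction (the `Slot` type and `validate_header_span` helper are invented for illustration), the wrapper could expose typed span constructors so call sites never touch tracing directly:

```rust
// Hypothetical content of `amaru-observability`: fixed targets, typed fields.
use std::fmt;
use tracing::Span;

pub struct Slot(pub u64);

impl fmt::Display for Slot {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// The only way to create the `validate_header` span: canonical name,
/// canonical target, Display-only fields.
pub fn validate_header_span(point_slot: &Slot) -> Span {
    tracing::trace_span!(
        target: "amaru::consensus::chain_sync",
        "validate_header",
        point_slot = %point_slot,
    )
}
```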
- events end up as logs (stderr only)
- spans end up as OTEL spans (OTLP, machine-readable JSON via stdout)
- `level` is always `TRACE`
- spans have a `name`, but no `message`; a name is a simple string without any `.`
- `target` must be set; targets follow a `::`-separated tree hierarchy (e.g. `amaru::ledger::store`)
- favor fewer but bigger (i.e. more fields) spans
- `field`s are common across spans as much as possible; they have canonical types
- span events are not used
- `target` and `name` are defined using Rust types
- only use runtime `field`s if they cannot be avoided
- always prefer using the `trace` macro, including if it requires the introduction of an extra fn
- no default fields; returned value shown by default; `Result` hand-rolled
- only `Display` fields, no `Debug`
- `set_span_kind` (from `OpenTelemetrySpanExt`) might be relevant when interacting with the network (e.g. in `pure-stage`)
- when too cumbersome to trace all fields at once, use our typed version of `Span::current().record` (see the sketch below)
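The last point could build on tracing's `field::Empty` and `Span::record`; a rough sketch of the underlying mechanism (the typed amaru wrapper around it is still to be designed):

```rust
use tracing::{field, Span};

fn validate_header(point_slot: u64) {
    // Optional fields are declared up-front as empty so they can be recorded later.
    let span = tracing::trace_span!(
        target: "amaru::consensus::chain_sync",
        "validate_header",
        point_slot,
        peer_id = field::Empty,
    );
    let _guard = span.enter();

    // Once the value is known, record it on the current span.
    let peer_id = "peer-1";
    Span::current().record("peer_id", field::display(peer_id));
}
```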
Being explicit and not relying on defaults improves robustness and makes spans resistant to refactoring.
For traces, a radical idea would be to just use the trace level always. We only ever use the info level for traces that we want to show up in human logs: a better way to do that would be to just create extra logs here (e.g. using info!).
The trace hierarchy will come from the associated module hierarchy (see more details in the syntax section). Therefore trace names must not contain any `.`, just simple names.
The event! macros will make sure parent_span_id is correctly propagated in most cases, except when new threads are created. This might happen in amaru (e.g. in pure-stage) so specific care is mandatory: the parent span must be manually restored.
Alternatively, newly created threads or tasks (regular or tokio) must use in_current_span or instrument(span).
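For example (an illustrative sketch, not existing amaru code), a tokio task can be tied to its parent span explicitly:

```rust
use tracing::Instrument;

async fn start_chain_sync() {
    let span = tracing::trace_span!(target: "amaru::consensus", "chain_sync");

    // Spawned tasks do not inherit the active span automatically:
    // attach the parent span explicitly so the hierarchy is preserved.
    tokio::spawn(async { /* sync work */ }.instrument(span));

    // Or reuse whatever span is active at the spawn point.
    tokio::spawn(async { /* sync work */ }.in_current_span());
}
```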
Also consider, when implementing a subscriber, that ENTER and EXIT events might be called multiple times in an async context. CLOSE will only ever be called once, when the async block is dropped.
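For reference, this is how those lifecycle hooks surface when writing a custom tracing-subscriber Layer (a skeleton, not amaru code):

```rust
use tracing::span::Id;
use tracing_subscriber::layer::{Context, Layer};

struct LifecycleLayer;

impl<S: tracing::Subscriber> Layer<S> for LifecycleLayer {
    // In async code, a span is entered and exited on every poll of the
    // instrumented future, so these two hooks may fire many times.
    fn on_enter(&self, _id: &Id, _ctx: Context<'_, S>) {}
    fn on_exit(&self, _id: &Id, _ctx: Context<'_, S>) {}

    // Close fires exactly once, when the span is finally dropped.
    fn on_close(&self, _id: Id, _ctx: Context<'_, S>) {}
}
```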
A potential syntax could be:
```rust
// In amaru-observability/src/traces.rs
define_schemas! {
    consensus {
        chain_sync {
            VALIDATE_HEADER {
                required {
                    point_slot: Slot,
                    point_hash: Hash,
                }
                optional {
                    peer_id: PeerId
                }
            }
        }
    }
}

// In some amaru module
#[trace(consensus::chain_sync::VALIDATE_HEADER)]
fn import_headers(point_slot: Slot, point_hash: Hash) {
    do_extra_stuff(peer_id);
}

#[augment_trace(consensus::chain_sync::VALIDATE_HEADER)]
fn do_extra_stuff(peer_id: PeerId) {
}

// The `trace` macro ensures that all function arguments match the schema.
// `target` is inferred from the nested structure of the name symbol: `amaru::consensus::chain_sync`.
// All function arguments are used as `fields` (except `self`). All of them must be `Display`able.
// This might force the introduction of extra dedicated functions, but that sounds like a reasonable
// tradeoff and might guide us toward better function definitions.
// `augment_trace` adds extra fields to the current span. Annotated function arguments must match the
// associated `optional` fields.
```
Only instrument architectural boundaries.
Examples:
- public APIs
- async task entry points
- IO boundaries
- state machine transitions
Avoid:
- inner loops
- helpers
- pure functions
This keeps traces meaningful and stable.
All spans must be documented using JSON schemas. While documentation is desirable, it's pretty useless if users can't easily find it. Some mechanism must allow users to easily discover and understand which traces are emitted when.
Some JSON schema standards for documenting telemetry exist, but they focus on changes. They might be amended to also document the current state.
- https://github.com/open-telemetry/weaver
- https://github.com/open-telemetry/semantic-conventions/blob/main/model/http/spans.yaml
- https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/schemas
- https://github.com/open-telemetry/opentelemetry-specification/blob/main/oteps/0152-telemetry-schemas.md
Follow semantic conventions as much as possible. OpenTelemetry has dedicated significant energy to defining generic elements that match a lot of scenarios. Using them ensures a smooth integration with the various OTEL tooling.
Some specific conventions worth considering:
CI helps us check whether commits break existing spans. This is done by relying on a yet-to-be-created tool that will parse the whole codebase to extract span definitions.
Ideally some logical links between those spans will also be extracted by this tool.
Both documentation and regression alerts are generated from those.
In an effort to prevent higher-level regressions (full traces), some testing is mandatory. Insta snapshot tests are used to ensure well-known scenarios stay consistent.
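A sketch of what such a snapshot test could look like (the scenario body is a stand-in and the buffer-capturing writer is our own assumption; only the tracing-subscriber and insta APIs are real):

```rust
use std::sync::{Arc, Mutex};
use tracing_subscriber::fmt::MakeWriter;

// Collects everything the formatting layer writes, so it can be snapshotted.
#[derive(Clone, Default)]
struct BufWriter(Arc<Mutex<Vec<u8>>>);

impl std::io::Write for BufWriter {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        self.0.lock().unwrap().extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> std::io::Result<()> {
        Ok(())
    }
}

impl<'a> MakeWriter<'a> for BufWriter {
    type Writer = BufWriter;
    fn make_writer(&'a self) -> Self::Writer {
        self.clone()
    }
}

#[test]
fn chain_sync_traces_are_stable() {
    let buf = BufWriter::default();
    let subscriber = tracing_subscriber::fmt()
        .json()
        .without_time() // keep the snapshot deterministic
        .with_writer(buf.clone())
        .finish();

    tracing::subscriber::with_default(subscriber, || {
        // Stand-in for a well-known scenario exercising real amaru code.
        tracing::info!(target: "amaru::consensus::chain_sync", point_slot = 42, "header validated");
    });

    let output = String::from_utf8(buf.0.lock().unwrap().clone()).unwrap();
    insta::assert_snapshot!(output);
}
```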
Seems like the whole "Event can be attached to spans. In Rust, this is done via add_event_with_timestamp." is extra? Especially because the exact same sentence can be read above 😅 ?
I disagree about the second part.
`event` should also be treated as an API and deemed both stable and consistent. The human side of observability is mostly a matter of presentation IMO. Humans don't necessarily want to read raw JSON logs (although some do!); but would rather process them through tools that allow for formatting, filtering and coloring them. Or more generally, a "pretty" output is what humans expect. But the source data is the same.

Seems like a reasonable idea to keep the log API under control. Provided that we can create a sufficiently decent and friendly API (the proc attribute macros like `instrument` from tracing are definitely handy!).

My preference would be that what ends up on stderr is a matter of severity. At least currently, this is how it's set up: only `DEBUG` and above ends up on `stderr`, unless JSON export is activated. Otherwise it has proven confusing to get two streams on both stderr and stdout.

Can we be more specific here, especially with regard to OpenTelemetry conventions? By default, targets are set to the module they're from, which makes them quite unstable. Also, if traces end up being all defined in a separate crate, it may become tricky.
So I believe it's much more practical to define namespaces based on the domain they apply to (ledger, consensus, plutus, ...) when relevant, and to otherwise follow OTEL semantic conventions when applicable (network, db, http, ...).
In particular, one thing we haven't been very careful about is leveraging span hierarchy and contexts. For example, if a top-level span already captures slot and epoch as a field, it shouldn't be necessary to add "epoch" or "slot" as a field on any of the child events or spans. Often enough, we've been duplicating stuff like this.
Although this is a bit tricky because we can't reliably see parent's span fields when defining child spans. So I am not sure how we can propagate context, and ensure that the hierarchy contains the relevant information.
This ended up being annoying back then, because some spans don't have child events (and perhaps that's the root cause of the issue), which made them unavailable to "humans" through the exported pretty logs. So I ended up adding an event on span's end iirc, so that even "empty spans" would be visible from the outside.
Not sure I follow?
`info` or `debug` have roughly the same semantics in the codebase today. I think my preference would be:

- `trace` as first choice, unless too noisy to be shown on a stderr feed
- `info` for events relevant to humans
- `warning` for unexpected behaviors that aren't immediately causing problems, but might
- `error` for things that should definitely not happen

Then remains `debug`, which we can use only for ... debugging! So, there should not be any debug on `main`, but we may add some when troubleshooting issues to easily filter on them and distinguish them from "normal" logs.