@adamw
Created March 13, 2026 12:25

Build a service that receives SOAP notifications about train arrivals and departures, publishes them to Kafka, and materializes hourly event views in S3.

Do not invent new features or over-complicate. Keep it simple: implement only what the specification below requires.

Development process

Write a plan

First write a step-by-step implementation plan. Store the plan in a plan.md file. Do not include any code in the plan.

  • The plan should consist of multiple tasks
  • Tasks should be designed to be implemented individually, one by one
  • Tasks should be grouped by feature / technical concern, so that when all tasks from a group are implemented, the feature is complete
  • Each task should handle a single concern and gradually move the system toward the goal
  • Make sure there are no additional features planned

Implement

Execute the implementation plan step by step

  • Iterate on the implementation of consecutive tasks. The code MUST compile without warnings, and the tests MUST pass
  • Unit tests should be focused, non-overlapping, each covering a single well-defined scenario
  • After completing all tasks from a task group, ALWAYS perform a code review against the coding guidelines. You MUST run a code review before proceeding to the next task group.
  • ALWAYS apply code review remarks, then repeat the review. Only when a review produces no remarks, proceed to the next task group.
  • All developed features should be integrated with the rest of the system, so that there's no dead or unreachable code. Anything that's developed must be somehow reachable from the main entrypoint.
  • Commit the result and proceed to the next task in the implementation plan
  • Mark the task as done in the plan file

Work on the implementation autonomously. Do not ask any questions; resolve any issues on your own.

Tech stack

  • scala 3.8.x on JVM 21
  • sbt build system
  • direct-style approach
  • functional programming (immutable data types, pure functions) whenever possible
  • scala 3 features (enums, opaque types, inlines, extension methods, givens etc.)
  • tapir for HTTP
  • ox for direct-style structured concurrency and streaming
  • scalaxb for xsd-to-Scala codegen
  • munit for tests

When implementing Scala 3 direct-style applications using Tapir, Ox, or sttp, consult the guide at: https://raw.githubusercontent.com/VirtusLab/direct-style-guide/refs/heads/master/index.md

The index lists self-contained chapters by use-case (error handling, authentication, testing, observability, persistence, configuration, etc.). Fetch the chapter relevant to your current task for implementation patterns and code examples.

Base URL for chapters: https://raw.githubusercontent.com/VirtusLab/direct-style-guide/refs/heads/master/

Feature specification

1. Project Setup: Status Endpoint

Set up a greenfield project with a GET /status health-check endpoint returning "OK". The HTTP server runs on port 8080.

2. SOAP Train Information Service

Add a SOAP 1.1 web service at /soap/TrainInfoService with three operations routed by the SOAPAction header:

  • NotifyArrival: accepts arrival details, returns confirmationId and receivedAt.
  • NotifyDeparture: accepts departure details, returns confirmationId and receivedAt.
  • GetTrainStatus: accepts trainNumber, returns current status. Returns a SOAP fault if train not found.

The XSD schema is provided in train-info.xsd. Generate typed classes from it. Build codecs that decode/encode SOAP envelopes. A catch-all endpoint returns a SOAP fault for unknown actions.

The service accepts an event producer interface (no-op for now — Kafka comes later). Unit tests use HTTP stub/mock servers.
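The SOAPAction-based dispatch described above can be kept as a pure, testable function that maps the header value to an operation, with unknown actions falling through to the catch-all fault. The operation names come from the spec; the type and function names below are illustrative, and the quote-stripping reflects the common convention of clients sending the header value in double quotes.

```scala
// Sketch: routing by the SOAPAction header as a pure function. The three
// operations are from the spec; SoapRoute and route are illustrative names.
enum SoapRoute:
  case NotifyArrival, NotifyDeparture, GetTrainStatus
  case UnknownAction(action: String) // handled by the catch-all fault endpoint

def route(soapAction: String): SoapRoute =
  // SOAPAction values are often sent quoted, e.g. "\"NotifyArrival\""
  soapAction.stripPrefix("\"").stripSuffix("\"") match
    case "NotifyArrival"   => SoapRoute.NotifyArrival
    case "NotifyDeparture" => SoapRoute.NotifyDeparture
    case "GetTrainStatus"  => SoapRoute.GetTrainStatus
    case other             => SoapRoute.UnknownAction(other)
```

Keeping the dispatch pure makes the unknown-action fault path trivially unit-testable without an HTTP server.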

3. OpenTelemetry Tracing and Metrics

Add OpenTelemetry with auto-configuration. Instrument HTTP endpoints with tracing and metrics. Thread the OpenTelemetry instance to components that need it. Ensure runtime metrics resources are closed on shutdown.

4. Logging

Add structured logging. Provide a reusable logging facility for service classes. Use parameterized log messages (not string interpolation).
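To make the "parameterized, not interpolated" rule concrete: parameterized messages keep a constant template with `{}` placeholders (the SLF4J convention), so formatting only happens when the level is enabled and structured log processors can key on the template. The `format` helper below is a stand-in for what a logging backend does, not the real facility.

```scala
// Illustration of parameterized log messages vs. string interpolation.
// format mimics SLF4J-style `{}` substitution; it is a teaching stub only.
def format(template: String, args: Any*): String =
  args.foldLeft(template) { (t, a) =>
    val i = t.indexOf("{}")
    if i < 0 then t else t.substring(0, i) + a.toString + t.substring(i + 2)
  }

// Preferred: logger.info("Train {} arrived at {}", trainNumber, station)
//   - formatted only if the level is enabled; template stays constant
// Avoid:     logger.info(s"Train $trainNumber arrived at $station")
//   - always pays the formatting cost and defeats template-based log analysis
```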

5. SOAP Error Handling

Add custom error handlers that return proper SOAP faults for requests on SOAP paths:

  • Decode/parse failures → SOAP Client Fault
  • Rejected requests → SOAP Client Fault
  • Unhandled exceptions → SOAP Server Fault

Non-SOAP paths use the framework's default error handling.
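The failure-to-fault mapping above boils down to a small pure function over SOAP 1.1 fault codes; the enum and function names here are illustrative, not a framework API.

```scala
// Sketch: mapping server-side failure kinds on SOAP paths to SOAP 1.1
// fault codes (assuming the conventional soap: envelope namespace prefix).
enum RequestFailure:
  case DecodeFailure // malformed envelope / XML parse error
  case Rejected      // no endpoint matched the request
  case ServerError   // unhandled exception

def faultCode(f: RequestFailure): String = f match
  case RequestFailure.DecodeFailure | RequestFailure.Rejected => "soap:Client"
  case RequestFailure.ServerError                             => "soap:Server"
```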

6. Kafka Event Publishing

Add Kafka integration:

  • An event producer interface with publishArrival and publishDeparture, supporting resource cleanup
  • A TrainEvent model with event type, train details, timestamp, and optional arrival/departure fields, serialized as JSON
  • A Kafka implementation publishing to topic train.events with trainNumber as key
  • Inject into the service — publish after successful notify operations

Unit tests should use a no-op Kafka producer.
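One possible shape for the producer interface and event model described above, with `AutoCloseable` covering the resource-cleanup requirement and a no-op implementation for tests. Field names and the exact detail fields are one reading of the spec, not a fixed contract, and JSON serialization is omitted.

```scala
import java.time.Instant

// Sketch of the event model: type, train details, timestamp, optional
// arrival/departure detail. Names are illustrative.
enum EventType:
  case Arrival, Departure

case class TrainEvent(
  eventType: EventType,
  trainNumber: String,     // also used as the Kafka message key
  timestamp: Instant,
  platform: Option[String] // example of an optional arrival/departure field
)

// Implemented by the real Kafka producer and by the test no-op.
trait EventProducer extends AutoCloseable:
  def publishArrival(event: TrainEvent): Unit
  def publishDeparture(event: TrainEvent): Unit

object NoopEventProducer extends EventProducer:
  def publishArrival(event: TrainEvent): Unit = ()
  def publishDeparture(event: TrainEvent): Unit = ()
  def close(): Unit = ()
```

The SOAP service depends only on `EventProducer`, so tests inject `NoopEventProducer` and the Kafka implementation slots in without touching the service.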

7. S3 Event View Builder

Add an Event View Builder that consumes from Kafka and materializes hourly JSONL files to S3:

  • Consumes from train.events (consumer group event-view-builder, no auto-commit)
  • Buckets events by UTC hour into local staging files, each line a JSON record containing the Kafka offset and event data
  • Periodically flushes dirty buckets to S3 (default every 10 minutes)
  • Dedup: per-bucket, per-partition offset tracking — skips events already stored. On startup, efficiently rebuilds offset state from existing files (downloaded from S3 if needed)
  • Storing full event lists in memory is not an option: there may be too many events, which could cause the process to run out of memory
  • Late events: drops events older than 3 hours
  • Safe flush: only commits Kafka offsets when ALL dirty buckets uploaded successfully — partial failures retry on next flush without committing, preventing data loss
  • Lifecycle: closes file handles on shutdown, cleans up temp directory
  • Abstracts storage behind an interface (S3 implementation + in-memory test implementation)
  • Purges local files for sealed buckets (older than 3 hours) after each flush

Tests cover: first startup, crash recovery from S3, late event dropping, independent bucket de-dup, upload failure retry.
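The per-event decision logic above (UTC-hour bucketing, the 3-hour late cutoff, and per-bucket/per-partition offset dedup) can be sketched as pure functions; all names here are illustrative, and the real builder additionally manages staging files, flushing, and S3.

```scala
import java.time.{Duration, Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

// Events are bucketed by UTC hour; the bucket key doubles as a file name stem.
val hourFmt = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH").withZone(ZoneOffset.UTC)

def bucketKey(eventTime: Instant): String = hourFmt.format(eventTime)

// Late-event rule: drop anything more than 3 hours behind "now".
def isLate(eventTime: Instant, now: Instant): Boolean =
  Duration.between(eventTime, now).compareTo(Duration.ofHours(3)) > 0

// Dedup keeps only the highest stored offset per (bucket, partition) —
// rebuilt on startup from existing files instead of holding event lists.
def shouldAppend(maxOffsets: Map[(String, Int), Long],
                 bucket: String, partition: Int, offset: Long): Boolean =
  maxOffsets.get((bucket, partition)).forall(offset > _)
```

Because dedup state is just a max offset per (bucket, partition), memory stays bounded regardless of event volume, matching the no-event-lists-in-memory constraint.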

8. Event View Builder Metrics

Add OpenTelemetry metrics to the Event View Builder and integrate them. Provide a noop default for tests.

  • Count events processed by reason (appended / dropped / skipped duplicate)
  • Count flush outcomes (success / failure)
  • Histogram for flush duration
  • Gauge for number of dirty buckets
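One way to make the noop-default requirement concrete is to hide the four instruments behind a small interface; the names below are illustrative, and a real implementation would delegate to OpenTelemetry counters, a histogram, and a gauge callback.

```scala
// Sketch: metrics abstraction for the Event View Builder with a no-op
// default for tests. Method and enum names are illustrative.
enum EventOutcome:
  case Appended, Dropped, SkippedDuplicate

trait ViewBuilderMetrics:
  def eventProcessed(outcome: EventOutcome): Unit           // counter, by reason
  def flushCompleted(success: Boolean, durationMillis: Long): Unit // counter + histogram
  def dirtyBuckets(count: Int): Unit                        // gauge

object NoopViewBuilderMetrics extends ViewBuilderMetrics:
  def eventProcessed(outcome: EventOutcome): Unit = ()
  def flushCompleted(success: Boolean, durationMillis: Long): Unit = ()
  def dirtyBuckets(count: Int): Unit = ()
```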

9. Typed Configuration

Replace environment variables and hardcoded values with a typed configuration file:

  • Define a config model covering: Kafka bootstrap servers, S3 bucket name, HTTP port, flush interval
  • Load from a config file with sensible defaults for local development
  • Support environment variable overrides
  • Fail fast on invalid config at startup
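A minimal sketch of the config model and override precedence: an environment value wins over the local-development default, and a bad value fails fast with an error. All names and defaults are illustrative; a full loader would also read the config file and validate every field, not just the port.

```scala
// Sketch: typed config with env-var overrides and fail-fast validation.
case class AppConfig(
  kafkaBootstrapServers: String,
  s3Bucket: String,
  httpPort: Int,
  flushIntervalMinutes: Int
)

def load(env: Map[String, String]): Either[String, AppConfig] =
  val rawPort = env.getOrElse("HTTP_PORT", "8080")
  rawPort.toIntOption match
    case None => Left(s"Invalid HTTP_PORT: $rawPort") // fail fast at startup
    case Some(port) =>
      Right(AppConfig(
        kafkaBootstrapServers = env.getOrElse("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"),
        s3Bucket = env.getOrElse("S3_BUCKET", "train-events"),
        httpPort = port,
        flushIntervalMinutes = env.getOrElse("FLUSH_INTERVAL_MINUTES", "10").toIntOption.getOrElse(10)
      ))
```

Returning `Either` lets the entrypoint print the error and exit before any component starts, satisfying the fail-fast requirement.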

10. OpenAPI Documentation

Expose auto-generated OpenAPI documentation via Swagger UI at /docs, covering all HTTP endpoints.
