@vladiant
Last active February 8, 2026 19:55
Architecture Decision Records
Why Documenting Architecture Decisions Matters
1. A Single Source of Truth
2. Better Onboarding and Cross-Team Collaboration
3. Encourage Thoughtful, Data-Driven Decisions
4. Simplify Architecture Evolution
ADRs are meant to capture key decisions that have long-term implications for your system or organization. If a decision introduces a new dependency, alters fundamental data flows, or significantly affects architecture and team processes, it likely requires an ADR.
On the other hand, minor decisions, like tweaking a library version or refactoring a single function, usually don’t need an official record.
Use your judgment: if a decision is significant enough that others might question it later, or it will be difficult to undo, document it. Otherwise, don’t let the process become an administrative burden.
https://newsletter.modern-engineering-leader.com/p/elevate-your-engineering-culture
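The criteria above can be read as a simple checklist. The following is a minimal sketch of that checklist in Python; the `Decision` class, its field names, and the `needs_adr` function are hypothetical names chosen for illustration, and the "any one criterion is enough" rule is an assumption, not something the source prescribes.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """Hypothetical description of a decision; field names are illustrative."""
    summary: str
    adds_dependency: bool = False
    changes_core_data_flow: bool = False
    affects_architecture_or_process: bool = False
    hard_to_undo: bool = False
    likely_questioned_later: bool = False


def needs_adr(decision: Decision) -> bool:
    """Return True if at least one significance criterion applies (assumed rule)."""
    return any([
        decision.adds_dependency,
        decision.changes_core_data_flow,
        decision.affects_architecture_or_process,
        decision.hard_to_undo,
        decision.likely_questioned_later,
    ])


kafka = Decision(
    "Move service communication to Kafka",
    adds_dependency=True,
    changes_core_data_flow=True,
    hard_to_undo=True,
)
bump = Decision("Bump a library patch version")

print(needs_adr(kafka))  # True
print(needs_adr(bump))   # False
```

A checklist like this is only a prompt for judgment; a team may reasonably decide that a decision matching one criterion is still too small to record.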
1. Write RFC (1-2 days)
2. Async Review (Comments) (2-3 days)
3. Decision Meeting (30-60 min)
4. Write ADR (Same day)
https://lukasniessen.medium.com/how-to-make-architecture-decisions-rfcs-adrs-and-getting-everyone-aligned-ab82e5384d2f
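As a quick check on the step durations listed above, the sketch below adds up the time that passes before the decision meeting. The `STEPS` dictionary and the `days_before_decision` function are hypothetical names; the sketch assumes the steps run strictly one after another, with the meeting and the ADR both landing on the following day.

```python
# (min_days, max_days) for the steps that precede the decision meeting.
STEPS = {
    "Write RFC": (1, 2),
    "Async review": (2, 3),
}


def days_before_decision(steps):
    """Sum the shortest and longest durations of the sequential steps."""
    shortest = sum(low for low, _high in steps.values())
    longest = sum(high for _low, high in steps.values())
    return shortest, longest


print(days_before_decision(STEPS))  # (3, 5)
```

Under these assumptions a decision is reached roughly 3 to 5 days after the RFC is started.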

LLMs are actually useful for RFC preparation. You can use them to:

  • Research trade-offs between technologies
  • Generate initial drafts of pros/cons lists
  • Summarize documentation for technologies you’re evaluating
  • Identify edge cases you might have missed

The point, though, is NOT to let an LLM write the entire RFC and then publish it unchanged, but to use the LLM as an immediate thought partner.

Duration: 30–60 minutes (not more)

Agenda:

  • Quick context (2 min) — “We’re here to decide X. Everyone’s read the RFC”.
  • Address open questions (10–15 min) — Go through unresolved comments and open questions from the RFC
  • Discussion (15–30 min) — Debate the options, raise new concerns
  • Decision (5–10 min) — Make the call
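The agenda time boxes can be checked against the 30 to 60 minute limit with a few lines of Python. This is only an arithmetic sketch; `AGENDA` and `total_minutes` are hypothetical names.

```python
# (item, min_minutes, max_minutes) taken from the agenda above.
AGENDA = [
    ("Quick context", 2, 2),
    ("Address open questions", 10, 15),
    ("Discussion", 15, 30),
    ("Decision", 5, 10),
]


def total_minutes(agenda):
    """Return the shortest and longest total meeting length in minutes."""
    shortest = sum(low for _name, low, _high in agenda)
    longest = sum(high for _name, _low, high in agenda)
    return shortest, longest


print(total_minutes(AGENDA))  # (32, 57)
```

Even with every item at its upper bound, the agenda stays under the 60 minute cap.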

Who should be there:

  • The RFC author (runs the meeting)
  • Key stakeholders who commented
  • The decision maker (if that’s not you)

Keep the group small. 5–8 people max. Large meetings turn into status updates, not decision forums.

Title

Migrating from Synchronous HTTP API to Kafka

Status

Accepted

Date

2025-03-10

Context

Our microservices currently communicate via synchronous HTTP APIs, causing latency issues and occasional disruptions when one service is unavailable. The cost of handling the entire traffic is also very high. Most of the communication, especially reads, doesn't require a synchronous flow. We also anticipate a need to handle significantly higher request volumes in the near future. To increase resiliency, scalability and cost-efficiency, asynchronous communication would be preferred.

Decision

We will transition from synchronous HTTP API calls to a Kafka-based event-driven architecture for communication between our microservices.

Rationale

  • Scalability: Kafka’s event-driven model allows simple horizontal scaling of consumers, which is critical for our anticipated traffic growth and is also very cost-efficient.

  • Resilience: Asynchronous messaging decouples microservices, so one service’s downtime doesn’t cascade throughout the system, which is especially important for writes/commands. That will also allow us to take advantage of the Saga pattern.

  • Cost-efficiency: A simple proof of concept indicates that just 3 Kafka consumers can read the same amount of data as 25 Sidekiq workers reading from the HTTP API. It also implies that we will be able to scale down the web workers of the upstream service by 40%, as we won't be reading this data from the HTTP API.

Implications

  • Operational Overhead: We need to maintain a Kafka cluster, which introduces new complexity for monitoring, alerting, and administration. Amazon MSK, a managed Kafka service, could reduce this burden.

  • Kafka Learning Curve: Engineers will need to gain familiarity with event-driven design patterns and Kafka itself.

  • Deployment and Migration Plan: We’ll roll out event streams incrementally to avoid a “big bang” migration. Secondary microservices will be adapted first, followed by the more critical ones.

Alternatives Considered

  1. Continue with Synchronous HTTP: Would be simpler to maintain, but scalability, resiliency and cost-efficiency trade-offs are not acceptable in the long run.

  2. Use a Different Message Broker (e.g. RabbitMQ): While viable, Kafka’s persistence and proven track record with large-scale event processing made it more appealing.

References

RFC: [Title]

Author: [Your name]
Date: [Date]
Status: Draft | Under Review | Decided | Superseded
Decision Deadline: [Date - usually 3-5 days from creation]

Summary

One paragraph. What is this about and why are we discussing it?

Context

What's the current situation? What problem are we solving? Why now? Include relevant constraints, requirements, and background.

Priorities and Requirements (Ranked)

This is the most important part. List what actually matters for this decision, in order of importance. Be specific and quantifiable where possible.

  1. [Priority name] - [Why this matters. What's the business or technical reason?]
  2. [Priority name] - [Why this matters?]
  3. [Priority name] - [Why this matters?]

Example:

  1. Cost - We're operating at thin margins; any infrastructure cost increase directly impacts profitability
  2. Development velocity - Our roadmap depends on shipping three features this quarter
  3. Operational complexity - We have a small ops team; anything complex will create bottlenecks

Note: People often disagree on decisions because they're weighing priorities differently. Making priorities explicit is where the real decision-making happens.

Proposed Solutions

Option A: [Name]

Description of the approach.

Pros:

  • ...

Cons:

  • ...

How this performs against priorities:

  • Cost: [How does this affect cost? High/Medium/Low impact and direction]
  • Development velocity: [How does this affect velocity?]
  • Operational complexity: [How does this affect ops complexity?]

Estimated effort: X weeks/months
Risk level: Low/Medium/High
Other trade-offs: [Anything else worth noting]

Option B: [Name]

...

Option C: Do Nothing

Often you should include this option. Sometimes the answer is "not now".

Recommendation (Optional!)

Which option do you recommend and why? Focus on how it aligns with the priorities you outlined above.

Stakeholders

Who needs to be involved in this decision? Tag them.

  • @backend-team (affected by implementation)
  • @security-team (compliance implications)
  • @product-owner (timeline impact)
  • @infrastructure (operational concerns)

Open Questions

What do you still need input on?

Timeline

When does this decision need to be made? What's driving that deadline?

Title

Short title describing what this ADR is about

Status

Accepted | Superseded by ADR-xx

Date

Date

Context

Describe the nature of the problem that requires a decision and all relevant context around it.

Decision

Describe briefly the decision that was made.

Rationale

Explain the reasoning behind the decision, its trade-offs, and why it is considered the preferred option.

Implications

Describe the side effects of this decision, both technical and non-technical. Include both positive and negative implications.

Alternatives Considered

Describe any alternative solutions that were considered as a potential solution and why they were not chosen.

References

Optional. Include any links to resources that influenced the decision or that might help readers understand its subject in depth.
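If ADRs are kept as files in a repository, the template above can be mirrored in a small data structure so that every record has the same sections. The following is a minimal sketch, not a prescribed tool: the `ADR` class, its field names, the `render` method, and the status check are all hypothetical, and the assumption that "ADR-xx" means a numeric identifier is mine.

```python
import re
from dataclasses import dataclass, field
from datetime import date

# Assumed status format based on the template: "Accepted" or "Superseded by ADR-<number>".
STATUS_PATTERN = re.compile(r"^(Accepted|Superseded by ADR-\d+)$")


@dataclass
class ADR:
    """Hypothetical in-memory form of the ADR template."""
    title: str
    status: str
    decided_on: date
    context: str
    decision: str
    rationale: str
    implications: str
    alternatives: str
    references: list[str] = field(default_factory=list)  # optional section

    def __post_init__(self):
        if not STATUS_PATTERN.match(self.status):
            raise ValueError(f"unexpected status: {self.status!r}")

    def render(self) -> str:
        """Render the record as plain text, one heading per template section."""
        sections = [
            ("Title", self.title),
            ("Status", self.status),
            ("Date", self.decided_on.isoformat()),
            ("Context", self.context),
            ("Decision", self.decision),
            ("Rationale", self.rationale),
            ("Implications", self.implications),
            ("Alternatives Considered", self.alternatives),
        ]
        if self.references:
            sections.append(("References", "\n".join(self.references)))
        return "\n\n".join(f"{name}\n\n{body}" for name, body in sections)


adr = ADR(
    title="Migrating from Synchronous HTTP API to Kafka",
    status="Accepted",
    decided_on=date(2025, 3, 10),
    context="Synchronous HTTP calls cause latency issues and cascading failures.",
    decision="Move inter-service communication to Kafka-based events.",
    rationale="Better scalability, resilience and cost-efficiency.",
    implications="We must operate a Kafka cluster and train engineers.",
    alternatives="Stay on synchronous HTTP; use a different message broker.",
)
print(adr.render())
```

Rejecting unknown statuses at construction time is one way to keep records consistent; a team that uses additional statuses would need to extend the pattern.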
