ADRs are meant to capture key decisions that have long-term implications for your system or organization. If a decision introduces a new dependency, alters fundamental data flows, or significantly affects architecture and team processes, it likely requires an ADR.
On the other hand, minor decisions, like tweaking a library version or refactoring a single function, usually don’t need an official record.
Use your judgment: if a decision is significant enough that others might question it later, or it will be difficult to undo, document it. Otherwise, don’t let the process become an administrative burden.
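The rule of thumb above can be written down as a small checklist. This is only an illustrative sketch: the `Decision` fields and the `needs_adr` function are hypothetical names, and the criteria simply mirror the ones listed in the text. It is not a substitute for judgment.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """Hypothetical description of a decision; field names are illustrative."""
    adds_dependency: bool = False
    alters_core_data_flow: bool = False
    affects_architecture_or_process: bool = False
    hard_to_undo: bool = False
    likely_to_be_questioned: bool = False


def needs_adr(d: Decision) -> bool:
    """Return True if any of the criteria from the text applies."""
    return any([
        d.adds_dependency,
        d.alters_core_data_flow,
        d.affects_architecture_or_process,
        d.hard_to_undo,
        d.likely_to_be_questioned,
    ])


library_bump = Decision()  # minor tweak, none of the criteria apply
new_message_broker = Decision(adds_dependency=True, hard_to_undo=True)

print(needs_adr(library_bump))        # False
print(needs_adr(new_message_broker))  # True
```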
LLMs are actually useful for RFC preparation. You can use them to:
Research trade-offs between technologies
Generate initial drafts of pros/cons lists
Summarize documentation for technologies you’re evaluating
Identify edge cases you might have missed
The point, though, is NOT to let an LLM write the entire RFC for you to publish as-is, but rather to use it as an immediate thought partner.
Our microservices currently communicate via synchronous HTTP APIs, causing latency issues and occasional disruptions when one service is unavailable. The cost of handling all of this traffic is also very high. Most of the communication, especially reads, doesn't require a synchronous flow. We also anticipate a need to handle significantly higher request volumes in the near future. To increase resiliency, scalability and cost-efficiency, asynchronous communication would be preferred.
Decision
We will transition from synchronous HTTP API calls to a Kafka-based event-driven architecture for communication between our microservices.
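To make the shift concrete, here is a minimal in-memory sketch of the difference in coupling. It does not use Kafka: `EventLog`, the `orders` topic and `Consumer` are hypothetical stand-ins, and a real setup would use a Kafka client library and a running broker. The only idea illustrated is that the producer appends events to a log and does not need the consumer to be available at that moment.

```python
from collections import defaultdict


class EventLog:
    """Toy stand-in for a broker: an append-only list of events per topic."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, event):
        # The producer only appends; it never waits for a consumer.
        self._topics[topic].append(event)

    def read_from(self, topic, offset):
        # Consumers pull events starting at their own offset.
        return self._topics[topic][offset:]


class Consumer:
    """Hypothetical downstream service that tracks its own read position."""

    def __init__(self, log, topic):
        self.log = log
        self.topic = topic
        self.offset = 0
        self.processed = []

    def poll(self):
        events = self.log.read_from(self.topic, self.offset)
        self.processed.extend(events)
        self.offset += len(events)
        return len(events)


log = EventLog()
consumer = Consumer(log, "orders")

# The consumer is "down": the producer keeps publishing without errors.
log.publish("orders", {"order_id": 1})
log.publish("orders", {"order_id": 2})

# Once the consumer is back, it catches up from its last offset.
caught_up = consumer.poll()
print(caught_up)  # 2
```

With synchronous HTTP, the two `publish` calls would instead be requests that fail or block while the downstream service is unavailable.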
Rationale
Scalability: Kafka’s event-driven model allows simple horizontal scaling of consumers, which is critical for our anticipated traffic growth and is also very cost-efficient.
Resilience: Asynchronous messaging decouples microservices, so one service’s downtime doesn’t cascade throughout the system, which is especially important for writes/commands. That will also allow us to take advantage of the Saga pattern.
Cost-efficiency: A simple proof of concept indicates that just 3 Kafka consumers can read the same amount of data as 25 Sidekiq workers reading from the HTTP API. It also implies that we will be able to scale down the web workers of the upstream service by 40%, as we won't be reading this data from the HTTP API.
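As a rough back-of-the-envelope check of the figures above, the sketch below computes the relative reduction in reader processes. The baseline of 20 web workers is a made-up number, since the ADR only states the percentage, and process counts are used here only as a proxy for cost.

```python
def reduction_pct(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Figures from the proof of concept: 25 Sidekiq workers vs 3 Kafka consumers.
print(f"{reduction_pct(25, 3):.0f}% fewer reader processes")  # 88% fewer reader processes

# Hypothetical baseline of 20 upstream web workers, scaled down by 40%.
web_workers_before = 20  # assumption, not from the ADR
web_workers_after = round(web_workers_before * (1 - 0.40))
print(web_workers_after)  # 12
```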
Implications
Operational Overhead: We need to maintain a Kafka cluster, which introduces new complexity for monitoring, alerting, and administration. A managed service such as Amazon MSK can be a good fit here.
Kafka Learning Curve: Engineers will need to gain familiarity with event-driven design patterns and Kafka itself.
Deployment and Migration Plan: We’ll roll out event streams incrementally to avoid a “big bang” migration. Secondary microservices will be adapted first, followed by the more critical ones.
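One way to picture the incremental rollout is a per-service switch between the old and the new transport. This is a hypothetical sketch: the service names, the `TRANSPORT` table and the `send` function are invented for illustration and stand in for real HTTP and Kafka clients.

```python
# Hypothetical migration table: secondary services move first.
TRANSPORT = {
    "notifications": "events",  # secondary service, already migrated
    "reporting": "events",      # secondary service, already migrated
    "billing": "http",          # critical service, migrated later
}


def send(service: str, payload: dict) -> str:
    """Route a message according to the service's migration state."""
    # Services not listed in the table stay on the existing HTTP path.
    transport = TRANSPORT.get(service, "http")
    if transport == "events":
        return f"published event for {service}"
    return f"called HTTP API of {service}"


print(send("reporting", {"id": 1}))  # published event for reporting
print(send("billing", {"id": 1}))    # called HTTP API of billing
```

Flipping a single entry in the table migrates one service at a time, and flipping it back is the rollback path.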
Alternatives Considered
Continue with Synchronous HTTP: Would be simpler to maintain, but scalability, resiliency and cost-efficiency trade-offs are not acceptable in the long run.
Use a Different Message Broker (e.g. RabbitMQ): While viable, Kafka’s persistence and proven track record with large-scale event processing made it more appealing.
Author: [Your name]
Date: [Date]
Status: Draft | Under Review | Decided | Superseded
Decision Deadline: [Date - usually 3-5 days from creation]
Summary
One paragraph. What is this about and why are we discussing it?
Context
What's the current situation? What problem are we solving?
Why now? Include relevant constraints, requirements, and background.
Priorities and Requirements (Ranked)
This is the most important part. List what actually matters for this decision, in order of importance. Be specific and quantifiable where possible.
[Priority name] - [Why this matters. What's the business or technical reason?]
[Priority name] - [Why this matters?]
[Priority name] - [Why this matters?]
Example:
Cost - We're operating at thin margins; any infrastructure cost increase directly impacts profitability
Development velocity - Our roadmap depends on shipping three features this quarter
Operational complexity - We have a small ops team; anything complex will create bottlenecks
Note: People often disagree on decisions because they're weighing priorities differently. Making priorities explicit is where the real decision-making happens.
Proposed Solutions
Option A: [Name]
Description of the approach.
Pros:
...
Cons:
...
How this performs against priorities:
Cost: [How does this affect cost? High/Medium/Low impact and direction]
Development velocity: [How does this affect velocity?]
Operational complexity: [How does this affect ops complexity?]
Estimated effort: X weeks/months
Risk level: Low/Medium/High
Other trade-offs: [Anything else worth noting]
Option B: [Name]
...
Option C: Do Nothing
Often you should include this option. Sometimes the answer is "not now".
Recommendation (Optional!)
Which option do you recommend and why? Focus on how it aligns with the priorities you outlined above.
Stakeholders
Who needs to be involved in this decision? Tag them.
@backend-team (affected by implementation)
@security-team (compliance implications)
@product-owner (timeline impact)
@infrastructure (operational concerns)
Open Questions
What do you still need input on?
Timeline
When does this decision need to be made? What's driving that deadline?