Have you ever looked at our codebase and wondered: "Is it a Source or an Image? A Dataset or a Project? Why do we have both datasets[] and projects[] on the same record?"
These aren't just naming inconsistencies; they're symptoms of a missing piece. We never wrote down what our system actually is at the conceptual level. So over the years, storage details and legacy names leaked into services, routes, types, and UI code, and every new dev who joins has to reverse-engineer the domain from implementation artifacts.
This PR adds a small set of documentation files under docs/api/ that I'm calling the Conceptual API — or the system's DNA.
It's not an HTTP API spec. It's not a database schema. It's the compact domain model that answers: what are the core entities, what can you do with them, and what rules always hold true?
Three pieces:
- overview.md: the one-pager. Entities, operations, value objects, invariants, side effects. If you read one file, read this one.
- entities.md: detailed attributes, types, and relationships for each entity and value object.
- implementation-mismatches/: a folder that tracks where the code diverges from the conceptual model, with specific file references.
The obvious benefit is onboarding — a new developer can read overview.md in 5 minutes and understand what Roboflow is before touching any code.
But the real goal goes further than that. It's not just about onboarding human developers. We increasingly build this system with coding agents — AI that reads our code, proposes changes, and writes implementations. This document becomes a source of truth for intent — where the system is going, not just where it is today.
When the DNA is explicit and lives in the repo, the agent understands the domain the same way we do. It knows that the concept is Image, not Source. It knows the board should depend on AnnotationBoardCard, not on batch-vs-job internals. So the code it produces naturally evolves toward the intended direction, instead of reinforcing legacy patterns.
The result, over time, is a codebase that's far more cohesive — easier to work with, easier to reuse, and easier to plug into at the right level of abstraction.
And the mismatches inventory gives us a concrete debt map. You don't need a big-bang rewrite. Pick one mismatch, translate at the boundary, move on. Small cleanups compound.
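To make "translate at the boundary" concrete, here's a minimal TypeScript sketch. Everything in it is hypothetical: SourceRecord, its fields, the Image shape, and toImage are illustrative names, not our actual types. It also assumes, purely for illustration, that datasets[] and projects[] are two legacy spellings of the same project references. The idea is that a storage-shaped record gets converted into the domain shape once, at the service edge, so nothing downstream ever sees the legacy names.

```typescript
// Hypothetical storage-shaped record, as it might come back from the database.
// Field names are illustrative only; this sketch assumes datasets[] and
// projects[] both hold references to the same kind of thing (projects).
interface SourceRecord {
  id: string;
  name: string;
  datasets?: string[];
  projects?: string[];
}

// Hypothetical domain type that service boundaries should speak.
interface Image {
  id: string;
  name: string;
  projectIds: string[];
}

// Translate once, at the boundary: merge the two legacy reference lists,
// drop duplicates (Set keeps first-seen order), and expose only domain names.
function toImage(source: SourceRecord): Image {
  const merged = [...(source.datasets ?? []), ...(source.projects ?? [])];
  const projectIds = Array.from(new Set(merged));
  return { id: source.id, name: source.name, projectIds };
}

// Usage: callers downstream of the service only ever see Image.
const image = toImage({
  id: "img_1",
  name: "cat.jpg",
  datasets: ["proj_a"],
  projects: ["proj_a", "proj_b"],
});
console.log(image.projectIds.join(", ")); // proj_a, proj_b
```

Each mismatch fixed this way is one adapter like toImage, and the rest of the code can move to the domain vocabulary at its own pace.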
A few things worth highlighting:
- Image, not Source. Image is the domain concept; Source is how we store it. New code should speak Image at service boundaries.
- ProjectImage and WorkspaceImage are value objects: the same image viewed in different contexts. A ProjectImage resolves annotations through the project's annotation group; a WorkspaceImage shows which projects reference it.
- AnnotationBoardCard unifies Batches and AnnotationJobs at the board level. The board shouldn't care whether a card is backed by a batch or a job — that's an implementation detail.
- Version is not Model. A version is an immutable snapshot. You can train many models from the same version. We don't need to regenerate a version just to retrain.
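The AnnotationBoardCard point can be sketched in TypeScript as a small adapter layer. This is an illustration under assumptions, not the model in entities.md: the Batch and AnnotationJob shapes, the card fields, and the batchToCard, jobToCard, and cardLabel functions are all hypothetical. What it shows is that the board works on one card type and never branches on which entity backs a card.

```typescript
// Hypothetical backing entities; the real attributes live in entities.md.
interface Batch {
  id: string;
  name: string;
  imageCount: number;
}

interface AnnotationJob {
  id: string;
  assignee: string;
  imageCount: number;
}

// The one shape the board depends on. `backing` records where a card came
// from so write paths can route correctly; rendering never looks at it.
interface AnnotationBoardCard {
  id: string;
  title: string;
  imageCount: number;
  backing: { kind: "batch"; batchId: string } | { kind: "job"; jobId: string };
}

function batchToCard(batch: Batch): AnnotationBoardCard {
  return {
    id: `batch:${batch.id}`,
    title: batch.name,
    imageCount: batch.imageCount,
    backing: { kind: "batch", batchId: batch.id },
  };
}

function jobToCard(job: AnnotationJob): AnnotationBoardCard {
  return {
    id: `job:${job.id}`,
    title: `Assigned to ${job.assignee}`,
    imageCount: job.imageCount,
    backing: { kind: "job", jobId: job.id },
  };
}

// Board code: no batch-vs-job checks, just cards.
function cardLabel(card: AnnotationBoardCard): string {
  return `${card.title} (${card.imageCount} images)`;
}

const cards: AnnotationBoardCard[] = [
  batchToCard({ id: "b1", name: "Upload A", imageCount: 40 }),
  jobToCard({ id: "j1", assignee: "alex", imageCount: 25 }),
];
for (const card of cards) {
  console.log(cardLabel(card));
}
// Upload A (40 images)
// Assigned to alex (25 images)
```

Because both adapters return the same type, a new kind of card could be added later by writing one more adapter, without touching the board.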
This is a living document. It's meant to grow as we refine the domain model and shrink as we fix mismatches.
I'd love your feedback — especially if you spot something that doesn't match your mental model of how Roboflow works. That's exactly the kind of conversation this doc is meant to trigger.
And if you're working on a feature and you're unsure whether to use Source or Image, Dataset or Project — check the DNA first. That's what it's there for.
Recording tips:
- Share your screen on overview.md during the walkthrough section (1:50–2:30)
- Keep it conversational; pausing and ad-libbing is fine
- The hook (first 30s) is the most important part to nail