If we set the AI aside for a moment, each engineering task requires us to:

- Do some initial research
- Create an initial high-level technical plan and split it into milestones (implement the BE part, implement the FE part, add the docs, write integration tests, and so on)
- Start working on each item in the plan
- Do much deeper research on the code itself
- Write the code, but also shift left on testing, linting, etc., and test as soon as possible. If you can evaluate success, do it as early as possible
- Stay flexible: challenge the original assumptions if the original plan does not hold, and make sure the plan gets updated
- Repeat
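Sketched as code, the loop looks something like this. This is a minimal sketch: every name here (`research`, `draftPlan`, `implement`, `verify`, `revisePlan`) is an illustrative stand-in for whatever tool, agent, or human does that step, not a real API.

```typescript
// Hypothetical types and stubs; each declared function is a plug-in point.
type Milestone = { title: string; done: boolean };
type Plan = { goal: string; milestones: Milestone[] };

declare function research(topic: string): Promise<string>;
declare function draftPlan(task: string, context: string): Promise<Plan>;
declare function implement(m: Milestone, details: string): Promise<string>;
declare function verify(diff: string): Promise<boolean>;
declare function revisePlan(plan: Plan, evidence: string): Promise<Plan>;

async function engineeringLoop(task: string): Promise<void> {
  // Initial research, then a high-level plan split into milestones.
  let plan = await draftPlan(task, await research(task));

  while (plan.milestones.some((m) => !m.done)) {
    const next = plan.milestones.find((m) => !m.done)!;
    const details = await research(next.title); // deeper, code-level research
    const diff = await implement(next, details); // write the code
    if (await verify(diff)) {
      // Shift left: test and lint as early as possible.
      next.done = true;
    } else {
      // Challenge the original assumptions and update the plan on the fly.
      plan = await revisePlan(plan, diff);
    }
  }
}
```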
Now, would all of this happen in a real org solely with an AI agent? Certainly not. Each task is a combination of multiple minds and cross-collaboration between multiple teams, even just on the engineering side, not to mention the whole product flow.
Tools like Claude Code, Cursor, etc. are getting good at the "write the code" step, but if you want the rest of this engineering list, it requires heavy prompt tuning and a lot of manual steps and hand-holding. And it usually works well only if the task is small and clear enough. Right now they try to do all the above steps anyway, and sometimes the result is decent, sometimes crap.
When we talk about planning, we mean reasoning with as much related data as possible and as little noise as possible. It usually also calls for a separate thinking model, while coding can be offloaded to more specialized ones, like Claude Sonnet. The Probe engine's job is to make sure the reasoner gets this data as clean and as complete as possible: both a cleaner and a wider picture.
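A rough sketch of that split, under my own assumptions: `llm`, `gatherContext`, and the model names are placeholders, not a specific vendor API.

```typescript
// The reasoning model plans on clean, wide context; a cheaper
// coding-tuned model writes the actual code.
declare function llm(model: string, prompt: string): Promise<string>;
declare function gatherContext(task: string): Promise<string>; // e.g. a probe query

async function planThenCode(task: string): Promise<string> {
  const context = await gatherContext(task); // high signal, low noise
  const plan = await llm(
    "reasoning-model", // a thinking model, used only for planning
    `Given this context:\n${context}\n\nProduce a step-by-step plan for: ${task}`,
  );
  // Each resulting step is small enough to hand to a specialized coder.
  return llm("coding-model", `Implement this plan step by step:\n${plan}`);
}
```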
Being flexible with planning is also very important: you need to be able to update both the high-level plan and the low-level implementation plan on the fly (sometimes with a human in the loop where required). Essentially, the limits of what AI coding flows can do are tied to what the planning step can do.
A good planner needs to keep the history of tasks, split them, go back, and have a memory of what worked and what did not. Taskmaster is a good one in this area: https://github.com/eyaltoledano/claude-task-master. And a good planning step usually needs far more information than just the code: the architecture of your system, your preferences, dependencies to look out for, and so on.
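To make that concrete, here is the kind of task record such a planner would keep. The shape is my assumption for illustration, not Taskmaster's actual schema.

```typescript
// A task record with history, splits, and a memory of what worked.
interface TaskRecord {
  id: string;
  title: string;
  parent?: string; // set when this task was split off a bigger one
  status: "todo" | "doing" | "done" | "abandoned";
  attempts: { approach: string; worked: boolean; notes: string }[];
  context: {
    architectureNotes: string; // more than just code: how the system fits together
    preferences: string[];     // team and author preferences
    dependencies: string[];    // libraries or services to look out for
  };
}
```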
What I built is Probe, a low-level engine that lets you query code with a high signal-to-noise ratio, and Probe Chat, an AI agent reference implementation that uses the Probe engine inside an agentic loop to answer code questions. In that setup Probe itself acts as a kind of planner, but a very simple one.
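A minimal sketch of that loop, assuming a CLI of the form `probe search <query> <path>` (check Probe's docs for the exact flags); `llm` is again a placeholder, not a real API.

```typescript
// Roughly what Probe Chat does: let the model decide what to search for,
// run probe, and feed the results back for the final answer.
import { execFileSync } from "node:child_process";

declare function llm(model: string, prompt: string): Promise<string>;

async function answerCodeQuestion(question: string, repo: string): Promise<string> {
  const query = await llm(
    "reasoning-model",
    `Turn this question into a code search query: ${question}`,
  );
  const results = execFileSync("probe", ["search", query.trim(), repo]).toString();
  return llm(
    "reasoning-model",
    `Using these code snippets:\n${results}\n\nAnswer: ${question}`,
  );
}
```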
So going forward, I want to make Probe much better at planning, or integrate it with Taskmaster, and to be able to offload implementation to coding agents like Claude Code (some of this is already possible). Kind of like OpenAI Codex or Google Jules, but far more focused on the planning step and on cross-human collaboration, and far more flexible, without vendor lock-in on a specific agent.
Competition in the AI coding space is huge, but everyone focuses on fast user gratification, and no one really talks about real engineering with all its checks and balances, about more complex org structures, and about how many steps it takes to push an idea to production. They also focus on the "engineer" mindset, while orgs are full of people who would benefit from talking to code and from planning the steps without actually writing the code.