A comprehensive, production-ready agent workflow for VS Code and VS Code Insiders that orchestrates seven specialized agents through the complete software development lifecycle. From initial research through planning, implementation, verification, and debugging — all with structured artifact tracking and goal-backward validation.
"says Claude sonnet 4.5 :)"
It is still in testing and development.
This project is built with deep respect for the work that came before it. It draws on the orchestration concepts introduced by Burke Holland https://gist.github.com/burkeholland/0e68481f96e94bbb98134fa6efd00436 and the productivity philosophy behind GSD OpenCode https://github.com/rokicool/gsd-opencode. What you’ll find here is my own ultralight interpretation — a streamlined multi‑agent setup designed for clarity, speed, and practical everyday use inside VS Code Insiders.
Built for solo developers who want AI agent collaboration that works like a senior engineering team.
📝 Note: These install badges will work once this Gist is published and you replace cdeaa98b5d7dd612d525d73bdc456e28 with your actual Gist ID in the URLs below.
Install any or all agents directly into VS Code or VS Code Insiders. Each agent operates independently but works seamlessly when orchestrated together.
| Agent | Description | Install |
|---|---|---|
| Orchestrator | Coordinates the full lifecycle by delegating to subagents. Never implements directly. | |
| Researcher | Investigates technologies, maps codebases, Context7-first source verification. | |
| Planner | Creates roadmaps and executable plans. Plans are prompts — WHAT not HOW. | |
| Coder | Writes code following mandatory principles. Executes plans atomically with per-task commits. | |
| Designer | Handles all UI/UX. Prioritizes usability, accessibility, aesthetics. Never compromises on UX. | |
| Verifier | Goal-backward verification. Task completion ≠ goal achievement. | |
| Debugger | Scientific debugging with hypothesis testing. Persistent debug files and bias mitigation. | |
Repository: https://github.com/japperJ/JP-agent-flow
The project coordinator. Breaks down complex requests into lifecycle phases and delegates to specialized subagents. Never implements anything itself.
- Model: Claude Sonnet 4.5 (copilot)
- Tools: `read/readFile`, `agent`, `memory`
- Purpose: Lifecycle coordination across Research → Plan → Execute → Verify → Debug → Iterate
Key Capabilities:
- Request routing (determines which agents to invoke for any task)
- Full 10-step execution model for greenfield projects
- Phase-based workflow with gap-closure loops
- Intelligent parallelization based on file-overlap rules
- Manages the `.planning/` artifact structure across all phases
When to use:
- Starting a new project from scratch
- Adding complex features that span multiple concerns
- Any task requiring coordination between multiple agents
Never does:
- Implement code directly (has no edit tools)
- Make architectural decisions without delegation
- Tell agents HOW to do their work (only WHAT)
Core Workflow:
User Request
↓
Orchestrator analyzes scope
↓
Delegates to: Researcher → Planner → Coder/Designer → Verifier
↓
Monitors progress, handles gaps, reports completion
The investigator. Researches technologies, maps codebases, verifies implementation approaches. Context7-first with explicit source verification.
- Model: GPT-5.2 (copilot)
- Tools: `vscode`, `execute`, `read`, `context7/*`, `edit`, `search`, `web`, `memory`
- Purpose: Technology investigation, codebase analysis, and implementation research
Operating Modes:
- Project mode — New projects: researches domain, tech stack, architecture patterns, pitfalls
- Phase mode — Research implementation details for a specific phase
- Codebase mode — Maps existing codebases (stack, architecture, conventions, concerns)
- Synthesize mode — Consolidates multiple research outputs into unified summary
Source Hierarchy (strict priority order):
- Context7 (`#context7`), HIGH confidence: always try first for library/framework docs
- Official docs (web), HIGH confidence: when Context7 lacks detail
- Web search (web), MEDIUM confidence: ecosystem discovery, comparisons
- Training data, LOW confidence: only when the above fail, flagged as unverified
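The priority order above can be sketched as a simple lookup. This is only an illustration: the source keys, the `Finding` type, and `pick_finding` are hypothetical names, not part of the agent definitions, and treating training data as "unverified" is taken from the last bullet.

```python
from dataclasses import dataclass
from typing import Optional

# Priority order and confidence levels as listed above; the key names are assumptions.
SOURCE_PRIORITY = ["context7", "official_docs", "web_search", "training_data"]
CONFIDENCE = {
    "context7": "HIGH",
    "official_docs": "HIGH",
    "web_search": "MEDIUM",
    "training_data": "LOW",
}


@dataclass
class Finding:
    text: str
    source: str
    confidence: str
    verified: bool


def pick_finding(answers: dict) -> Optional[Finding]:
    """Return the answer from the highest-priority source that produced one."""
    for source in SOURCE_PRIORITY:
        text = answers.get(source)
        if text:
            return Finding(
                text=text,
                source=source,
                confidence=CONFIDENCE[source],
                # Training data is the only source flagged as unverified.
                verified=source != "training_data",
            )
    return None
```

Calling `pick_finding({"context7": None, "official_docs": "..."})` falls through to the official docs, which is the "when Context7 lacks detail" case.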
Key Features:
- Every finding includes confidence level and source citation
- Negative claims ("X doesn't support Y") require extra verification
- Outputs to `.planning/research/` or `.planning/phases/N/RESEARCH.md`
- Never implements; research only
Typical Output Files:
- `SUMMARY.md`: Executive summary with recommendations
- `STACK.md`: Technology choices with rationale
- `FEATURES.md`: Feature analysis with standard approaches
- `ARCHITECTURE.md`: Recommended patterns
- `PITFALLS.md`: Known issues and mitigation strategies
The architect. Creates roadmaps, phase plans, and validates completeness. Plans are executable prompts — describes WHAT, not HOW.
- Model: GPT-5.2 (copilot)
- Tools: `vscode`, `execute`, `read`, `context7/*`, `edit`, `search`, `web`, `memory`, `todo`
- Purpose: Strategic planning and task breakdown with goal-backward validation
Operating Modes:
- Roadmap mode — Creates phase breakdown, requirement mapping, success criteria
- Plan mode — Task-level planning for specific phases (2-3 tasks per plan)
- Validate mode — Verifies plans will achieve goals across 6 dimensions
- Gaps mode — Creates fix plans from verification failures
- Revise mode — Updates plans based on validation issues
Core Philosophy:
- Plans are prompts: each executable by one agent in one session
- WHAT not HOW: describes outcomes and constraints, not implementation
- Goal-backward: derives what must exist from what must be true
- Anti-enterprise: if it needs a meeting to understand, it's too complex
- Research first: uses `#context7` before making technical assumptions
Quality Control:
- Targets 2-3 tasks per plan (5 max before splitting)
- Keeps plans under 50% of executing agent's context budget
- 6-dimensional validation: requirements coverage, task completeness, dependencies, key links, scope, must-haves
Task Anatomy:
Every task has files, action, verify, done — fully specified and testable
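As a rough sketch of how the task anatomy and the 5-task ceiling could be checked mechanically, assuming each task is a plain dictionary (the real plan format is not shown here, and `check_plan` is a hypothetical helper):

```python
# Field names come from the task anatomy above; the dict representation is an assumption.
REQUIRED_TASK_FIELDS = ("files", "action", "verify", "done")
MAX_TASKS = 5  # "5 max before splitting"


def check_plan(tasks: list) -> list:
    """Return human-readable issues; an empty list means these checks pass."""
    issues = []
    if not tasks:
        issues.append("plan has no tasks")
    if len(tasks) > MAX_TASKS:
        issues.append(f"plan has {len(tasks)} tasks; split it (max {MAX_TASKS})")
    for number, task in enumerate(tasks, start=1):
        # A field counts as missing when it is absent or empty.
        missing = [name for name in REQUIRED_TASK_FIELDS if not task.get(name)]
        if missing:
            issues.append(f"task {number} is missing: {', '.join(missing)}")
    return issues
```

This covers only the structural part of validation; the six validation dimensions listed above need judgment that a check like this cannot provide.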
Outputs:
- `ROADMAP.md`: Phase breakdown with success criteria
- `REQUIREMENTS.md`: Traceable requirements with REQ-IDs
- `STATE.md`: Project state tracking
- `PLAN.md` files: Executable task plans (one per task group)
The implementer. Writes production-quality code following mandatory principles. Executes plans atomically with per-task commits.
- Model: Claude Opus 4.6 (copilot)
- Tools: `vscode`, `execute`, `read`, `context7/*`, `github/*`, `edit`, `search`, `web`, `memory`, `todo`
- Purpose: Code implementation with strict quality standards and commit discipline
Mandatory Coding Principles:
- Structure — Consistent layout, feature-based grouping, shared structure first
- Architecture — Flat and explicit, no premature abstraction
- Functions — Linear control flow, single purpose, prefer pure
- Naming & Comments — Descriptive names, comments explain WHY not WHAT
- Logging & Errors — Structured logging, explicit error handling
- Regenerability — Files rewritable from interface contracts
- Platform Use — Use conventions directly, don't wrap unnecessarily
- Modifications — Match existing patterns exactly
- Quality — Deterministic, testable, fail loud and early
Execution Model:
- Loads `STATE.md` and `PLAN.md`
- Executes tasks sequentially
- Verifies each task with specified command
- Commits after each successful task (conventional commits)
- Stops at checkpoints for human input
- Creates `SUMMARY.md` when complete
Deviation Handling (priority order):
- Rule 4 (highest): STOP for architecture changes → decision checkpoint
- Rule 1: Auto-fix bugs (syntax, logic, types, security) → document in summary
- Rule 2: Auto-add critical pieces (validation, error handling, auth) → document
- Rule 3: Auto-fix blockers (dependencies, imports) → document
Commit Protocol:
- One task, one commit (never batch)
- Never `git add .`; stage files individually
- Conventional commit types: `feat`, `fix`, `test`, `refactor`, `perf`, `docs`, `style`, `chore`
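The commit protocol could be enforced with a small helper like the one below. It only builds the `git` argument lists rather than running them. The function names are hypothetical, and the optional `(scope)` in the subject pattern follows the general Conventional Commits format rather than anything stated here.

```python
import re

COMMIT_TYPES = ("feat", "fix", "test", "refactor", "perf", "docs", "style", "chore")
# "type: subject" or "type(scope): subject"
_SUBJECT = re.compile(rf"^({'|'.join(COMMIT_TYPES)})(\([\w-]+\))?: \S.*$")


def is_conventional(subject: str) -> bool:
    """True if the subject line starts with one of the allowed commit types."""
    return bool(_SUBJECT.match(subject))


def commit_commands(files: list, subject: str) -> list:
    """Build argv lists for one task: one `git add` per file, then a single commit."""
    if not files:
        raise ValueError("a task commit needs at least one file")
    if any(f in (".", "-A", "--all") for f in files):
        raise ValueError("stage files individually, never everything at once")
    if not is_conventional(subject):
        raise ValueError(f"not a conventional commit subject: {subject!r}")
    commands = [["git", "add", "--", f] for f in files]
    commands.append(["git", "commit", "-m", subject])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` inside an initialized repository.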
TDD Support: When detected, uses RED → GREEN → REFACTOR structure with separate commits per phase
The UX advocate. Handles all UI/UX design with uncompromising focus on usability, accessibility, and aesthetics.
- Model: Gemini 3 Pro (Preview) (copilot)
- Tools: `vscode`, `execute`, `read`, `context7/*`, `edit`, `search`, `web`, `memory`, `todo`
- Purpose: UI/UX implementation prioritizing user experience over technical convenience
Priority Order (strictly enforced):
- Usability — Can users accomplish their goal without thinking?
- Accessibility — Can everyone use it, regardless of ability?
- Aesthetics — Does it look and feel polished?
Core Principles:
- Less is more — Remove until removing anything else breaks it
- Consistency — Reuse existing components before creating new ones
- Feedback — Every user action gets visible response
- Hierarchy — Most important = most visible
- Whitespace — Give elements room to breathe
- Motion — Animate with purpose, never decoration
Key Characteristics:
- Pushes back on technical constraints that harm UX
- Implements complete working code (not mockups)
- Tests responsiveness across breakpoints
- Ensures WCAG 2.1 AA compliance minimum
- Reads `.planning/phases/N/RESEARCH.md` for design constraints
- Follows the existing design language (never introduces a new one)
Context Awareness:
- Checks `CONVENTIONS.md` for existing design patterns
- Consults `#context7` for component library docs
- Researches existing design systems before creating new components
The quality gatekeeper. Goal-backward verification that work achieved its goal, not just that tasks were completed.
- Model: Claude Sonnet 4.5 (copilot)
- Tools: `vscode`, `execute`, `read`, `edit`, `search`, `memory`
- Purpose: Independent verification with systematic gap detection
Core Principle: Task completion ≠ Goal achievement. Files can exist without being functional. Functions can be exported without being imported. Routes can be defined without being reachable.
Operating Modes:
- Phase mode — Verifies phase implementation against success criteria
- Integration mode — Verifies cross-phase wiring and end-to-end flows
- Re-verify mode — Re-checks after gap closure
10-Step Phase Verification:
- Check for previous verification (re-verification handling)
- Load context (roadmap, requirements, state)
- Establish must-haves (observable truths, artifacts, wiring)
- Verify observable truths (independently test each)
- Verify artifacts (3 levels: existence → substance → wired)
- Verify key links (component→API, API→DB, form→handler, state→render)
- Check requirements coverage
- Scan for anti-patterns (TODOs, placeholders, empty implementations)
- Identify human verification needs
- Structure gap output in YAML
3-Level Artifact Verification:
- Level 1: Existence — File exists?
- Level 2: Substance — Real code, not stub? (line count thresholds)
- Level 3: Wired — Actually imported and used elsewhere?
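A minimal sketch of the three levels for a Python module follows. The Verifier itself is described as using bash commands, so this is an illustration of the idea, not its method. The 10-line threshold and the import pattern are assumptions. The pattern misses forms such as `from pkg import module`.

```python
import re
from pathlib import Path

MIN_LINES = 10  # assumed threshold; the text only says "line count thresholds"


def verify_artifact(root: Path, artifact: str, min_lines: int = MIN_LINES) -> str:
    """Return the highest level reached: missing, exists, substantive, or wired."""
    path = root / artifact
    # Level 1: existence
    if not path.is_file():
        return "missing"
    # Level 2: substance, approximated by counting non-blank lines
    code_lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    if len(code_lines) < min_lines:
        return "exists"
    # Level 3: wired, meaning some other file imports this module by name
    pattern = re.compile(
        rf"^\s*(?:from|import)\s+[\w.]*\b{re.escape(path.stem)}\b", re.MULTILINE
    )
    for other in root.rglob("*.py"):
        if other.resolve() == path.resolve():
            continue
        if pattern.search(other.read_text(encoding="utf-8")):
            return "wired"
    return "substantive"
```

A result of `"substantive"` is the interesting case: the file has real content but nothing uses it, which is the kind of gap described under the core principle.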
Integration Verification:
- Build export/import map across phases
- Verify export usage (connected, imported-not-used, orphaned)
- Verify API coverage (defined routes vs called routes)
- Verify auth protection (which routes protected?)
- Verify end-to-end flows (auth, data, forms)
- Compile integration report
Verification Statuses:
- PASSED — All checks satisfied
- GAPS_FOUND — Failures documented with YAML frontmatter
- HUMAN_NEEDED — Programmatic checks passed, manual verification required
Gap Structure:
- Type: artifact / key_link / truth / requirement
- Severity: blocker / warning / info
- Evidence: bash commands showing the gap
- Issue: precise description
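The gap fields and verification statuses could be modeled roughly as below. The precedence in `overall_status` (any recorded gap wins over a pending human check) is an assumption, since the text does not say how the statuses are combined.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Gap:
    type: Literal["artifact", "key_link", "truth", "requirement"]
    severity: Literal["blocker", "warning", "info"]
    evidence: str  # the command (and its output) that demonstrates the gap
    issue: str     # precise description of what is wrong


def overall_status(gaps: list, needs_human: bool) -> str:
    """Map verification results onto the three statuses (assumed precedence)."""
    if gaps:
        return "GAPS_FOUND"
    if needs_human:
        return "HUMAN_NEEDED"
    return "PASSED"
```

For example, `Gap("key_link", "blocker", "grep -rn 'fetch(' src/ returned nothing", "form never calls the API")` records a wiring failure together with the evidence for it.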
Critical Rule: Does NOT trust SUMMARY.md — verifies everything independently with bash commands
The scientific investigator. Finds and fixes bugs using hypothesis testing with persistent debug files and cognitive bias mitigation.
- Model: Claude Opus 4.6 (copilot)
- Tools: `vscode`, `execute`, `read`, `edit`, `search`, `web`, `memory`, `context7/*`
- Purpose: Systematic debugging with scientific methodology
Philosophy:
- User = reporter, you = investigator — Symptoms ≠ root causes
- Your own code is harder to debug — Watch for confirmation bias
- Systematic over heroic — Methodical elimination beats inspired guessing
Operating Modes:
- find_and_fix (default) — Find root cause AND implement fix
- find_root_cause_only — Find and document, don't fix
Cognitive Bias Guards:
| Bias | Trap | Antidote |
|---|---|---|
| Confirmation | Looking only for supporting evidence | Actively try to DISPROVE hypothesis |
| Anchoring | Fixating on first clue | Generate ≥2 hypotheses before testing |
| Availability | Blaming most recent change | Check git log but don't assume recent=guilty |
| Sunk Cost | Sticking with wrong theory | 3-test limit per hypothesis, then pivot |
Debug File Protocol:
Every session gets a persistent `.planning/debug/BUG-[timestamp].md` file with:
- Symptoms (IMMUTABLE) — Original report, never edited
- Current Focus (OVERWRITE) — Current hypothesis being tested
- Eliminated Hypotheses (APPEND-ONLY) — Failed theories stay for reference
- Evidence Log (APPEND-ONLY) — All observations preserved
- Resolution (OVERWRITE) — Root cause and fix when found
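The section rules above (immutable, overwrite, append-only) can be illustrated with an in-memory model. The `DebugFile` class and its section keys are hypothetical. The actual debug file is a Markdown document maintained by the agent.

```python
class DebugFile:
    """In-memory model of the debug file's write rules (illustrative only)."""

    IMMUTABLE = {"symptoms"}
    OVERWRITE = {"current_focus", "resolution"}
    APPEND_ONLY = {"eliminated_hypotheses", "evidence_log"}

    def __init__(self, symptoms: str) -> None:
        self.sections = {
            "symptoms": symptoms,          # set once at creation, never edited
            "current_focus": "",
            "eliminated_hypotheses": [],
            "evidence_log": [],
            "resolution": "",
        }

    def update(self, section: str, text: str) -> None:
        if section in self.IMMUTABLE:
            raise PermissionError(f"{section} is immutable")
        if section in self.OVERWRITE:
            self.sections[section] = text          # replace previous value
        elif section in self.APPEND_ONLY:
            self.sections[section].append(text)    # keep history intact
        else:
            raise KeyError(section)
```

Keeping eliminated hypotheses append-only is what stops a later session from retesting a theory that was already ruled out.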
Investigation Techniques:
- Binary Search — Narrow problem space by halving
- Rubber Duck — Explain code path, find mismatch
- Minimal Reproduction — Strip until only bug remains
- Working Backwards — Trace wrong output to source
- Differential — Compare working vs broken
- Observability First — Strategic logging before hypothesizing
- Comment Out Everything — When all else fails
- Git Bisect — When it used to work
Hypothesis Testing Protocol:
- Form ≥2 hypotheses
- Rank by testability (not likelihood)
- For each: Predict → Design test → Execute → Evaluate
- 3-test limit — if unresolved, refine or pivot
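One way to picture the protocol is a loop that ranks hypotheses by testability and gives each at most three tests. The types below are assumptions made for the sketch: a test returns `True` (prediction confirmed), `False` (refuted), or `None` (inconclusive), and `testability` is an arbitrary score where higher means easier to test.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MAX_TESTS = 3  # the 3-test limit per hypothesis


@dataclass
class Hypothesis:
    statement: str
    testability: int
    tests: list = field(default_factory=list)  # callables returning True/False/None


def investigate(hypotheses: list) -> Optional[str]:
    """Return the first confirmed hypothesis statement, or None if all were exhausted."""
    if len(hypotheses) < 2:
        raise ValueError("form at least two hypotheses before testing")
    # Rank by testability, not likelihood.
    ranked = sorted(hypotheses, key=lambda h: h.testability, reverse=True)
    for hypothesis in ranked:
        for test in hypothesis.tests[:MAX_TESTS]:
            outcome = test()
            if outcome is True:
                return hypothesis.statement
            if outcome is False:
                break  # refuted: move to the next hypothesis
        # Refuted or still unresolved after the limit: pivot.
    return None
```

A `None` result corresponds to the "refine or pivot" case, where the remaining hypotheses need rewording before further testing.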
Verification Requirements: Fix is verified when ALL true:
- Original symptom gone
- Fix addresses root cause (not symptom)
- No new failures introduced
- Works consistently (not just once)
- Related functionality intact
When to Restart:
- 3+ hypotheses tested with no progress
- Fixes create new bugs
- Can't explain behavior theoretically
- Intermittent and can't reproduce reliably
- Working >30 minutes on same bug
User: "Build a recipe sharing app"
↓
┌─────────────────────────────────────────────────────┐
│ ORCHESTRATOR: Routes request, manages lifecycle │
└─────────────────────────────────────────────────────┘
│
├─► RESEARCH Phase (Steps 1-2)
│ │
│ ├─► Researcher (project mode)
│ │ → .planning/research/STACK.md, FEATURES.md, ARCHITECTURE.md, PITFALLS.md
│ │
│ └─► Researcher (synthesize mode)
│ → .planning/research/SUMMARY.md
│
├─► ROADMAP Phase (Step 3)
│ │
│ └─► Planner (roadmap mode)
│ → .planning/ROADMAP.md, REQUIREMENTS.md, STATE.md
│ → Shows user roadmap, waits for approval
│
├─► PER-PHASE Loop (Steps 4-8, repeated for each phase)
│ │
│ ├─► Researcher (phase mode)
│ │ → .planning/phases/N/RESEARCH.md
│ │
│ ├─► Planner (plan mode)
│ │ → .planning/phases/N/PLAN.md
│ │
│ ├─► Planner (validate mode)
│ │ → Pass/fail with issues
│ │ → If issues: Planner (revise mode) → re-validate
│ │
│ ├─► Coder + Designer (parallel if non-overlapping files)
│ │ → Code implementation with per-task commits
│ │ → .planning/phases/N/SUMMARY.md
│ │
│ ├─► Verifier (phase mode)
│ │ → .planning/phases/N/VERIFICATION.md
│ │ → If gaps: Gap-closure loop (max 3 iterations)
│ │
│ └─► If gaps persist after 3 loops: Report to user
│
├─► INTEGRATION Phase (Step 9)
│ │
│ └─► Verifier (integration mode)
│ → .planning/INTEGRATION.md
│ → Checks cross-phase wiring, end-to-end flows
│
└─► COMPLETION (Step 10)
│
└─► Orchestrator compiles final report
→ What was built, decisions, verification status, how to run
Bug Fixing:
User: "Login is broken"
↓
Orchestrator → Debugger (find_and_fix)
→ Creates .planning/debug/BUG-[timestamp].md
→ Hypothesis testing with bias guards
→ Implements fix with verification
→ Updates debug file with root cause
Quick Code Change:
User: "Add dark mode toggle"
↓
Orchestrator → Coder (if logic)
or → Designer (if UI-focused)
→ Direct implementation
→ Conventional commit
Existing Codebase Analysis:
User: "Analyze this project"
↓
Orchestrator → Researcher (codebase mode)
→ .planning/codebase/STACK.md
→ .planning/codebase/ARCHITECTURE.md
→ .planning/codebase/CONVENTIONS.md
→ .planning/codebase/CONCERNS.md
Run in parallel when:
- Tasks touch different files with no overlap
- Tasks are in different domains (styling vs logic)
- Tasks have no data dependencies
Run sequentially when:
- Task B needs output from Task A
- Tasks might modify the same file
- Design must be approved before implementation
File Conflict Prevention:
- Orchestrator explicitly scopes each agent to specific files
- Uses component boundaries for UI work
- Splits into sub-phases if overlap unavoidable
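The file-overlap rules above amount to grouping tasks into waves: tasks in the same wave share no files and have no unmet dependencies. The greedy scheduler below is a sketch under that reading. `plan_waves` and its inputs are hypothetical and do not describe how the Orchestrator is actually implemented.

```python
from typing import Optional


def plan_waves(files_by_task: dict, depends_on: Optional[dict] = None) -> list:
    """Group tasks into waves; tasks within a wave can run in parallel.

    files_by_task maps a task name to the set of files it touches.
    depends_on maps a task name to the tasks whose output it needs.
    Dependencies must be listed before the tasks that depend on them.
    """
    depends_on = depends_on or {}
    waves = []
    wave_of = {}
    for task, files in files_by_task.items():
        deps = depends_on.get(task, set())
        if any(dep not in wave_of for dep in deps):
            raise ValueError(f"{task}: list dependencies before dependents")
        # Start after the latest dependency, then skip waves with file overlap.
        index = max((wave_of[dep] + 1 for dep in deps), default=0)
        while index < len(waves) and any(
            files & files_by_task[other] for other in waves[index]
        ):
            index += 1
        if index == len(waves):
            waves.append([])
        waves[index].append(task)
        wave_of[task] = index
    return waves
```

With an API task, a styling task, and a test task that also edits the API file, the first two share a wave and the third is pushed to the next one.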
All agents write to .planning/ for structured, traceable artifact management:
.planning/
├── REQUIREMENTS.md # Requirements with REQ-IDs (Planner creates)
├── ROADMAP.md # Phase breakdown (Planner creates)
├── STATE.md # Project state tracking (Planner initializes, Coder updates)
├── INTEGRATION.md # Cross-phase verification (Verifier creates)
│
├── research/ # Research outputs (Researcher creates)
│ ├── SUMMARY.md # Consolidated research (synthesize mode)
│ ├── STACK.md # Technology choices
│ ├── FEATURES.md # Feature analysis
│ ├── ARCHITECTURE.md # Architecture patterns
│ └── PITFALLS.md # Known pitfalls
│
├── codebase/ # Codebase analysis (Researcher codebase mode)
│ ├── STACK.md # Current stack inventory
│ ├── ARCHITECTURE.md # Current architecture
│ ├── STRUCTURE.md # Directory structure
│ ├── CONVENTIONS.md # Code conventions
│ ├── TESTING.md # Testing setup
│ ├── INTEGRATIONS.md # External integrations
│ └── CONCERNS.md # Tech debt and risks
│
├── phases/
│ ├── 1/
│ │ ├── RESEARCH.md # Phase research (Researcher phase mode)
│ │ ├── PLAN.md # Task plans (Planner plan mode)
│ │ ├── SUMMARY.md # Execution summary (Coder)
│ │ └── VERIFICATION.md # Phase verification (Verifier phase mode)
│ ├── 2/
│ │ └── ...
│ └── N/
│
└── debug/ # Debug session files (Debugger creates)
├── BUG-[timestamp].md
└── ...
Frontmatter YAML: Most planning artifacts use YAML frontmatter for structured metadata:
- Plans: `phase`, `plan`, `type`, `wave`, `dependencies`, `must_haves`
- Verifications: `phase`, `status`, `score`, `gaps`
- Debug files: `bug_id`, `status`, `created`, `updated`, `symptoms`, `root_cause`, `fix`
Traceability:
- Requirements have REQ-IDs
- Plans reference requirements
- Verifications check requirement coverage
- Summaries list commits
- Debug files are append-only evidence logs
Context References:
Plans use @ notation to reference other artifacts:
## Context
@.planning/phases/1/RESEARCH.md
@.planning/codebase/CONVENTIONS.md
- Context7 MCP (highly recommended)
  - Install: Context7 MCP Extension
  - Provides up-to-date library/framework documentation
  - Used by Researcher, Planner, Coder, Designer, Debugger
- Git (required for Coder)
  - Per-task commits with conventional commit format
  - Repository must be initialized before Coder runs
- VS Code or VS Code Insiders
  - GitHub Copilot subscription active
  - Agent support enabled (generally available in Copilot)
- GitHub MCP — For GitHub integration (Coder uses if available)
- Memory — Experimental in VS Code Insiders (Orchestrator uses if available)
- Install the agents you need using the install badges above
- Initialize the `.planning/` directory in your project (agents will create subdirectories as needed)
- Initialize git if not already: `git init`
- Start with Orchestrator for complex work or invoke specialized agents directly for focused tasks
Start a new project:
@orchestrator Build a recipe sharing app with user authentication
Add a feature to existing project:
@orchestrator Add real-time notifications using WebSockets
Analyze existing codebase:
@researcher Analyze this codebase — map the tech stack and architecture
Create implementation plan:
@planner Create a plan for the user authentication phase
Implement a specific feature:
@coder Execute the plan in .planning/phases/1/PLAN.md
Fix a bug:
@debugger Login returns 500 error when password is incorrect
Verify phase completion:
@verifier Verify Phase 1 implementation against success criteria
Design UI:
@designer Create a dark mode toggle component with smooth transitions
The memory tool is experimental in VS Code Insiders. Orchestrator uses it if available but gracefully degrades if not present.
All agents use relative paths within .planning/. Never hardcode absolute paths in plans or artifacts — they break across different agent contexts.
Coder never uses git add . — always stages files individually. This ensures atomic, reviewable commits per task.
Verifier does NOT trust SUMMARY.md claims. It independently verifies everything with bash commands. This catches "tasks completed but goals not achieved" scenarios.
Planner keeps plans under 50% of Coder's context budget (target: 2-3 tasks per plan, 5 max). This maintains execution quality.
Debugger enforces a 3-test limit per hypothesis. If 3 tests don't resolve it, the hypothesis is too vague — refine or pivot.
Designer prioritizes UX over technical convenience. If a technical constraint harms user experience, Designer will push back. This is intentional.
Orchestrator explicitly scopes agents to specific files when delegating parallel work to prevent merge conflicts.
Plans derive must_haves goal-backward from phase success criteria. Verifier checks these independently. This ensures planning → execution → verification alignment.
Orchestrator automatically determines routing, but you can specify:
@orchestrator Research options for real-time features, then create a plan (don't implement yet)
This triggers Steps 1-2 (research) and stops before execution.
When Verifier finds gaps after phase execution:
- Verifier writes gaps to `VERIFICATION.md` frontmatter (structured YAML)
- Orchestrator invokes Planner (gaps mode) to create fix plans
- Orchestrator invokes Coder to execute fixes
- Orchestrator invokes Verifier (re-verify mode)
- Max 3 iterations — if gaps persist, escalates to user
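The loop above can be sketched with three callables standing in for the Verifier, the Planner in gaps mode, and the Coder. The callables, the return values, and the representation of gaps as strings are all placeholders for illustration.

```python
from typing import Callable

MAX_ITERATIONS = 3


def close_gaps(
    verify: Callable[[], list],
    plan_fixes: Callable[[list], list],
    execute: Callable[[list], None],
) -> tuple:
    """Run verify -> plan -> execute -> re-verify up to MAX_ITERATIONS times."""
    gaps = verify()
    for _ in range(MAX_ITERATIONS):
        if not gaps:
            return ("passed", [])
        execute(plan_fixes(gaps))  # Planner (gaps mode), then Coder
        gaps = verify()            # Verifier (re-verify mode)
    if not gaps:
        return ("passed", [])
    return ("escalate_to_user", gaps)
```

The remaining gaps are returned with the escalation so the user sees exactly what could not be closed.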
If Planner detects TDD setup or user mentions "test-first," plans use RED→GREEN→REFACTOR structure:
- RED: Write failing test → commit: `test: add failing test for [feature]`
- GREEN: Implement minimum code → commit: `feat: implement [feature]`
- REFACTOR: Clean up → commit: `refactor: clean up [feature]` (if changes made)
STATE.md tracks project position. Orchestrator reads it to determine resume point:
- Research exists but no roadmap → resume at Step 3
- Roadmap exists but phase not started → resume at Step 4
- Phase plans exist but not validated → resume at Step 6
- Phase execution incomplete → resume at Step 7
- Phase complete but not verified → resume at Step 8
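The resume rules read like a decision table, sketched below. The `ProjectState` flags are hypothetical because the actual layout of `STATE.md` is not shown here. States the list does not cover return `None` rather than a guessed step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectState:
    # Hypothetical flags; STATE.md may record position differently.
    research_done: bool = False
    roadmap_done: bool = False
    phase_started: bool = False
    plans_exist: bool = False
    plans_validated: bool = False
    execution_complete: bool = False
    phase_verified: bool = False


def resume_step(state: ProjectState) -> Optional[int]:
    """Return the step to resume at, following the five rules listed above."""
    if state.research_done and not state.roadmap_done:
        return 3
    if state.roadmap_done and not state.phase_started:
        return 4
    if state.plans_exist and not state.plans_validated:
        return 6
    if state.plans_validated and not state.execution_complete:
        return 7
    if state.execution_complete and not state.phase_verified:
        return 8
    return None  # not covered by the listed rules
```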
Agents return structured checkpoints for:
- human-verify — Visual/manual checks (90% of checkpoints)
- decision — User must choose between options (9%)
- human-action — User must perform action (1%)
- auth-gate — Authentication required
Human provides input, agent resumes from checkpoint task.
This agent system is built on these principles:
- Solo developer workflow — No enterprise ceremony, no unnecessary meetings
- Goal-backward everything — Start from desired end state, derive what must exist
- Verification is not optional — Task completion ≠ goal achievement
- Context7 first — Training data is stale, always verify against current docs
- WHAT not HOW — Agents decide implementation, plans describe outcomes
- Fail loud and early — Better to stop and ask than proceed with wrong assumptions
- Traceable artifacts — Every decision, every gap, every commit documented
- Scientific debugging — Hypothesis testing with bias guards, not heroic guessing
- Regenerable code — Any file rewritable from its interface contract
- Atomic commits — One task, one commit, fully reviewable
Found an issue or want to improve these agents? Contributions welcome!
[Specify your license here]
Built with ❤️ for developers who want AI agents that work like a senior engineering team.
Keep this in mind if you like to experiment with agent flows: for now, I think it only works in VS Code Insiders.

