Microsoft AI Dev Days | 15-20 minutes + demos
"I'm going to show you how to build AI agents that can escalate to smarter models when they're stuck. But first, I need to tell you why you probably shouldn't use most of what I'm about to show you in production. Yet."
The Tension: AI agents are incredibly powerful AND frequently unreliable. Both are true. Let's be honest about both.
# The Paradigm Shift

Software has always been instructions + tools. What's changed:
- Instructions used to be for machines (precise, deterministic)
- Now instructions are for AI (fuzzy, probabilistic)
- Old skill: Learning how to use tools
- New skill: Learning how to write instructions for AI to use tools
This is genuinely exciting. It's also:
- Unpredictable
- Expensive at scale
- Sometimes hilariously wrong
- A security surface area we don't fully understand
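One way to make the shift concrete: the same intent written for a machine and for an AI (the English line is the instruction from the demo below):

```bash
# Instructions for machines: precise, deterministic, identical output every run
git log -n 5 --format='%h %an'

# Instructions for AI: fuzzy English, interpreted rather than executed
# "List the last 5 commits in this repo with their authors."
```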
# A markdown file IS an agent
```bash
ma deploy-preview.copilot.md
```

- Version controlled instructions
- Human readable, AI executable
- Portable across teams
- Auditable - you can review what the AI was told to do

```markdown
---
command: gh-copilot
---

List the last 5 commits in this repo with their authors.
```

Show: deterministic task, bounded scope, verifiable output.
# When NOT to Use Agents

1. High-Stakes Decisions
- Financial transactions
- Security-critical operations
- Anything with legal implications
- User data modifications without human confirmation
2. When You Need Determinism
- Build scripts that must work every time
- CI/CD pipelines (unless wrapped in heavy guardrails)
- Anything where "usually works" isn't good enough
3. When Cost Matters
- Token costs add up fast
- Rate limits are real
- A runaway agent loop can burn through quotas
4. When You Can't Verify Output
- If you can't tell if the answer is right, don't automate it
- "Looks plausible" is not the same as "correct"
Human Intent → AI Draft → Human Verification → Execution
Not:
Human Intent → AI Execution → Hope
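Concretely, this is the `--dry-run`/`--apply` pattern shown in the Deployment Patterns section below (the agent file name here is hypothetical):

```bash
ma fix-tests.copilot.md --dry-run    # AI Draft: propose changes, touch nothing
# ...a human reads the draft here... # Human Verification
ma fix-tests.copilot.md --apply      # Execution, only after sign-off
```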
# Security

- Any user input that reaches the AI is a potential attack vector
- Markdown files can contain malicious instructions
- `!command` execution in imports is powerful AND dangerous
- Sandboxing: Run agents in containers with limited permissions (see the container sketch below)
- Allowlists: Restrict which commands agents can invoke
- Audit Logging: Log every command an agent runs (ma does this!)
- Review Before Commit: Never let agents push directly to main
Show the logs at `~/.markdown-agent/logs/<agent>/` - prove you can trace what happened.
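For the Sandboxing bullet above, one concrete option is a throwaway container with no network and a read-only mount, so even a misbehaving agent can't exfiltrate data or modify the repo. A sketch using plain Docker (`agent-image` is a hypothetical image with `ma` and its dependencies installed):

```bash
# No network, read-only working copy, container removed on exit.
docker run --rm \
  --network none \
  -v "$PWD:/work:ro" \
  -w /work \
  agent-image ma analyze.copilot.md --dry-run
```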
# Escalation

Instead of one powerful (expensive) agent, use a chain:
- Fast/cheap model tries first (Copilot CLI, GPT-4o-mini)
- If stuck, escalate to smarter model (Claude Opus, GPT-4)
- If still stuck, escalate to human
- Most tasks are simple - don't pay for Opus to run `git status`
- Expensive models only used when needed
- Built-in circuit breaker: escalation has limits
```markdown
---
command: gh-copilot
on-failure: escalate.claude.md
max-escalations: 2
---

Analyze why tests are failing and suggest a fix.
```

Show:
- Copilot tries, gets confused by complex issue
- Automatically escalates to Claude
- Claude provides better analysis
- Human still reviews before applying
Escalation works when you can detect failure. Many AI failures look like success (confident wrong answers).
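The chain can be approximated in plain shell, assuming each runner signals hard failure through its exit code. This is a sketch, not `ma`'s built-in `on-failure` mechanism, and it inherits the caveat above: exit codes catch crashes and timeouts, not confident wrong answers:

```bash
# Try the cheap model first; escalate on hard failure; finally hand off to a human.
ma analyze.copilot.md \
  || ma analyze.claude.md \
  || echo "Both models failed - escalate to a human" >&2
```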
# AI Limitations

Things agents still fundamentally lack:
- Novel reasoning - they remix, they don't truly innovate
- Knowing what they don't know - confidence != correctness
- Long-term consistency - context windows are real limits
- Understanding consequences - they don't feel the pain of their mistakes
Still shaky in practice:
- Multi-step planning
- Tool use reliability
- Recovering from errors gracefully
Use agents for augmentation, not replacement. The human in the loop isn't a bug, it's a feature.
# Deployment Patterns

```bash
ma analyze.copilot.md --dry-run
# Review output
ma analyze.copilot.md --apply
```

Guardrails to put around every agent:
- Time limits (see the `timeout` sketch after this list)
- Token limits
- Allowed action lists
- Required confirmation for destructive operations
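The time-limit guardrail can be enforced entirely from outside the agent. A minimal sketch using coreutils `timeout` (plain shell, not an `ma` feature):

```bash
# Hard 2-minute wall-clock budget around the agent run; timeout kills it past that.
timeout 120 ma analyze.copilot.md --dry-run \
  || echo "Agent run failed or exceeded its time budget" >&2
```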
Agent suggests, human approves. Every time.
Show an agent that:
- Proposes a git commit
- Shows the diff
- Requires `--confirm` to actually commit
- Logs the decision either way
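In practice the demo is two invocations. A sketch, assuming a hypothetical `commit.copilot.md` agent (the `--confirm` flag is the one named above):

```bash
ma commit.copilot.md            # Proposes the commit, shows the diff, logs the run; commits nothing
ma commit.copilot.md --confirm  # Same agent, now permitted to actually commit (also logged)
```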
# Closing

What I'm NOT saying:
- That this will replace developers
- That agents are production-ready for everything
- That you should automate your entire workflow
What I AM saying:
- This changes how we think about instructions
- There's a responsible path forward
- The skeptics who learn this will build better systems than the enthusiasts who ignore the risks
"Be the developer who understands the limitations. You'll be more valuable than the one who only knows the hype."
# Resources

- markdown-agent (`ma`): github.com/johnlindquist/agents
- GitHub Copilot CLI: `gh extension install github/gh-copilot`
- This outline: [gist link]
| Demo | Risk Level | What Could Go Wrong | Mitigation |
|---|---|---|---|
| Simple agent | Low | Copilot API down | Have backup recording |
| Audit trail | Low | Logs empty | Pre-run agents before talk |
| Escalation | Medium | Escalation fails weirdly | Have fallback slide |
| Guardrails | Low | Demo works too well | Show it blocking something |
| Section | Time | Running Total |
|---|---|---|
| Hook | 1 min | 1 min |
| Paradigm Shift | 3 min | 4 min |
| markdown-agent | 3 min | 7 min |
| When NOT to Use | 4 min | 11 min |
| Security | 2 min | 13 min |
| Escalation | 4 min | 17 min |
| AI Limitations | 2 min | 19 min |
| Deployment Patterns | 2 min | 21 min |
| Closing | 1 min | 22 min |
Total: ~22 minutes (trim demos if running long)