Microsoft AI Dev Days | 15-20 minutes
"The best assistant isn't the one who does everything silently. It's the one who knows exactly when to tap you on the shoulder."
Opening Question: Have you ever had a junior developer who just... did things? Deployed to production without asking? Made architectural decisions while you were at lunch?
That's what most AI scripts do today. They either:
- Ask permission for EVERYTHING (death by a thousand confirmations)
- Ask permission for NOTHING (chaos mode)
Today's Promise: I'll show you how to build AI workflows that know the perfect moment to involve you.
Full Autonomy ←————————————→ Full Supervision
"YOLO" "Are you sure?"
"Are you REALLY sure?"
Neither extreme works:
- Full autonomy: Catastrophic failures
- Full supervision: Automation theater
Not "should AI be autonomous?" but "autonomous about WHAT?"
Key Insight: Different operations deserve different trust levels.
| Trust Level | Example Operations | Human Involvement |
|---|---|---|
| High | Read files, search code, analyze | None |
| Medium | Write tests, format code | Summary only |
| Low | Delete files, modify configs | Explicit approval |
| Critical | Deploy, publish, external APIs | Interactive session |
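One way to encode that table in an outer script is a plain lookup from operation type to trust level. A minimal sketch, assuming operations are named as `verb:path` strings; the names and groupings here are illustrative, not tied to any particular tool:

```bash
#!/bin/bash
# Sketch: classify an operation into a trust level before handing it to the agent.
trust_level() {
  case "$1" in
    read:*|search:*|analyze:*)    echo "high" ;;      # no human involvement
    write:src/tests/*|format:*)   echo "medium" ;;    # summary only
    delete:*|write:src/config/*)  echo "low" ;;       # explicit approval
    deploy:*|publish:*)           echo "critical" ;;  # interactive session
    *)                            echo "low" ;;       # unknown: stay conservative
  esac
}

op="delete:src/legacy/old-module.ts"
case "$(trust_level "$op")" in
  high)     : ;;                                             # just run it
  medium)   echo "Will summarize afterwards: $op" ;;
  low)      read -p "Approve '$op'? [y/n]: " ok
            [[ $ok == "y" ]] || exit 1 ;;
  critical) echo "Opening an interactive session for: $op" ;;
esac
```

Anything the wrapper doesn't recognize falls into the most restrictive bucket, so new operation types start out supervised.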
Concept: AI outputs structured markers that outer scripts detect and act on.
<!-- ESCALATE:HUMAN_DECISION -->
I found 3 possible approaches to implement this feature:
1. Add a new API endpoint (2 hours, breaking change)
2. Extend existing endpoint (4 hours, backward compatible)
3. Use webhooks (1 hour, requires client changes)
Each has tradeoffs I can't evaluate without knowing your priorities.
<!-- /ESCALATE -->

#!/bin/bash
# Fast agent runs first
result=$(ma quick-analysis.copilot.md --file "$1")
# Check for escalation markers
if echo "$result" | grep -q "ESCALATE:HUMAN_DECISION"; then
echo "Agent needs your input:"
echo "$result" | sed -n '/ESCALATE/,/\/ESCALATE/p'
read -p "Your choice (1/2/3): " choice
# Continue with enriched context
ma continue-with-choice.claude.md --choice "$choice" --context "$result"
fi

| Marker | Meaning | Script Response |
|---|---|---|
| ESCALATE:CONFIDENCE_LOW | Model uncertain | Escalate to stronger model |
| ESCALATE:HUMAN_DECISION | Business logic choice | Interactive prompt |
| ESCALATE:PERMISSION_NEEDED | Dangerous operation | Require explicit approval |
| ESCALATE:CONTEXT_MISSING | Need more info | Gather additional context |
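A minimal sketch of the detection side, assuming the markers appear verbatim in the agent's output; the `quick-analysis.claude.md` fallback is a hypothetical stronger-model variant of the script above:

```bash
#!/bin/bash
# Pull the first ESCALATE:* marker out of the agent's output and dispatch on it.
first_marker() {
  grep -o 'ESCALATE:[A-Z_]*' <<< "$1" | head -n 1
}

result=$(ma quick-analysis.copilot.md --file "$1")

case "$(first_marker "$result")" in
  ESCALATE:CONFIDENCE_LOW)
    ma quick-analysis.claude.md --file "$1" ;;           # hypothetical stronger-model script
  ESCALATE:HUMAN_DECISION)
    echo "$result"
    read -p "Your call: " choice ;;
  ESCALATE:PERMISSION_NEEDED)
    read -p "Allow this operation? [y/n]: " approve ;;
  ESCALATE:CONTEXT_MISSING)
    echo "Agent needs more context before it can continue." ;;
  "")
    echo "$result" ;;                                     # no marker: pass result through
esac
```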
The Insight: Define what's allowed in the markdown frontmatter itself.
---
command: copilot
permissions:
allow:
- read:src/**
- write:src/tests/**
deny:
- write:src/config/**
- delete:*
escalate:
- write:src/api/**
- modify:package.json
---
Analyze the codebase and suggest improvements.
Implement any test improvements directly.
For API changes, describe what you'd do and wait for approval.

1. New Script → Maximum Restrictions
↓
2. Proven Script → Relaxed Read Access
↓
3. Trusted Script → Write to Safe Zones
↓
4. Verified Script → Escalation-only for Danger Zones
# Script tries something outside its permissions
# Instead of failing silently or crashing:
echo "PERMISSION_BOUNDARY: write:src/config/database.ts"
echo "Script wants to modify database configuration."
echo "Reason: 'Optimize connection pool settings'"
echo ""
echo "Options:"
echo " [a]llow once [t]rust for session [d]eny [v]iew change"Before any AI action, the script asks itself:
- Do I have permission? (Check boundaries)
- Am I confident? (Check certainty markers)
- Is this reversible? (Check operation type)
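A minimal sketch of that three-question check as a single function; it assumes the frontmatter has already been parsed into DENY and ESCALATE glob arrays, and the 0.7 confidence threshold and "irreversible" operation list are illustrative:

```bash
#!/bin/bash
# Pre-flight check: permission, confidence, reversibility.
DENY=("write:src/config/*" "delete:*")
ESCALATE=("write:src/api/*" "modify:package.json")

preflight() {
  local op="$1" confidence="$2"

  # 1. Do I have permission?
  for pattern in "${DENY[@]}"; do
    [[ $op == $pattern ]] && { echo "deny"; return; }
  done
  for pattern in "${ESCALATE[@]}"; do
    [[ $op == $pattern ]] && { echo "escalate"; return; }
  done

  # 2. Am I confident? (0.7 threshold is illustrative)
  awk "BEGIN { exit !($confidence < 0.7) }" && { echo "escalate"; return; }

  # 3. Is this reversible? Destructive operations always go through a human.
  [[ $op == delete:* || $op == deploy:* ]] && { echo "escalate"; return; }

  echo "proceed"
}

preflight "write:src/api/users.ts" 0.9        # → escalate (matches an escalate glob)
preflight "write:src/tests/user.test.ts" 0.9  # → proceed
```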
---
command: copilot
confidence-threshold: 0.7
escalate-to: claude
---
Analyze this error log and suggest fixes.
If you're less than 70% confident, output ESCALATE:CONFIDENCE_LOW.

Outer script:
result=$(ma analyze-error.copilot.md --log "$1")
if echo "$result" | grep -q "CONFIDENCE_LOW"; then
echo "Copilot uncertain. Escalating to Claude..."
ma analyze-error.claude.md --log "$1" --prior-analysis "$result"
else
echo "$result"
fi

Interrupt for decisions, not for status updates.
Bad Interruption:
"I'm now analyzing file 3 of 47. Continue? [y/n]"
Good Interruption:
"I found a security vulnerability in auth.ts (line 42).
This exposes user tokens. Fix options:
1. Immediate patch (may break sessions)
2. Graceful migration (4-hour implementation)
Which approach? [1/2]"
---
command: copilot
model: gpt-4o
---
Review this PR diff:
@!`git diff main...HEAD`
Output your review in this format:
- APPROVE: if changes are safe and well-tested
- COMMENT: for style/minor issues (list them)
- ESCALATE:SECURITY if you see potential security issues
- ESCALATE:ARCHITECTURE if changes affect system design
- REQUEST_CHANGES: for bugs or missing tests (list them)

The orchestrating script:
#!/bin/bash
review=$(ma pr-review.copilot.md)
case "$review" in
*"ESCALATE:SECURITY"*)
echo "Security concern detected. Getting senior review..."
ma security-review.claude.md --context "$review"
;;
*"ESCALATE:ARCHITECTURE"*)
echo "Architecture change detected."
echo "$review"
read -p "Proceed with detailed review? [y/n]: " proceed
[[ $proceed == "y" ]] && ma architecture-review.claude.md
;;
*"APPROVE"*)
gh pr review --approve
;;
*)
echo "$review"
;;
esac

# Every successful operation builds trust
echo "$(date): copilot:pr-review:SUCCESS" >> ~/.ma/trust-journal
# Track escalation accuracy
# Did the human agree with the escalation?
echo "$(date): copilot:ESCALATE:SECURITY:VALIDATED" >> ~/.ma/trust-journalWeek 1: Script asks about everything Week 4: Script handles routine, escalates edge cases Week 12: Script has earned autonomy for its domain
The Goal: Earned autonomy, not granted autonomy.
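One way the journal could feed that progression, as a sketch that reads the entry format shown above; the thresholds (50 runs, 95%, 80%) are placeholders, not recommendations:

```bash
#!/bin/bash
# Turn the trust journal into a rough autonomy recommendation for one script.
journal=~/.ma/trust-journal
script="copilot:pr-review"

runs=$(grep -c "$script" "$journal")
wins=$(grep -c "$script:SUCCESS" "$journal")

if (( runs == 0 )); then
  echo "No history yet: maximum restrictions."
elif (( runs >= 50 && wins * 100 / runs >= 95 )); then
  echo "Earned autonomy: escalate only for danger zones."
elif (( wins * 100 / runs >= 80 )); then
  echo "Partial trust: write access to safe zones."
else
  echo "Keep asking: confirm routine operations."
fi
```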
- Design for the interrupt - The interrupt IS the interface
- Trust is granular - Not "trust AI" but "trust THIS AI for THIS operation"
- Scripts should be honest - Better to escalate than guess
Future AI workflows aren't about removing humans.
They're about involving humans at exactly the right moment,
with exactly the right context,
for exactly the right decision.
"The best AI assistant is like a great executive assistant: handles the routine brilliantly, and knows EXACTLY which decisions only you can make."
| Demo | Time | Key Concept |
|---|---|---|
| Escalation Markers | 2 min | ESCALATE:* pattern |
| Permission Boundaries | 2 min | Frontmatter trust levels |
| Confidence Cascade | 2 min | Fast agent -> smart agent |
| PR Review Loop | 3 min | Real-world integration |
Total Demo Time: ~9 minutes woven throughout
- markdown-agent (ma): github.com/johnlindquist/markdown-agent
- GitHub Copilot CLI: github.com/github/gh-copilot
- Example patterns: (link to gist collection)
ESCALATE:HUMAN_DECISION - Need human choice
ESCALATE:CONFIDENCE_LOW - Model uncertain
ESCALATE:PERMISSION_NEEDED - Outside boundaries
ESCALATE:CONTEXT_MISSING - Need more info
ESCALATE:SECURITY - Security concern
ESCALATE:ARCHITECTURE - Design decision
permissions:
allow: [pattern...] # Allowed operations
deny: [pattern...] # Blocked operations
escalate: [pattern...] # Require approval

- Do I have permission?
- Am I confident?
- Is this reversible?