Source: this is a summary of an MIT Missing Semester lecture on agentic coding. Watch the original video for the full demonstration.
This lecture covers coding agents — AI models wrapped in an "agent harness" that can autonomously read/write files and execute shell commands to complete programming tasks end-to-end. The lecturer demonstrates Claude Code live, explains how LLMs and agent harnesses work under the hood, walks through key use cases, and covers advanced features like parallel agents, context management, and sub-agents.
The lecturer takes a simple dl.py script and asks Claude Code (in natural language) to turn it into a proper CLI program with argparse, type annotations, and mypy verification. The agent reads the file, rewrites it, runs the type checker, and confirms everything passes — all autonomously. Key point: the lecturer configures the agent to auto-approve file edits but manually approves shell commands to maintain safety. Git tracking makes it easy to review and revert any changes.
A language model (LLM) is fundamentally a conditional probability distribution: given input tokens (prompt X), it samples output tokens (completion Y). Multi-turn chat works by passing the entire conversation history as the prompt each time. An agent harness (a.k.a. scaffolding) wraps the LLM in a loop: it calls the LLM, interprets tool-call outputs (e.g., "run mypy dl.py"), executes them on your machine, feeds the results back as the next prompt, and repeats until the model emits a final text response to the user.
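The loop described above can be sketched in a few lines of Python. The `call_llm` stub stands in for a real hosted-model API call, and the message format and tool names are illustrative, not any particular provider's schema:

```python
import subprocess

def call_llm(messages):
    """Stub standing in for a real chat-completion API call.
    A real harness sends `messages` to a hosted model; here we script
    two turns: one tool call, then a final text answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "tool": "bash",
                "args": {"cmd": "echo mypy: no issues found"}}
    return {"type": "text", "content": "Type check passed; task complete."}

def run_tool(tool, args):
    # The harness executes the model's requested command on your machine.
    if tool == "bash":
        out = subprocess.run(args["cmd"], shell=True,
                             capture_output=True, text=True)
        return out.stdout + out.stderr
    raise ValueError(f"unknown tool: {tool}")

def agent_loop(user_prompt):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = call_llm(messages)      # sample Y ~ p(Y | X)
        if reply["type"] == "text":     # final text response: stop looping
            return reply["content"]
        result = run_tool(reply["tool"], reply["args"])
        # Feed the tool output back as context for the next LLM call.
        messages.append({"role": "tool", "content": result})

print(agent_loop("add type annotations to dl.py and verify with mypy"))
```

The essential point is that the "agent" is just this loop; all the intelligence lives in the model being called each iteration.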
Hosted models (required for strong performance) send your code to the cloud — check provider terms if privacy matters. A simple task costs roughly $0.19. The lecturer recommends a ~$20/month subscription plan for students. You can control costs by choosing smaller/cheaper models for simpler tasks. Local models are an option but are significantly weaker for complex tasks today.
Give the agent a high-level description ("make it look retro") or a detailed PRD (product requirements document) in Markdown. The agent iterates, and you can give follow-up instructions to refine results. For complex features, longer, more specific prompts yield better results. The course website's sidenote feature and lecture refactoring were built entirely via Claude Code.
Write (or ask the agent to write) a failing unit test that reproduces the bug, then tell the agent: "there's a bug, run this command to reproduce it, fix it." The agent runs the test, reads relevant source files, applies a fix, re-runs the test to confirm it passes, then runs the full test suite to catch regressions. It can also create a well-styled git commit by reading git log to match the repo's commit message conventions.
Agents are strong at traversing codebases and reasoning about code deltas. Useful for: catching subtle semantic differences after performance refactors, reviewing PRs for code quality/correctness, and navigating unfamiliar codebases (e.g., for a UROP or open-source contribution). Provide specific review criteria for more targeted feedback.
Instead of memorizing obscure flags, describe what you want in plain English ("find all Python files with renaming imports, ignore the lib directory"). The agent generates and runs the correct shell command, then summarizes results. Useful as an everyday productivity tool.
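One plausible command the agent might produce for the request above, assuming "renaming imports" means aliased imports like `import numpy as np` (the sample files are only there to demo against):

```shell
# Sample layout to demo against (the grep line is the agent's output).
mkdir -p demo/lib
printf 'import numpy as np\n' > demo/app.py
printf 'import os\n'          > demo/util.py
printf 'import numpy as np\n' > demo/lib/vendored.py
cd demo

# List Python files containing aliased imports, skipping the lib directory.
grep -rEl --include='*.py' --exclude-dir=lib 'import [[:alnum:]_.]+ as ' .
# → ./app.py
```

You never had to remember `--exclude-dir` or write the regex yourself; the agent translates the plain-English request and summarizes the output.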
Save frequently-used prompt templates (e.g., project-specific code review instructions) so you don't copy-paste from a text file every session. Most agent harnesses have built-in support for named, reusable prompts.
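In Claude Code, for instance, reusable prompts are Markdown files under `.claude/commands/`; a file named `review.md` becomes a `/review` slash command, with `$ARGUMENTS` substituted from whatever you type after it. The contents below are illustrative:

```markdown
Review the changes on this branch against our project conventions:
- type annotations on all public functions (checked with mypy)
- no bare except clauses
- tests added for any bug fix

Focus the review on: $ARGUMENTS
```

Other harnesses have equivalent mechanisms; the point is that the template lives in the repo instead of a scratch text file.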
Since agents can take 20–30 minutes on complex tasks, run multiple agents simultaneously on different tasks. Use git worktree to create isolated checkouts of your repo on separate branches, preventing conflicts between parallel agents. Each agent commits to its own branch; merge when done.
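The worktree setup can be sketched as follows; the repo and branch names are illustrative, and the scratch repo exists only to make the demo self-contained:

```shell
# Demo: set up a scratch repo, then give each parallel agent its own worktree.
demo=$(mktemp -d) && cd "$demo"
git init -q project && cd project
git -c user.name=demo -c user.email=demo@local commit -q --allow-empty -m "init"

# One isolated checkout per agent, each on its own branch.
git worktree add ../agent-a -b feature-a   # agent 1 works in ../agent-a
git worktree add ../agent-b -b bugfix-b    # agent 2 works in ../agent-b

git worktree list   # shows the main checkout plus the two agent worktrees
```

Each worktree is a full working directory sharing one `.git` store, so the agents cannot clobber each other's uncommitted files; you merge the branches back when each task finishes.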
MCP (Model Context Protocol) is a standardized protocol for connecting agent harnesses to external tools (e.g., Notion, GitHub). It lets you tell Claude to "read the Notion doc, improve the implementation plan, then implement the feature in the codebase" in a single instruction.
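As a concrete illustration, Claude Code can load MCP servers from a `.mcp.json` file at the repo root; a minimal config wiring up a GitHub server might look like the following (the server package name follows MCP's public examples, and the details should be treated as illustrative):

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  }
}
```

Once configured, the server's tools (e.g., reading issues or PRs) show up to the agent alongside its built-in file and shell tools.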
LLMs have a fixed context window; stuffing it degrades performance. Key techniques:
- Clear context between unrelated tasks (restart the session)
- Rewind the conversation to undo a mistaken direction without keeping that noise in context
- Compact (/compact in Claude Code) uses an LLM to summarize long conversation history into a shorter prefix, enabling indefinitely long sessions
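The compaction idea can be sketched as a toy function; the summarizer here is a stub, where a real harness would ask an LLM to write the summary:

```python
def summarize(messages):
    # Stub: a real implementation would ask an LLM for a short summary.
    return {"role": "system",
            "content": f"[summary of {len(messages)} earlier messages]"}

def compact(history, keep_last=4):
    """Replace all but the most recent turns with one summary message."""
    if len(history) <= keep_last:
        return history
    return [summarize(history[:-keep_last])] + history[-keep_last:]

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compacted = compact(history)
print(len(compacted))  # → 5: one summary message + the last 4 turns
```

The session can then continue from the compacted prefix, so total context stays bounded no matter how long the conversation runs.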
llms.txt is a proposed standard for libraries to publish a plain-text, LLM-optimized documentation file (vs. noisy HTML). If a library you want to use postdates the model's training cutoff, point the agent at its llms.txt URL. The agent fetches it, loads the relevant docs into context, and can then write correct code against the new API. The plain-text file is far more compact than raw HTML, preserving context budget.
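Fetching such a file is a one-liner the harness's web tool performs for you; a minimal sketch (the URL is illustrative, not from the lecture):

```python
import urllib.request

def load_llms_txt(url):
    """Fetch a library's llms.txt and return it as plain text, ready to
    be appended to the agent's context."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

# Hypothetical usage:
# docs = load_llms_txt("https://example.com/llms.txt")
# messages.append({"role": "system", "content": docs})
```

Because the file is already plain text, nothing needs to be stripped or re-rendered before it enters the context window.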
agents.md is a file placed in your repo that is automatically loaded into the agent's context on every startup. Use it to document how to run tests, the type checker command, code style guidelines, and project conventions, so you don't have to re-explain them every session.
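A short example of what such a file might contain (the commands and conventions below are illustrative, not from the lecture):

```markdown
## Running checks
- Tests: `pytest -q`
- Type checker: `mypy --strict .`

## Conventions
- All public functions carry type annotations.
- Match existing commit message style (check `git log`).
- Never commit directly to main; work on a feature branch.
```

Anything you find yourself repeating to the agent at the start of each session is a candidate for this file.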
Skills are an evolution of agents.md: instead of dumping all documentation into context upfront (wasting context budget), agents.md contains only a table of contents pointing to separate skill files. The agent loads individual skill files on demand via tool calls, keeping the active context lean.
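A sketch of what such a table-of-contents agents.md might look like (the file names are illustrative):

```markdown
Load the relevant skill file only when the task calls for it:

- Releasing a new version → skills/release.md
- Writing database migrations → skills/migrations.md
- Updating the docs site → skills/docs.md
```

Only the few lines above sit in context at startup; the detailed instructions are read in only for the tasks that need them.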
A parent agent can spawn child agents for subtasks, keeping the parent's context clean. Example: the web-fetch tool can be implemented as a sub-agent that fetches a URL, summarizes only the relevant parts, and returns a compact result — rather than dumping full HTML into the parent's context. Sub-agents can also run in parallel (e.g., simultaneously run a code-review sub-agent and a type-check sub-agent after implementing a feature).
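The parallel sub-agent pattern can be sketched with a thread pool; `run_subagent` is a stub, where a real harness would spawn a child LLM loop with its own fresh context and return only a compact summary:

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task):
    """Stub for spawning a child agent. A real implementation starts a
    new agent loop and returns only a short summary, keeping the
    parent's context clean."""
    return f"[{task}: done, 2-line summary]"

def after_feature_implemented():
    # Run independent sub-agents in parallel, as in the lecture's example.
    tasks = ["code-review", "type-check"]
    with ThreadPoolExecutor() as pool:
        return dict(zip(tasks, pool.map(run_subagent, tasks)))

print(after_feature_implemented())
```

The parent sees two short summaries instead of two full transcripts, which is the whole point: sub-agents trade a little latency for a much cleaner parent context.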
LLMs are probabilistic, not intelligent — they make mistakes. Common failure modes include: subtly wrong code that looks correct at first glance, gaslighting (insisting incorrect output is right), going down rabbit holes, and getting stuck in debugging spirals that make code progressively worse. Always review AI-generated code for correctness and security. For critical logic, writing it yourself may be faster than verifying AI output.
- Coding agents = LLM + agent harness: the harness runs the LLM in a loop, dispatches tool calls (file read/write, bash, web fetch), and feeds results back as context
- Safety stance: auto-approve file edits, manually approve shell commands; always work in a git repo so you can revert
- Feedback loops are powerful: give the agent a failing test and let it iterate until it passes — this is TDD for agents
- Context is the main resource to manage: clear it between unrelated tasks, compact it for long sessions, use agents.md/skills to load only what's needed
- Parallelism multiplies your throughput: git worktrees + multiple agents = working on N tasks simultaneously
- AI makes mistakes: review output critically, especially for subtle logic bugs and security issues; don't fully trust code that merely looks correct
- Try Claude Code (or Aider, OpenCode) on a real project — start with a simple refactor or bug fix with a failing test
- Configure your approval settings: auto-approve file edits, require manual approval for bash commands
- Set up a CLAUDE.md (or equivalent) in your main repos with: how to run tests, type checker command, and key project conventions
- Practice the TDD workflow: write a failing test → hand it to the agent → let it fix the bug and run the full suite
- Use git worktree when you want to run parallel agents on the same codebase
- Explore /compact after a long session to see how context compression works
- Check policies before using these tools on coursework; rules vary widely by class