@DMontgomery40
Created March 13, 2026 09:27
Thorough code review loop skill bundle
interface:
display_name: "Thorough Code Review"
short_description: "Deep bug-focused review for any repo"
default_prompt: "Use $thorough-code-review to review this repo or branch thoroughly, focus on concrete bugs, regressions, and fake checkbox tests, and keep going past the usual 3 findings."

Browser And Runtime Proof

Netlify / Remote Truth Rule

If the repo contains netlify.toml and the changed behavior depends on Netlify Functions, redirects, auth/cookies, serverless env, deployed routing, or other platform wiring, localhost is not sufficient proof by default.

  • Require at least one deployed preview or production URL verification path when available.
  • Localhost remains acceptable for pure client-rendering, static UI, or isolated logic checks that do not depend on Netlify runtime behavior.
  • If a deployed URL is required but not discoverable from the repo or environment, say that local verification is incomplete rather than over-claiming confidence.
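This rule can be approximated mechanically. A minimal sketch, assuming the same repo markers the probe script checks for; the function name `needs_deployed_proof` is illustrative, not part of the bundle:

```python
from pathlib import Path

def needs_deployed_proof(repo_root: Path) -> bool:
    """Heuristic: if Netlify platform wiring is present, localhost alone
    is weak proof and a deployed preview/prod check should be sought."""
    markers = (
        repo_root / "netlify.toml",
        repo_root / "netlify" / "functions",
    )
    return any(m.exists() for m in markers)
```

When this returns true but no deployed URL is discoverable, report local verification as incomplete rather than claiming full confidence.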

Browser / Tool Surface Matrix

  • Repo Playwright CLI: default when the repo already has Playwright config or scripts and the goal is durable regression coverage that should live in the repo or CI.
  • Codex native Playwright MCP: default for quick reproduction, DOM inspection, screenshots, and one-off browser smoke checks during investigation.
  • Playwright via Codemode MCP / code execution: use when the browser flow needs programmable orchestration, loops, or combined automation across multiple tool surfaces.
  • Playwright CLI via Codemode: use when the repo's own Playwright command is still the source of truth, but execution is being orchestrated from a Codemode workflow.
  • macOS automation via Codemode: use only for OS-level gaps that Playwright cannot honestly cover, such as native file pickers, permission dialogs, downloads, app switching, or browser profile handling.

Verification Ordering

Use the lightest honest proof surface first:

  1. local smoke for quick reproduction or static UI validation
  2. repo-native regression command for durable coverage
  3. deployed preview or production verification when platform/runtime behavior matters
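The ordering above can be sketched as a small planner. `proof_plan` and its flags are hypothetical names for illustration, not part of the bundle:

```python
def proof_plan(platform_bound: bool, needs_durable_coverage: bool) -> list[str]:
    """Return verification steps, lightest honest surface first."""
    plan = ["local smoke"]                       # quick repro / static UI
    if needs_durable_coverage:
        plan.append("repo-native regression command")
    if platform_bound:
        plan.append("deployed preview/production check")
    return plan
```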

Do not sign off on a platform-bound bug with only component tests or localhost smoke if the real risk lives at the deployed edge.

Test Expansion And Verification

Generalized Test Requirement

When you fix a bug you found during review, the default expectation is bug repro test + category test, not either/or.

  • Start with the nearest existing behavioral suite and extend it before creating a one-off new file.
  • Prefer generalized coverage shapes such as invariant, matrix, state-transition, property, idempotency, contract, or multi-case integration tests when they fit the stack.
  • If the repo is thin on tests, add the smallest honest test in the repo's existing stack rather than introducing a large new framework.
  • A narrow regression-only test is incomplete when the real failure was about retries, fallback order, idempotency, cleanup, validation classes, ordering, or another repeatable bug family.
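A sketch of the repro + category pairing. The `dedupe_keep_order` helper is a hypothetical bug site; the invariants (idempotency, no lost items) stand in for whatever bug family the real fix belongs to:

```python
def dedupe_keep_order(items):
    """Hypothetical function under review: drop duplicates, keep first-seen order."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out

def test_bug_repro():
    # Narrow repro of the reported case.
    assert dedupe_keep_order(["b", "a", "b"]) == ["b", "a"]

def test_category_idempotency_and_ordering():
    # Broader invariants for the bug family, not one pinned literal value.
    cases = [[], ["x"], ["x", "x"], ["c", "a", "c", "b", "a"]]
    for case in cases:
        once = dedupe_keep_order(case)
        assert dedupe_keep_order(once) == once          # idempotent
        assert sorted(set(once)) == sorted(set(case))   # no items lost
```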

Test Placement Heuristic

  • Extend existing shared suites first, especially when the repo already has domain, state, invariant, integration, or contract coverage nearby.
  • For React or Vite UI logic inside the review surface, prefer the existing state or render suites first.
  • For browser flow regressions, prefer the repo's existing Playwright suite first.
  • For Streamlit or dev-tool state bugs, prefer pytest or the existing app-level harness first.
  • For backend, API, CLI, or job lifecycle bugs, prefer the nearest integration, contract, or state-transition suite rather than a narrow helper-only test.
  • Avoid tests that only pin one literal return value when the real risk is ordering, retries, cleanup, idempotency, partial failure, or cross-step state drift.

Turn-End Verification Gate

For mutating review turns, broader test coverage and verification are part of completion.

  • Before ending the turn, run the narrowest changed-surface tests plus the repo's standard verify, build, or quality-gate command if one exists.
  • If the closest repo AGENTS.md or CLAUDE.md requires a full suite, obey the repo rule instead of this default.
  • Do not treat a fix as done if it only added a hyper-specific regression test where broader category coverage was realistically possible.
  • For review-only turns, still run the narrowest non-mutating validation that proves the finding when feasible.
  • If verification could not be completed, say exactly what was blocked and what remains unverified.
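A minimal sketch of a turn-end gate runner. The commands fed to it should come from the repo's own scripts (e.g. the ones the probe script reports); `run_gate` is an illustrative name, not part of this bundle:

```python
import subprocess

def run_gate(commands: list[list[str]]) -> list[str]:
    """Run each verification command; return the ones that failed."""
    failed = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(" ".join(cmd))
    return failed
```

A non-empty return value means the turn cannot honestly be marked done: report exactly which commands failed or were blocked.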
#!/usr/bin/env python3
from __future__ import annotations

import argparse
import json
import subprocess
import sys
from pathlib import Path

# Filename fragments that usually mark test files across common stacks.
TEST_HINTS = (
    ".test.",
    ".spec.",
    "_test.py",
    "test_",
)
VERIFY_SCRIPT_KEYS = ("verify", "build", "lint", "check")
TEST_SCRIPT_KEYS = ("test", "test:e2e", "test:unit", "test:ui", "test:fullstack", "test:mocked")


def run(cmd: list[str], cwd: Path) -> str:
    """Run a command and return trimmed stdout, or "" on any failure."""
    try:
        return subprocess.check_output(cmd, cwd=str(cwd), stderr=subprocess.DEVNULL, text=True).strip()
    except Exception:
        return ""


def git_root(path: Path) -> Path:
    root = run(["git", "rev-parse", "--show-toplevel"], path)
    return Path(root) if root else path


def load_package_json(path: Path) -> dict:
    try:
        return json.loads(path.read_text())
    except Exception:
        return {}


def tracked_tests(repo_root: Path) -> list[str]:
    """List git-tracked files that look like tests."""
    output = run(["git", "ls-files"], repo_root)
    if not output:
        return []
    files = output.splitlines()
    return [f for f in files if any(hint in f for hint in TEST_HINTS) or "/tests/" in f or f.startswith("tests/")]


def gather_commands(pkg_path: Path, label: str) -> tuple[list[str], list[str]]:
    """Split package.json scripts into test commands and verify/build commands."""
    data = load_package_json(pkg_path)
    scripts = data.get("scripts", {})
    test_cmds = []
    verify_cmds = []
    for key, value in scripts.items():
        rendered = f"{label}: {key} -> {value}"
        if key in TEST_SCRIPT_KEYS or "test" in key:
            test_cmds.append(rendered)
        elif any(token in key for token in VERIFY_SCRIPT_KEYS):
            verify_cmds.append(rendered)
    return test_cmds, verify_cmds


def pyproject_markers(path: Path) -> list[str]:
    """Crude substring scan of pyproject.toml for known tool names."""
    if not path.exists():
        return []
    text = path.read_text(errors="ignore")
    markers = []
    for token in ("pytest", "streamlit", "playwright", "vitest"):
        if token in text:
            markers.append(token)
    return markers


def main() -> int:
    parser = argparse.ArgumentParser(description="Inspect repo review/test surfaces quickly.")
    parser.add_argument("repo", nargs="?", default=".", help="Repo path to inspect")
    parser.add_argument("--json", action="store_true", help="Emit JSON instead of markdown")
    args = parser.parse_args()

    start = Path(args.repo).expanduser().resolve()
    repo_root = git_root(start)

    package_candidates = [
        repo_root / "package.json",
        repo_root / "frontend" / "package.json",
        repo_root / "web" / "package.json",
        repo_root / "apps" / "web" / "package.json",
    ]
    package_candidates = [p for p in package_candidates if p.exists()]

    test_commands: list[str] = []
    verify_commands: list[str] = []
    deps: set[str] = set()
    for pkg in package_candidates:
        label = str(pkg.relative_to(repo_root))
        data = load_package_json(pkg)
        deps.update(data.get("dependencies", {}).keys())
        deps.update(data.get("devDependencies", {}).keys())
        tests, verify = gather_commands(pkg, label)
        test_commands.extend(tests)
        verify_commands.extend(verify)

    pyproject = repo_root / "pyproject.toml"
    pytest_ini = repo_root / "pytest.ini"
    py_markers = pyproject_markers(pyproject)
    if pytest_ini.exists() and "pytest" not in py_markers:
        py_markers.append("pytest")

    # Detect platform and harness markers from configs and dependencies.
    has_netlify = (repo_root / "netlify.toml").exists() or (repo_root / "netlify" / "functions").exists()
    has_playwright = any((repo_root / name).exists() for name in ("playwright.config.ts", "playwright.config.js", "playwright.config.mjs")) or "@playwright/test" in deps or "playwright" in py_markers
    has_vitest = any((repo_root / name).exists() for name in ("vitest.config.ts", "vitest.config.js", "vitest.config.mjs")) or "vitest" in deps
    has_jest = any((repo_root / name).exists() for name in ("jest.config.ts", "jest.config.js", "jest.config.mjs")) or "jest" in deps
    has_pytest = "pytest" in py_markers
    has_streamlit = "streamlit" in py_markers

    tracked = tracked_tests(repo_root)

    detected_surfaces = []
    if has_playwright:
        detected_surfaces.append("playwright")
    if has_vitest:
        detected_surfaces.append("vitest")
    if has_jest:
        detected_surfaces.append("jest")
    if has_pytest:
        detected_surfaces.append("pytest")
    if has_streamlit:
        detected_surfaces.append("streamlit")

    notes = []
    if has_netlify:
        notes.append("Netlify markers detected; deployed preview/prod proof may be required for functions, redirects, auth, or cookie behavior.")
    if has_playwright:
        notes.append("Repo Playwright CLI is available for durable browser regression coverage.")
    if not tracked:
        notes.append("Little or no tracked test coverage detected; prefer the thinnest honest addition in the repo's existing stack.")
    if not detected_surfaces:
        notes.append("No obvious automated test harness detected from common manifests/configs.")

    result = {
        "repo_root": str(repo_root),
        "has_netlify": has_netlify,
        "detected_test_surfaces": detected_surfaces,
        "test_commands": test_commands,
        "verify_commands": verify_commands,
        "tracked_test_file_count": len(tracked),
        "tracked_test_examples": tracked[:12],
        "notes": notes,
    }

    if args.json:
        print(json.dumps(result, indent=2))
        return 0

    print(f"Repo root: {result['repo_root']}")
    print(f"Netlify: {'yes' if has_netlify else 'no'}")
    print(f"Detected test surfaces: {', '.join(detected_surfaces) if detected_surfaces else 'none-obvious'}")
    print(f"Tracked test files: {len(tracked)}")
    if tracked:
        print("Tracked test examples:")
        for item in tracked[:12]:
            print(f"- {item}")
    if test_commands:
        print("Test commands:")
        for item in test_commands:
            print(f"- {item}")
    if verify_commands:
        print("Verify/build commands:")
        for item in verify_commands:
            print(f"- {item}")
    if notes:
        print("Notes:")
        for item in notes:
            print(f"- {item}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SKILL_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
SKILL_NAME="$(basename "$SKILL_DIR")"
CODEX_DIR="${CODEX_SKILL_DIR:-$HOME/.codex/skills/$SKILL_NAME}"
CLAUDE_DIR="${CLAUDE_SKILL_DIR:-$HOME/.claude/skills/$SKILL_NAME}"

usage() {
  cat <<EOF
Sync shared review skill files between Codex and Claude skill folders.

Usage:
  sync_codex_claude.sh [auto|codex-to-claude|claude-to-codex|status]

Environment overrides:
  CODEX_SKILL_DIR   default: ~/.codex/skills/$SKILL_NAME
  CLAUDE_SKILL_DIR  default: ~/.claude/skills/$SKILL_NAME
EOF
}

mtime() {
  local path="$1"
  if stat -f "%m" "$path" >/dev/null 2>&1; then
    stat -f "%m" "$path"
  else
    stat -c "%Y" "$path"
  fi
}

latest_mtime() {
  local dir="$1"
  local latest=0
  while IFS= read -r -d '' file; do
    local file_mtime
    file_mtime="$(mtime "$file")"
    if (( file_mtime > latest )); then
      latest="$file_mtime"
    fi
  done < <(find "$dir" -type f ! -path "*/agents/*" ! -name ".DS_Store" -print0)
  echo "$latest"
}

sync_one_way() {
  local src="$1"
  local dst="$2"
  mkdir -p "$dst"
  rsync -a --exclude "agents/" --exclude ".DS_Store" "$src/" "$dst/"
}

mode="${1:-auto}"
case "$mode" in
  auto|codex-to-claude|claude-to-codex|status) ;;
  -h|--help|help)
    usage
    exit 0
    ;;
  *)
    usage >&2
    exit 1
    ;;
esac

mkdir -p "$CODEX_DIR" "$CLAUDE_DIR"
codex_latest="$(latest_mtime "$CODEX_DIR")"
claude_latest="$(latest_mtime "$CLAUDE_DIR")"

if [[ "$mode" == "status" ]]; then
  echo "Codex latest mtime: $codex_latest"
  echo "Claude latest mtime: $claude_latest"
  if (( codex_latest == claude_latest )); then
    echo "Status: likely in sync"
  elif (( codex_latest > claude_latest )); then
    echo "Status: Codex appears newer"
  else
    echo "Status: Claude appears newer"
  fi
  exit 0
fi

if [[ "$mode" == "auto" ]]; then
  if (( codex_latest == claude_latest )); then
    echo "No sync needed: mtimes match."
    exit 0
  elif (( codex_latest > claude_latest )); then
    mode="codex-to-claude"
  else
    mode="claude-to-codex"
  fi
fi

if [[ "$mode" == "codex-to-claude" ]]; then
  sync_one_way "$CODEX_DIR" "$CLAUDE_DIR"
  echo "Synced Codex -> Claude for $SKILL_NAME"
else
  sync_one_way "$CLAUDE_DIR" "$CODEX_DIR"
  echo "Synced Claude -> Codex for $SKILL_NAME"
fi
---
name: thorough-code-review
description: Use when the user wants a serious, bug-focused code review of a repo, branch, PR, or diff and explicitly wants more than the default handful of findings. Focus on correctness, regressions, missing tests, fake or gamed tests, security, portability, cleanup, and operational risk.
---

Thorough Code Review

When To Use

Use this skill when the user asks for any of these:

  • a thorough, deep, full, or exhaustive code review
  • a bug hunt on a branch, PR, or repo
  • a review that should go past the usual 3 findings
  • a risk review of a large or high-impact change

Do not use this for style-only review, rewriting, or generic architecture discussion.

Outcome

Produce a findings-first review that keeps going until concrete new bugs stop appearing.

If you fix findings in-turn, leave behind honest verification and broader test coverage for the bug family rather than a single narrow regression.

Review Target

Default to the most relevant bounded change surface first:

  1. active PR diff, if one exists
  2. current branch diff against the remote default branch
  3. current worktree changes
  4. repo-wide review only when the user explicitly asks for that, or when the changed diff is too small to explain the real risk

If the diff is tiny but obviously sits on top of a larger new subsystem, widen the review to the touched subsystem and its adjacent tests/config. State that assumption.
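The target ordering above can be sketched as concrete commands. This assumes a GitHub-hosted repo with the `gh` CLI available for the PR case; the helper name `target_commands` is illustrative:

```python
def target_commands(has_pr: bool, default_branch: str = "main") -> list[str]:
    """Commands to bound the review surface, narrowest relevant target first."""
    if has_pr:
        return ["gh pr diff"]  # active PR diff, via the GitHub CLI
    return [
        f"git diff --stat origin/{default_branch}...HEAD",  # branch vs. remote default
        "git status --short",                               # current worktree changes
    ]
```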

Workflow

  1. Inspect the review surface before reading code. Check branch, base branch, diff stats, changed files, existing tests, generated outputs, and verification surfaces. If the repo shape is unclear, run scripts/review_surface_probe.py <repo-root> first.

  2. Read the highest-risk paths first. Prioritize persistence, migrations, auth, API boundaries, background jobs, cleanup/teardown, file/path handling, concurrency, retries/timeouts, build/deploy scripts, and provider fallback logic.

  3. Keep going past 3 findings. Continue reviewing until additional passes stop producing concrete new bugs or regressions.

  4. Confirm findings from the real implementation. Prefer bugs you can trace through code and honest runtime behavior. Avoid speculative nits.

  5. Review the tests as part of the implementation. Missing coverage is a finding when a risky path is untested or when tests only validate a mocked or happy-path fantasy.

  6. Hunt for fake, gamed, or checkbox tests. Treat these as integrity problems, not minor test nits.

  7. If you fix a finding in-turn, leave broader coverage behind. The default is bug repro test + category test, extending the nearest shared suite first.

  8. Run turn-end verification before you stop. Use the narrowest changed-surface tests plus the repo's standard verify/build gate when one exists.

What To Look For

  • broken defaults that silently disable a feature
  • mismatches between the public contract and the production path
  • path resolution bugs and cwd-sensitive behavior
  • leaked processes, temp files, locks, sockets, or background state
  • fallback paths that still fail in the environment they are supposed to rescue
  • over-mocked tests that pass while reality fails
  • unsafe mutation, accidental destructive behavior, or silent data loss
  • retries, replays, or partial failures that drift durable state from the truth
  • portability problems in scripts and helpers
  • missing validation around required inputs

Output Format

Findings first, ordered by severity.

For each finding:

  • start with a severity tag like [P1]
  • name the concrete problem
  • explain the impact and why it happens
  • cite exact file and line references

After findings, include:

  • any fake/gamed tests you fixed immediately, with a short note on what changed
  • brief open questions or assumptions
  • a short note on residual risks or test gaps

If there are no findings, say "No findings." explicitly, then mention any residual risk areas you were unable to verify.

Severity

  • P1: likely to break the main workflow, corrupt data, create a security issue, or invalidate a core feature
  • P2: important correctness, portability, cleanup, or reliability problem with a realistic failure mode
  • P3: lower-severity correctness issue, sharp edge, or meaningful test gap

References

Load these when needed:

  • references/test-expansion-and-verification.md Use for generalized test coverage, placement heuristics, and turn-end verification expectations.
  • references/browser-and-runtime-proof.md Use when browser verification, Netlify-backed behavior, deployed truth, or tool-surface choice matters.

Operating Rules

  • Prefer concrete bugs, regressions, and missing tests over style commentary.
  • Do not pad the review with weak or duplicate findings just to make it longer.
  • Do not hide additional findings because an internal tool would have stopped at 3.
  • For fake or checkbox tests, prefer fixing them in the same turn over merely filing them as findings.
  • Repo-local instructions override this skill when they are stricter.