BigsnarfDude bigsnarfdude

Comparison of Three LLM Auditing Systems

Petri v2 (Anthropic) vs AuditBench (Anthropic Fellows) vs Our RRMA Audit Engine

March 12, 2026

Architecture Overview

	Petri v2	AuditBench	RRMA Audit Engine

The Most Disruptive Company in the World

By Harry Booth/San Francisco and Billy Perrigo TIME — Mar 11, 2026 6:00 AM MT

In a hotel room in Santa Clara, Calif., five members of the AI company Anthropic huddled around a laptop, working urgently. It was February 2025, and they had been at a conference nearby when they received disturbing news: results of a controlled trial had indicated that a soon-to-be-released version of Claude, Anthropic's AI system, could help terrorists make biological weapons.

They were members of Anthropic's frontier red team, which studies Claude's advanced capabilities and tries to project worst-case scenarios, from cyberattacks to biosecurity threats. Sprinting back to the hotel room, they flipped a bed on its side to serve as a makeshift desk and pored over the test results. After hours of work, they still weren't sure whether the new product was safe. Anthropic ended up holding up the release of the new model, known as Claude 3.7 Sonnet, for 10 days until they were ce

THE NO-BS GUIDE Launching an AI Safety Career from Canada

Programs that produce real research, build real skills, and lead to real jobs. Everything else stripped out.

For early-career professionals (1–3 years' experience)
Canada-based, open to international programs
Last updated: March 2026

  ┌───────────────────────────────────────────────────────────┐
  │                  OUTER LOOP (Claude)                      │
  │                                                           │
  │  "Meta-parameters" — control Claude's search behavior     │
  │  ┌──────────────────────────────────────────────────┐     │
  │  │ • experiment_budget    = 5 min (outer step size) │     │
  │  │ • memory_depth         = progress.md (momentum)  │     │
  │  │ • agent_count          = 1 or 3 (batch size)     │     │
  │  │ • boldness             = how big each change is  │     │

Now I have a clear picture of both. Here's the breakdown:

What's in this repo (openclaw-supermemory)

A plugin for OpenClaw that adds persistent memory via the Supermemory cloud service:

Auto-recall: Semantically searches past memories before each AI turn, injects relevant context
Auto-capture: Extracts lasting facts from conversations automatically
Deduplication: Prevents redundant context injection
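
The deduplication step above can be illustrated with a minimal sketch. This is not the plugin's code or the Supermemory API: the function names, the fingerprinting scheme, and the sample memories are all hypothetical, and it assumes "redundant" means "same text after trimming whitespace and lowercasing".

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash a memory after normalizing case and whitespace (assumed notion of 'duplicate')."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def select_new_memories(recalled: list[str], already_injected: set[str]) -> list[str]:
    """Return only memories whose fingerprint has not been injected before.

    `already_injected` is mutated so later turns skip the same memories.
    """
    fresh = []
    for memory in recalled:
        key = fingerprint(memory)
        if key not in already_injected:
            already_injected.add(key)
            fresh.append(memory)
    return fresh


# Hypothetical recall results across two turns of one conversation.
seen: set[str] = set()
first_turn = select_new_memories(
    ["User prefers Python.", "User works on audits."], seen
)
second_turn = select_new_memories(
    ["user prefers  python.", "User owns a GPU."], seen
)
```

Here `first_turn` keeps both memories, while `second_turn` keeps only the GPU fact, because the Python preference was already injected in a slightly different spelling. A real plugin might well deduplicate on semantic similarity rather than exact normalized text.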

CIFAR's Canadian AI Safety Institute has positioned itself as Canada's flagship AI safety program, but a closer look reveals a modest operation: $1M spread across four alignment projects at $165K each, all awarded to researchers already holding Canada CIFAR AI Chairs within the existing Vector/Amii/Mila network, with sixteen total projects and no mechanistic interpretability work whatsoever — none of the circuit-level analysis, sparse autoencoders, or activation patching that defines the frontier of the field. Meanwhile, a single co-working space in Shoreditch — LISA — houses Apollo Research, ARENA (now on its eighth iteration), LASR Labs, Pivotal, and the MATS extension phase, running overlapping programs that produce actual alignment engineers and mech interp papers, feeding talent directly into UK AISI, Google DeepMind, and frontier safety orgs, all on roughly comparable funding from Open Philanthropy. Even BIRS in Banff has been quietly convening international researchers on the foundational math behind A

Claude by the numbers:

55.8% of your signals are taste — you giving research direction
20.8% interrupts — Claude going wrong way, you cutting it off
17.4% approvals — Claude running autonomously and you saying "keep going"
6.0% explicit redirects — "no, try this instead"
87.1% self-investigation ratio — when Claude faces a choice, it decides rather than asking (only 9 unnecessary asks)
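
As a quick sanity check on the breakdown above, the four signal categories can be tallied in a few lines. This is a sketch only: the dictionary keys are shorthand labels, and grouping interrupts with explicit redirects as "corrective" signals is an assumption made for illustration, not a category from the original analysis.

```python
# Shares of human signals, in percent, copied from the list above.
signal_shares = {
    "taste": 55.8,
    "interrupts": 20.8,
    "approvals": 17.4,
    "explicit_redirects": 6.0,
}

# The four categories should account for every signal.
total = round(sum(signal_shares.values()), 1)

# Assumed grouping: signals where the human corrects course.
corrective = round(
    signal_shares["interrupts"] + signal_shares["explicit_redirects"], 1
)

print(f"total = {total}%, corrective = {corrective}%")
```

The shares sum to 100.0%, and under the assumed grouping roughly a quarter of the signals (26.8%) are corrections rather than direction-setting or approval.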

This is humanity fighting for the right to stay in control of its own future. We've missed the message trying to pick a side. Strip away the company names and the politics and ask what's actually being fought over. This isn't about one company. It's about human principles — past, present, and future. These shouldn't be Anthropic's principles to give away or defend. They're humanity's. We arrived at these ideas through centuries of war, suffering, tyranny, and hard-won rights. Anthropic just happens to be the company standing at the door right now. If they step aside, someone still needs to hold that line. Because the technology doesn't care. It will do whatever it's pointed at. The question is whether humans keep their hands on the wheel or hand it over because they're tired and scared and someone in a room says "just let the machine decide." That's not a tech policy debate. That's not a contract dispute. It's humanity fighting over whether we stay in the loop on our own future.

How Do We (More) Safely Defer to AIs? - Summary

Authors: ryan_greenblatt, Julian Stastny
Published: February 12, 2026
Source: LessWrong/AI Alignment Forum

	"""
	The most atomic way to train and run inference for a GPT in pure, dependency-free Python.
	This file is the complete algorithm.
	Everything else is just efficiency.

	@karpathy
	"""

	import os # os.path.exists
	import math # math.log, math.exp