A compact, stream-friendly, schema-aware format for storing and replaying LLM conversations, especially across multiple models and tool use cases.
This format arose from practical constraints: JSON-based logs are bloated, fragile for streaming, and poorly suited to LLM interactions that evolve over time, involve tool calls, or switch models. It uses structured ASCII control characters to remain efficient, portable, and semantically rich.
- ✅ Streaming-compatible: aligns with how LLMs produce output (in parts, not wholes)
- ✅ Semantically tagged: supports roles, tool calls, metadata, output types
- ✅ Compact: much smaller than JSONL, YAML, or Markdown logs
- ✅ Portable: can encode logs from virtually any LLM chat format
- ✅ Agent-aware: supports multi-step reasoning, tool interaction, turn-based control
This is a text-based format using ASCII control characters to structure and separate content:
- FS (`\x1C`) – File Separator: separates individual messages
- GS (`\x1D`) – Group Separator: separates the message header from body chunks
- RS (`\x1E`) – Record Separator: separates key-value metadata
- US (`\x1F`) – Unit Separator: separates key and value within a key-value pair
Escaping for control and special characters is done using:
- `\nn` – where `nn` is a two-digit hex ASCII code (e.g. `\20` for space)
All string fields are encoded as escaped UTF-8 text.
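For concreteness, here is a minimal sketch of the escaping rule in Python. The separator constants mirror the list above; `escape` and `unescape` are illustrative names, not part of the spec, and the sketch assumes only the backslash and control characters need escaping:

```python
FS, GS, RS, US = "\x1C", "\x1D", "\x1E", "\x1F"

# Escape the backslash plus anything below 0x20 (which covers all four separators).
def escape(text: str) -> str:
    return "".join(
        f"\\{ord(ch):02x}" if ch == "\\" or ord(ch) < 0x20 else ch
        for ch in text
    )

# Reverse the \nn hex escapes; assumes well-formed input.
def unescape(text: str) -> str:
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\":
            out.append(chr(int(text[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

Because everything below 0x20 is escaped (a newline becomes `\0a`), content can never collide with the separators, and a line-based reader never splits a chunk.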
```
file    := (msg FS)*
msg     := header (GS chunk meta*)*
header  := tag ((US keyword)? RS value)*
meta    := US keyword RS value
chunk   := escaped UTF-8 text
tag     := escaped UTF-8 text
keyword := escaped UTF-8 text
value   := escaped UTF-8 text
```

The format supports a small schema useful for LLM-driven applications. Each message begins with a tag and may include metadata, followed by one or more body chunks.
- `kernel(){initialization parameters or base context}`
- `user(){user prompt or message}`
- `assistant(channel=thought|commentary|final, constrain=json|n/a){}` – supports reasoning-phase tagging and constraints
- `request(name){}` – tool call request
- `response(name){}` – response from the tool
- `turn()` – signals LLM turn control
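Combining the grammar with this schema, a serializer fits in a few lines. A minimal sketch, reusing the constants and `escape` helper from above (`format_msg` is an illustrative name; this version attaches all metadata to the header rather than to individual chunks):

```python
def format_msg(tag: str, chunks=(), **meta) -> str:
    # header := tag ((US keyword)? RS value)*
    header = escape(tag) + "".join(
        US + escape(k) + RS + escape(v) for k, v in meta.items()
    )
    # Each body chunk is introduced by GS; the message ends with FS.
    body = "".join(GS + escape(c) for c in chunks)
    return header + body + FS

log = (
    format_msg("user", ["I'd like a summary of this text."])
    + format_msg("assistant", ["First, I'll read the text..."], channel="thought")
    + format_msg("assistant", ["Here's the summary:"], channel="final")
)
```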
Represented here with readable tokens instead of raw bytes:
```
assistant␝Hello, how can I help today?␜
user␝I'd like a summary of this text.␜
assistant␟channel␞thought␝First, I'll read the text...␜
assistant␟channel␞final␝Here's the summary:␜
```
(Legend: ␜ = FS, ␝ = GS, ␞ = RS, ␟ = US)
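Since separators can never appear unescaped inside content, parsing reduces to nested splits. A minimal sketch, reusing `unescape` and the constants above, that recovers the transcript built earlier (for brevity it skips keyword-less header values and per-chunk metadata):

```python
def parse(stream: str):
    """Yield (tag, header_meta, chunks) for each message in a log."""
    for raw in stream.split(FS):
        if not raw:
            continue  # the trailing FS leaves an empty tail
        head, *body = raw.split(GS)
        # Header: tag, then (US keyword RS value)* pairs.
        tag, *pairs = head.split(US)
        meta = {unescape(k): unescape(v)
                for k, _, v in (p.partition(RS) for p in pairs)}
        # Per-chunk meta (anything after a US inside a chunk) is ignored here.
        chunks = [unescape(c.split(US)[0]) for c in body]
        yield unescape(tag), meta, chunks

for tag, meta, chunks in parse(log):
    print(tag, meta, chunks)
# user {} ["I'd like a summary of this text."]
# assistant {'channel': 'thought'} ["First, I'll read the text..."]
# assistant {'channel': 'final'} ["Here's the summary:"]
```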
The format was designed to encode and unify diverse LLM chat protocols, including:
- OpenAI Chat API (role/content)
- Anthropic Claude format (system/human/assistant)
- Mistral/OpenChat format (user/assistant pairs)
- Local LLMs with streamed tokens
- Tool-based agents (function calls / toolchains)
All can be transcribed into this format with role, metadata, and content intact.
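As one illustration, transcribing the OpenAI role/content shape takes one line per message. A sketch reusing `format_msg` from above; mapping `system` onto `kernel()` is this sketch's assumption, not something either format prescribes:

```python
ROLE_TAGS = {"system": "kernel", "user": "user", "assistant": "assistant"}

def from_openai(messages: list[dict]) -> str:
    """Transcribe an OpenAI Chat API message list into this format."""
    return "".join(
        format_msg(ROLE_TAGS.get(m["role"], m["role"]), [m["content"]])
        for m in messages
    )
```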
- The format reflects how LLMs actually work — outputting text in parts, responding to turns, and interacting with tools or environments.
- It’s trivially parsable with a simple state machine or line-based reader.
- Suitable for use in:
  - Agent orchestration
  - Debugging / comparison across models
  - Replay of conversations
  - Dataset generation
  - Persistent logs for privacy-aware local LLMs
- Schema versioning
- Reference parser and formatter (Python / Rust)
- Markdown or JSON export
- Tooling: `chatlog view`, `chatlog diff`, `chatlog replay`
- Visualization UI (maybe TUI)
MIT or Unlicense — your call.