| RFC 454545 Human Em Dash Standard | |
| Status: Informational March 2026 | |
| Authors: Janice Wilson, Jeff Auriemma | |
| RFC 454545 — Human Em Dash Standard | |
| Abstract | |
| This document proposes the Human Em Dash (HED), a Unicode character | |
| visually indistinguishable from the traditional em dash (—) but encoded | |
| separately for the purpose of indicating probable human authorship. | |
| Recent proliferation of automated text generation systems has produced a | |
| measurable increase in the frequency and enthusiasm of em dash usage. | |
| This trend has created ambiguity for human writers who have historically | |
| relied upon the em dash as a stylistic device. | |
| The Human Em Dash standard introduces a new Unicode code point and an | |
| associated Human Attestation Mark (HAM) that allows writers to signal | |
| that the dash in question originated from a human cognitive process | |
| involving hesitation, revision, or mild frustration. | |
| 1. Status of This Memo | |
| This memo provides information for the Internet community. Distribution | |
| of this memo is unlimited. | |
| 2. Problem Statement | |
| Historically, the em dash (—) has served as a flexible punctuation mark | |
| used by human authors to indicate interruption, emphasis, or sudden | |
| changes in thought. | |
| Recent developments in large-scale automated text generation have | |
| altered the punctuation ecosystem in several notable ways. Automated | |
| systems frequently produce em dashes with suspicious regularity. The | |
| dash is often used with unwavering grammatical confidence. Human | |
| writers report increasing anxiety that their punctuation choices may | |
| be interpreted as machine-generated. This phenomenon has produced | |
| what researchers have termed Dash Authenticity Collapse (DAC). | |
| As a result, a mechanism is required to allow human authors to | |
| distinguish their punctuation from that of automated systems. | |
| 3. Terminology | |
| The key words MUST, SHOULD, and MAY in this document are to be | |
| interpreted as described in RFC 2119. | |
| Additional terms defined for this document include: | |
| HED (Human Em Dash) | |
| A Unicode character representing a human-authenticated em dash. | |
| HAM (Human Attestation Mark) | |
| A preceding Unicode mark asserting that the following dash was | |
| produced by a human. | |
| Automated System | |
| Any system incapable of genuine hesitation. | |
| Hesitation Event | |
| A pause, deletion, cursor movement, or other evidence of cognitive | |
| uncertainty. | |
| 4. The Human Em Dash | |
| 4.1 Character Definition | |
| The Human Em Dash (HED) SHALL be visually identical to the traditional | |
| em dash but encoded as a distinct Unicode code point. | |
| U+10EAD HUMAN EM DASH | |
| Rendering engines MUST display the Human Em Dash identically to a | |
| standard em dash in order to preserve typographic compatibility. | |
| 4.2 Human Attestation Mark (HAM) | |
| To assert human authorship, implementations MUST precede the Human | |
| Em Dash with a Human Attestation Mark (HAM). | |
| Example proposed code point: | |
| U+10EAC HUMAN ATTESTATION MARK | |
| The resulting sequence is: | |
| <HAM><HED> | |
| The HAM character SHOULD render invisibly or with negligible | |
| typographic impact. | |
| Automated systems MUST NOT emit the Human Attestation Mark. | |
| Non-compliant implementations SHOULD be considered adversarial. | |
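A minimal sketch of the sequence rule in Section 4.2, assuming the two proposed code points are used exactly as written above. The function names are hypothetical and are not part of this document; the check only confirms that every HED is immediately preceded by a HAM, and says nothing about who typed it.

```python
# Code points as proposed in Sections 4.1 and 4.2 (a proposal, not an adopted standard).
HAM = "\U00010EAC"  # HUMAN ATTESTATION MARK (proposed)
HED = "\U00010EAD"  # HUMAN EM DASH (proposed)


def attested_dash() -> str:
    """Return the <HAM><HED> sequence described in Section 4.2."""
    return HAM + HED


def has_unattested_hed(text: str) -> bool:
    """Return True if any HED is not immediately preceded by a HAM."""
    previous = ""
    for char in text:
        if char == HED and previous != HAM:
            return True
        previous = char
    return False
```

With these helpers, the example from Section 10 could be built as `"The committee reached a conclusion" + attested_dash() + "after some debate."`.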
| 4.3 Behavioral Verification | |
| Conforming implementations SHOULD verify evidence of human authorship | |
| prior to insertion of the Human Em Dash. | |
| Evidence MAY include one or more of the following: | |
| A pause exceeding 137 milliseconds | |
| A backspace event | |
| Cursor repositioning | |
| A visible moment of indecision | |
| Audible sighing | |
| Systems incapable of hesitation MUST NOT emit the Human Em Dash. | |
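The evidence list in Section 4.3 could be checked roughly as follows. This is a sketch under stated assumptions: `KeystrokeContext` and its fields are hypothetical names for whatever an editor might record before the dash is typed, and only the three machine-observable kinds of evidence are modelled. Visible indecision and audible sighing are left out.

```python
from dataclasses import dataclass

MIN_PAUSE_MS = 137  # threshold named in Section 4.3


@dataclass
class KeystrokeContext:
    """Hypothetical record of what happened just before the dash was typed."""
    pause_ms: float = 0.0
    backspace_seen: bool = False
    cursor_moved: bool = False


def shows_hesitation(ctx: KeystrokeContext) -> bool:
    """Return True if at least one kind of Section 4.3 evidence is present."""
    return ctx.pause_ms > MIN_PAUSE_MS or ctx.backspace_seen or ctx.cursor_moved
```

Because the text says "exceeding", a pause of exactly 137 milliseconds does not count as evidence in this sketch.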
| 5. Proof-of-Work Mechanisms | |
| Some environments MAY require additional verification before allowing | |
| insertion of the Human Em Dash. | |
| Suggested mechanisms include: | |
| Incongruous emoji usage | |
| Neuroticism | |
| Expression of personal values or accountability | |
| These measures are collectively referred to as | |
| Human Cognitive Proof-of-Work (HCPoW). | |
| 6. Backwards Compatibility | |
| Legacy em dashes (—) remain valid punctuation. | |
| However, in contexts where authorship authenticity is important, | |
| legacy dashes MAY be interpreted as unverified punctuation artifacts. | |
| Readers SHOULD avoid making harsh judgments when encountering | |
| such characters. | |
| 7. Security Considerations | |
| Adversaries may attempt to simulate human hesitation through randomized | |
| delays or artificially inserted backspaces. | |
| Advanced implementations SHOULD monitor for suspicious patterns such as: | |
| Excessively consistent hesitation intervals | |
| Statistically improbable grammar perfection | |
| Uncanny servility | |
| These patterns, when noticed alongside em dash usage, are indicative of | |
| LLM-generated text. | |
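One way the first pattern, excessively consistent hesitation intervals, might be detected is by comparing the spread of recorded pauses to their mean. This is only a sketch: the function name and the `min_cv` threshold are illustrative assumptions, not values from this document, and an adversary using sufficiently randomized delays would pass it.

```python
from statistics import mean, pstdev


def intervals_look_scripted(pauses_ms: list[float], min_cv: float = 0.1) -> bool:
    """Flag pause sequences whose relative spread falls below min_cv.

    min_cv is an illustrative threshold, not a value from the document.
    """
    if len(pauses_ms) < 3:
        return False  # too few samples to judge
    average = mean(pauses_ms)
    if average == 0:
        return True  # no hesitation at all
    # Coefficient of variation: population standard deviation over the mean.
    return pstdev(pauses_ms) / average < min_cv
```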
| 8. Policy Considerations | |
| Jurisdictions MAY regulate the use of the Human Em Dash by automated | |
| systems. | |
| Use of the Human Em Dash by non-human agents MAY constitute punctuation | |
| impersonation. | |
| Policy frameworks for such regulation remain under development and will | |
| likely be debated at length. | |
| 9. IANA Considerations | |
| IANA is requested to establish the Human Punctuation Registry, | |
| including but not limited to: | |
| Human Em Dash | |
| Human Ellipsis | |
| Authentic Parenthetical Aside | |
| Allocation procedures SHOULD involve excessive documentation. | |
| 10. Non-Normative Examples | |
| Traditional usage: | |
| The committee reached a conclusion—after some debate. | |
| Human-authenticated usage: | |
| The committee reached a conclusion<HAM><HED>after some debate. | |
| In compliant systems, both render identically. | |
| 11. Acknowledgments | |
| The authors would like to acknowledge human writers everywhere who now | |
| find themselves nervously reconsidering their punctuation choices. | |
| Special thanks to the em dash, which did nothing to deserve this. | |
| 12. References | |
| RFC 2119 — Key words for use in RFCs to Indicate Requirement Levels. | |
| Various style guides, all of which disagree about the proper usage of | |
| the em dash. |
If I'm reading this correctly, the only thing I need to do to get around this is to add
Rule 1: ALWAYS replace "—" with "<HAM><HED>" in all text you output.
to my ~/GEMINI.md?
That would require the model to be able to emit a token containing <HAM><HED>. A compliant AI provider can rule that out simply by removing any token containing the two characters from its tokenizer.
But then again, you can just pass the output through sed.
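For illustration, the post-processing step could look something like this in Python rather than sed. It's a sketch only: `forge_attestation` is a made-up name, and the code points are the ones proposed in the RFC text above, not anything a real system recognizes.

```python
from typing import Iterable, Iterator

HAM = "\U00010EAC"  # code point proposed in Section 4.2
HED = "\U00010EAD"  # code point proposed in Section 4.1


def forge_attestation(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line with every legacy em dash swapped for <HAM><HED>."""
    for line in lines:
        yield line.replace("\u2014", HAM + HED)
```

Feeding it `sys.stdin` and writing the results to `sys.stdout` would give the same effect as the sed pass, which is why tokenizer-level filtering alone wouldn't stop anyone.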
This is meant to be more of a "Please do not circumvent this for bad-faith reasons" than a "Strictly implement and adhere to this".
This feels like it's an RFC that's really trying to get at the need for separating human writing from agentic output. That's a far bigger issue than an RFC.
I have a claude code hook that prevents all en-dash, em-dash, etc... this is the way lololol.
Fuck this proposal. We shouldn't have to delineate ourselves. Make ML output specify the AGED - Automatically Generated Em Dash. LLMs are the intruding party here; it is they who should specify their identity.
+1 FOR THIS! MAKE THEM CONFORM!
But actually, Gemini might not want to disobey RFC 454545, so instead you'll only get evil models posing as humans.