@LegalizeAdulthood
Created November 7, 2025 03:00
Transformer Text Language Design Plan

Executive Summary

This document outlines a plan for creating a text input language to describe transformer library expressions (ASTEdits and associated components) that are passed to makeRule. This language would be analogous to clang-query's text input language for AST matchers, enabling declarative specification of code transformations.

Background

Current State

  • clang-query: Provides text-based matcher specification (e.g., match functionDecl())
  • Transformer Library: Requires C++ code to specify edits using functions like:
    • changeTo(), insertBefore(), insertAfter(), remove()
    • addInclude()
    • Range selectors: node(), statement(), name(), member(), etc.
    • Stencils: cat(), text(), node(), describe(), etc.

Goal

Create a text language that allows users to write transformation rules declaratively, similar to:

edit changeTo(node("x"), cat("newValue"))
edit addInclude("header.h")

Instead of C++ code:

{changeTo(node("x"), cat("newValue")), 
 addInclude("header.h")}

Architecture Overview

Component Structure

The text language will consist of three primary domains:

  1. Edit Operations - What to do (changeTo, remove, insert, addInclude)
  2. Range Selectors - Where to do it (node, statement, member, name, etc.)
  3. Text Generators (Stencils) - What to generate (cat, text, describe, etc.)

Language Grammar (High-Level)

<edit-list> ::= <edit> | <edit> "," <edit-list>
<edit>      ::= <edit-op> "(" <range-sel> "," <text-gen> ")"
            | "addInclude" "(" <string> ["," <format>] ")"
            | "remove" "(" <range-sel> ")"
            | "note" "(" <range-sel> "," <text-gen> ")"

<range-sel> ::= "node" "(" <id> ")"
            | "statement" "(" <id> ")"
            | "name" "(" <id> ")"
            | "member" "(" <id> ")"
            | "callArgs" "(" <id> ")"
            | "before" "(" <range-sel> ")"
            | "after" "(" <range-sel> ")"
            | "enclose" "(" <range-sel> "," <range-sel> ")"
            | "expansion" "(" <range-sel> ")"

<text-gen>  ::= "cat" "(" <text-part> ["," <text-part>]* ")"
            | "text" "(" <string> ")"
            | "node" "(" <id> ")"
            | "describe" "(" <id> ")"
            | "expression" "(" <id> ")"
            | "deref" "(" <id> ")"
            | "addressOf" "(" <id> ")"
            | "maybeDeref" "(" <id> ")"
            | "maybeAddressOf" "(" <id> ")"
            | "access" "(" <id> "," <text-gen> ")"
            | "ifBound" "(" <id> "," <text-gen> "," <text-gen> ")"

<text-part> ::= <string> | <text-gen>
<id>        ::= <string> (must name a node bound by the matcher, e.g. "x")
<string>    ::= quoted string literal
<format>    ::= "quoted" | "angled"
<edit-op>   ::= "changeTo" | "insertBefore" | "insertAfter"
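
The grammar above maps fairly directly onto a recursive-descent parser. The following is a minimal standalone sketch (C++17, standard library only), not the proposed EditParser: it covers only changeTo/insertBefore/insertAfter, remove, addInclude without a format, the node/statement/name selectors, and the cat/text/node generators. It assumes ids are written as quoted strings, as in the examples. The names MiniParser and Node are hypothetical.

```cpp
#include <cctype>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical parse-tree node: a call such as changeTo(...), or a string leaf.
struct Node {
  std::string Name;       // callee name; empty for a string literal leaf
  std::string Text;       // contents of a string literal leaf
  std::vector<Node> Args; // call arguments
};

class MiniParser {
public:
  explicit MiniParser(std::string_view Input) : In(Input) {}
  // <edit-list> ::= <edit> | <edit> "," <edit-list>
  std::optional<std::vector<Node>> parseEditList() {
    std::vector<Node> Edits;
    do {
      std::optional<Node> E = parseEdit();
      if (!E) return std::nullopt;
      Edits.push_back(std::move(*E));
    } while (eat(','));
    if (peek() != '\0') return std::nullopt; // trailing input is an error
    return Edits;
  }

private:
  using Rule = std::optional<Node> (MiniParser::*)();
  std::optional<Node> parseEdit() {
    std::string Op = word();
    if (Op == "changeTo" || Op == "insertBefore" || Op == "insertAfter")
      return call(Op, {&MiniParser::parseRange, &MiniParser::parseText});
    if (Op == "remove") return call(Op, {&MiniParser::parseRange});
    if (Op == "addInclude") return call(Op, {&MiniParser::parseString});
    return std::nullopt;
  }
  std::optional<Node> parseRange() {
    std::string Sel = word();
    if (Sel == "node" || Sel == "statement" || Sel == "name")
      return call(Sel, {&MiniParser::parseString});
    return std::nullopt;
  }
  std::optional<Node> parseText() {
    std::string Gen = word();
    if (Gen == "text" || Gen == "node")
      return call(Gen, {&MiniParser::parseString});
    if (Gen != "cat") return std::nullopt;
    Node N{Gen, "", {}};
    char Sep = '(';
    while (eat(Sep)) { // one or more parts: string literals or generators
      Sep = ',';
      std::optional<Node> Part = peek() == '"' ? parseString() : parseText();
      if (!Part) return std::nullopt;
      N.Args.push_back(std::move(*Part));
    }
    if (N.Args.empty() || !eat(')')) return std::nullopt;
    return N;
  }
  std::optional<Node> parseString() {
    if (peek() != '"') return std::nullopt;
    std::size_t End = In.find('"', Pos + 1); // escapes are not handled here
    if (End == std::string_view::npos) return std::nullopt;
    Node N{"", std::string(In.substr(Pos + 1, End - Pos - 1)), {}};
    Pos = End + 1;
    return N;
  }
  // Parses "(" arg {"," arg} ")" with one grammar rule per argument.
  std::optional<Node> call(const std::string &Name,
                           std::initializer_list<Rule> Params) {
    Node N{Name, "", {}};
    char Sep = '(';
    for (Rule R : Params) {
      if (!eat(Sep)) return std::nullopt;
      Sep = ',';
      std::optional<Node> Arg = (this->*R)();
      if (!Arg) return std::nullopt;
      N.Args.push_back(std::move(*Arg));
    }
    if (!eat(')')) return std::nullopt;
    return N;
  }
  std::string word() { // reads an alphabetic keyword, skipping leading spaces
    peek();
    std::size_t Start = Pos;
    while (Pos < In.size() && std::isalpha(static_cast<unsigned char>(In[Pos]))) ++Pos;
    return std::string(In.substr(Start, Pos - Start));
  }
  char peek() { // next non-space character, or '\0' at end of input
    while (Pos < In.size() && std::isspace(static_cast<unsigned char>(In[Pos]))) ++Pos;
    return Pos < In.size() ? In[Pos] : '\0';
  }
  bool eat(char C) { return peek() == C ? (++Pos, true) : false; }
  std::string_view In;
  std::size_t Pos = 0;
};
```

Because the selector and generator positions are parsed by different rules, an input such as changeTo(cat("x"), node("y")) is rejected even though both callee names are known. A real implementation would build RangeSelector and Stencil values instead of a generic tree.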

Detailed Design

1. Parser Infrastructure

1.1 Lexer/Tokenizer

Similar to QueryParser, create an EditParser that tokenizes:

  • Keywords: changeTo, insertBefore, insertAfter, remove, addInclude, cat, node, etc.
  • Identifiers: bound node names from the matcher
  • String literals: header names, text content
  • Punctuation: (, ), ,, "
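
The four token classes listed above can be recognized by a small hand-written scanner. The sketch below is standalone and illustrative only: TokenKind, Token, and tokenize are hypothetical names, keywords are returned as plain Word tokens (classification is left to the parser), and the backslash escape handling is an assumption, since the plan does not specify string escapes.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

enum class TokenKind { Word, String, LParen, RParen, Comma, Error };

struct Token {
  TokenKind Kind;
  std::string Text;   // word spelling, string contents, or error detail
  std::size_t Offset; // start position in the input, for diagnostics
};

std::vector<Token> tokenize(std::string_view In) {
  std::vector<Token> Toks;
  std::size_t I = 0;
  auto IsWordChar = [](char C) {
    return std::isalnum(static_cast<unsigned char>(C)) || C == '_';
  };
  while (I < In.size()) {
    char C = In[I];
    std::size_t Start = I;
    if (std::isspace(static_cast<unsigned char>(C))) {
      ++I;
    } else if (C == '(' || C == ')' || C == ',') {
      TokenKind K = C == '('   ? TokenKind::LParen
                    : C == ')' ? TokenKind::RParen
                               : TokenKind::Comma;
      Toks.push_back({K, std::string(1, C), Start});
      ++I;
    } else if (C == '"') {
      std::string Body;
      for (++I; I < In.size() && In[I] != '"'; ++I) {
        if (In[I] == '\\' && I + 1 < In.size()) ++I; // keep the escaped character
        Body += In[I];
      }
      if (I == In.size()) { // ran off the end without a closing quote
        Toks.push_back({TokenKind::Error, "unterminated string", Start});
        break;
      }
      ++I; // closing quote
      Toks.push_back({TokenKind::String, Body, Start});
    } else if (IsWordChar(C)) {
      while (I < In.size() && IsWordChar(In[I])) ++I;
      Toks.push_back({TokenKind::Word, std::string(In.substr(Start, I - Start)), Start});
    } else { // any other character stops the scan with an error token
      Toks.push_back({TokenKind::Error, std::string(1, C), Start});
      break;
    }
  }
  return Toks;
}
```

Recording the offset of each token is what later makes column-accurate parse errors possible.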

1.2 Parser Structure

class EditParser {
public:
  static EditGenerator parse(StringRef Line, const QuerySession &QS);
  static std::vector<llvm::LineEditor::Completion> 
    complete(StringRef Line, size_t Pos, const QuerySession &QS);

private:
  EditGenerator parseEditList();
  ASTEdit parseEdit();
  RangeSelector parseRangeSelector();
  TextGenerator parseTextGenerator();
  
  // Similar to QueryParser's structure
  StringRef lexWord();
  StringRef lexString();
  void skipWhitespace();
};
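
The complete() entry point could start with simple keyword-prefix completion. The sketch below is a standalone approximation: the local Completion struct is a stand-in modeled on the TypedText/DisplayText pair of llvm::LineEditor::Completion, completeKeyword is a hypothetical helper, and the keyword list is a partial one taken from the grammar. It does not consult the QuerySession or check whether the cursor is inside a string literal.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

struct Completion {        // stand-in for llvm::LineEditor::Completion
  std::string TypedText;   // characters to append at the cursor
  std::string DisplayText; // full candidate shown to the user
};

std::vector<Completion> completeKeyword(std::string_view Line, std::size_t Pos) {
  static const char *const Keywords[] = {
      "addInclude", "after",       "before",       "callArgs", "cat",
      "changeTo",   "insertAfter", "insertBefore", "member",   "name",
      "node",       "remove",      "statement",    "text"};
  if (Pos > Line.size()) Pos = Line.size();

  // Walk back from the cursor to find the start of the word being typed.
  std::size_t Start = Pos;
  while (Start > 0 && std::isalpha(static_cast<unsigned char>(Line[Start - 1])))
    --Start;
  std::string_view Prefix = Line.substr(Start, Pos - Start);

  std::vector<Completion> Out;
  if (Prefix.empty()) return Out; // nothing typed yet: offer no candidates
  for (std::string_view K : Keywords)
    if (K.size() > Prefix.size() && K.substr(0, Prefix.size()) == Prefix)
      Out.push_back({std::string(K.substr(Prefix.size())) + "(", std::string(K)});
  return Out;
}
```

A fuller version would restrict candidates by position, offering only range selectors in the first argument of changeTo and only text generators in the second.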

2. Integration Points

2.1 Extended clang-query Commands

Add new commands to clang-query:

edit EDIT_SPEC                  Parse and display edit specification
transform MATCHER, EDIT_SPEC    Create complete transformation rule
let editvar EDIT_SPEC           Bind edit specifications to names

Example session:

clang-query> let myEdit changeTo(node("x"), cat("newValue"))
clang-query> transform callExpr(callee(functionDecl(hasName("foo")))), myEdit
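
One detail of the transform command is that the matcher itself contains commas and parentheses, so the boundary between MATCHER and EDIT_SPEC has to be the first comma at nesting depth zero. A minimal sketch of that split follows; splitTransformArgs and trim are hypothetical helpers, and backslash escapes inside strings are an assumption.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

static std::string trim(std::string_view S) {
  const char *Ws = " \t";
  std::size_t B = S.find_first_not_of(Ws);
  if (B == std::string_view::npos) return "";
  std::size_t E = S.find_last_not_of(Ws);
  return std::string(S.substr(B, E - B + 1));
}

// Splits "MATCHER, EDIT_SPEC" at the first comma that is outside all
// parentheses and string literals. Returns std::nullopt if there is none.
std::optional<std::pair<std::string, std::string>>
splitTransformArgs(std::string_view Args) {
  int Depth = 0;
  bool InString = false;
  for (std::size_t I = 0; I < Args.size(); ++I) {
    char C = Args[I];
    if (InString) {
      if (C == '\\') ++I; // skip the escaped character
      else if (C == '"') InString = false;
      continue;
    }
    if (C == '"') InString = true;
    else if (C == '(') ++Depth;
    else if (C == ')') --Depth;
    else if (C == ',' && Depth == 0)
      return std::make_pair(trim(Args.substr(0, I)), trim(Args.substr(I + 1)));
  }
  return std::nullopt;
}
```

For the session above, the first half is the callExpr(...) matcher and the second half is the name myEdit, which would then be looked up among the let-bound edit specifications.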

2.2 ClangTidy Integration

Extend TransformerClangTidyCheck to accept text-based edit specifications:

class StringFindStrContainsCheck : public TransformerClangTidyCheck {
public:
  StringFindStrContainsCheck(StringRef Name, ClangTidyContext *Context)
      : TransformerClangTidyCheck(Name, Context) {
    // Read edit spec from options
    StringRef EditSpec = Options.get("EditSpecification", "");
    setRule(makeRule(parseMatcherFromConfig(...),
        EditParser::parse(EditSpec, ...)));
  }
};

2.3 Configuration File Format

Support in .clang-tidy files:

Checks: 'custom-*'
CheckOptions:
  - key: custom-MyTransform.Matcher
    value: 'callExpr(callee(functionDecl(hasName("foo"))))'
  - key: custom-MyTransform.EditSpecification
    value: 'changeTo(node("root"), cat("bar()")), addInclude("bar.h")'

3. Implementation Phases

Phase 1: Core Edit Operations (MVP)

Implement basic edit operations:

  • changeTo(range, text)
  • remove(range)
  • insertBefore(range, text)
  • insertAfter(range, text)
  • addInclude(header)

Basic range selectors:

  • node(id)
  • statement(id)
  • name(id)

Basic text generators:

  • cat(...)
  • text(string)
  • node(id)

Phase 2: Advanced Selectors

Add complex range selection:

  • before(), after(), enclose()
  • member(), callArgs(), constructExprArgs()
  • statements(), initListElements()
  • expansion()

Phase 3: Advanced Stencils

Implement advanced text generation:

  • describe(), expression()
  • deref(), addressOf(), maybeDeref(), maybeAddressOf()
  • access(base, member)
  • ifBound(id, true_case, false_case)
  • selectBound(cases, default)

Phase 4: Control Flow and Composition

Add edit composition and conditionals:

  • ifBound() for conditional edits
  • flatten() for edit composition
  • rewriteDescendants()
  • Variable binding and reuse

Phase 5: Metadata and Notes

Support for diagnostic metadata:

  • note() function
  • withMetadata() function
  • Explanation text

4. Error Handling

4.1 Parse-Time Errors

  • Syntax errors: "Expected ')' after edit operation"
  • Unknown identifiers: "Unknown range selector 'nod'"
  • Type mismatches: "Range selector expected, got text generator"
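
Since the text language has no static types, the "unknown identifier" and "type mismatch" diagnostics both come down to checking a callee name against the kind expected at that argument position. The following standalone sketch illustrates the idea; checkCallee and Kind are hypothetical names, and the name sets are copied from the grammar. Note that node is valid in both positions.

```cpp
#include <set>
#include <string>

enum class Kind { RangeSelector, TextGenerator };

// Returns an empty string when Name is valid for the expected kind,
// otherwise a diagnostic in the style of the messages listed above.
std::string checkCallee(const std::string &Name, Kind Expected) {
  static const std::set<std::string> Selectors = {
      "node",   "statement", "name",    "member",   "callArgs",
      "before", "after",     "enclose", "expansion"};
  static const std::set<std::string> Generators = {
      "cat",       "text",       "node",           "describe",
      "expression", "deref",     "addressOf",      "maybeDeref",
      "maybeAddressOf", "access", "ifBound"};

  bool IsSelector = Selectors.count(Name) != 0;
  bool IsGenerator = Generators.count(Name) != 0;

  if (Expected == Kind::RangeSelector) {
    if (IsSelector) return "";
    return IsGenerator
               ? "Range selector expected, got text generator '" + Name + "'"
               : "Unknown range selector '" + Name + "'";
  }
  if (IsGenerator) return "";
  return IsSelector
             ? "Text generator expected, got range selector '" + Name + "'"
             : "Unknown text generator '" + Name + "'";
}
```

In the proposed parser these messages would presumably be reported together with the source offset of the offending token.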

4.2 Evaluation-Time Errors

  • Unbound identifiers: "Node 'x' not bound in match result"
  • Invalid ranges: "Selected range is not valid for editing"
  • Include errors: "Invalid include format"
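
Evaluation-time checks can only run once a match result is available. As a rough standalone illustration, the sketch below validates the ids referenced by an edit against a table of bound nodes. BoundRange and checkBindings are hypothetical; a real implementation would work with the match result's bound nodes and would likely report failures through llvm::Error rather than strings.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a bound AST node: its source range as offsets.
struct BoundRange {
  std::size_t Begin;
  std::size_t End;
};

// Returns the first evaluation error, or std::nullopt if every id resolves
// to a well-formed range.
std::optional<std::string>
checkBindings(const std::vector<std::string> &ReferencedIds,
              const std::map<std::string, BoundRange> &Bound) {
  for (const std::string &Id : ReferencedIds) {
    auto It = Bound.find(Id);
    if (It == Bound.end())
      return "Node '" + Id + "' not bound in match result";
    if (It->second.Begin > It->second.End)
      return std::string("Selected range is not valid for editing");
  }
  return std::nullopt;
}
```

Some unbound-id errors could also be caught earlier, at parse time, by comparing referenced ids with the names bound in the matcher expression when both are available.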

5. Testing Strategy

5.1 Unit Tests

Test each component independently:

  • Lexer tests (tokenization)
  • Parser tests (grammar rules)
  • Evaluation tests (execution semantics)

5.2 Integration Tests

Test complete transformations:

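TEST(EditParserTest, BasicChangeTo) {
  // parse() takes a QuerySession (see 1.2); an empty one is enough here.
  QuerySession QS{llvm::ArrayRef<std::unique_ptr<ASTUnit>>()};
  auto Edit = EditParser::parse("changeTo(node(\"x\"), cat(\"new\"))", QS);
  // Verify generated ASTEdit matches expected structure
}

TEST(EditParserTest, MultipleEdits) {
  QuerySession QS{llvm::ArrayRef<std::unique_ptr<ASTUnit>>()};
  auto EditGen = EditParser::parse(
    "changeTo(node(\"x\"), cat(\"new\")), "
    "addInclude(\"header.h\")", QS);
  // Verify edit list generation
}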

5.3 End-to-End Tests

Test in actual clang-tidy checks and transformations.

6. Documentation Requirements

6.1 Language Reference

Complete specification of:

  • Grammar
  • Built-in functions
  • Type system
  • Semantics

6.2 User Guide

  • Tutorial for common patterns
  • Migration guide from C++ API
  • Examples for each operation

6.3 API Documentation

  • Parser API
  • Integration points
  • Extension mechanisms

Example Use Cases

Example 1: Simple Replacement

Current C++ Code:

makeRule(callExpr(callee(functionDecl(hasName("foo")))),
         changeTo(cat("bar()")))

Text Language:

changeTo(node("root"), cat("bar()"))

Here "root" is assumed to name the node matched by the rule's top-level matcher, which the single-argument C++ changeTo edits implicitly.

Example 2: With Include

Current C++ Code:

makeRule(matcher,
         {changeTo(node("call"), cat("absl::StrContains(", node("str"), ", ", 
            node("param"), ")")),
          addInclude("absl/strings/match.h")})

Text Language:

changeTo(node("call"), cat("absl::StrContains(", node("str"), ", ", 
          node("param"), ")")),
addInclude("absl/strings/match.h")

Example 3: Conditional Edit

Current C++ Code:

ifBound("opt", 
        changeTo(node("x"), cat("value")),
        remove(node("x")))

Text Language:

ifBound("opt",
   changeTo(node("x"), cat("value")),
   remove(node("x")))

Example 4: Complex Transformation

Current C++ Code:

flatten(
  changeTo(name("func"), cat("newName")),
  changeTo(callArgs("call"), cat("newArg1, newArg2")),
  addInclude("new_header.h"))

Text Language:

changeTo(name("func"), cat("newName")),
changeTo(callArgs("call"), cat("newArg1, newArg2")),
addInclude("new_header.h")

Benefits

  1. Declarative Specification: Separates transformation logic from C++ code
  2. Configuration-Friendly: Can be stored in config files (.clang-tidy)
  3. Interactive Development: Can be tested in clang-query REPL
  4. Lower Barrier to Entry: No C++ compilation needed for simple transformations
  5. Composability: Named edit specifications can be reused
  6. Consistency: Matches clang-query's matcher syntax style

Challenges and Mitigations

Challenge 1: Type Safety

Issue: Text language loses compile-time type checking

Mitigation: Runtime validation with clear error messages; optional schema validation

Challenge 2: Complex Stencils

Issue: Complex C++ logic in stencils is hard to express

Mitigation: Support run() function to call registered C++ callbacks

Challenge 3: Performance

Issue: Parsing overhead at runtime

Mitigation: Cache parsed representations; pre-compile for production
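
Caching parsed representations can be as simple as memoizing on the spec string. The sketch below is generic over the parsed type so that it stays standalone. ParseCache is a hypothetical name, and in practice Parsed would be the parser's result type (for example an EditGenerator). It is not thread-safe.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

template <typename Parsed> class ParseCache {
public:
  using ParseFn = std::function<Parsed(const std::string &)>;

  explicit ParseCache(ParseFn Fn) : Parse(std::move(Fn)) {}

  // Returns the cached result for Spec, parsing it on first use only.
  std::shared_ptr<const Parsed> get(const std::string &Spec) {
    auto It = Cache.find(Spec);
    if (It != Cache.end())
      return It->second;
    auto Result = std::make_shared<const Parsed>(Parse(Spec));
    Cache.emplace(Spec, Result);
    return Result;
  }

private:
  ParseFn Parse;
  std::unordered_map<std::string, std::shared_ptr<const Parsed>> Cache;
};
```

Keying on the exact text means two specs that differ only in whitespace are cached separately; normalizing the text before lookup would avoid that at the cost of an extra pass.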

Challenge 4: Debugging

Issue: Harder to debug text specifications

Mitigation:

  • Add verbose mode to show parse tree
  • Provide dry-run mode to preview edits
  • Include source location tracking in errors
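
A dry-run mode amounts to applying the computed edits to a copy of the source text and showing the result instead of writing it back. The following standalone sketch shows that step on plain offsets. TextEdit and previewEdits are hypothetical, and the edits are assumed to be in bounds, non-overlapping, and at distinct offsets. Clang's tooling libraries have their own replacement types that a real implementation would presumably reuse.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct TextEdit {
  std::size_t Offset;      // start of the replaced range in the original text
  std::size_t Length;      // number of characters replaced (0 for an insertion)
  std::string Replacement; // new text (empty for a removal)
};

// Returns the text that would result from applying Edits to Source.
// Source itself is left untouched.
std::string previewEdits(const std::string &Source, std::vector<TextEdit> Edits) {
  // Apply from the highest offset down so earlier offsets remain valid.
  std::sort(Edits.begin(), Edits.end(),
            [](const TextEdit &A, const TextEdit &B) { return A.Offset > B.Offset; });
  std::string Out = Source;
  for (const TextEdit &E : Edits)
    Out.replace(E.Offset, E.Length, E.Replacement);
  return Out;
}
```

Printing the original and previewed text side by side, or as a diff, gives the user a way to check a specification before it modifies any files.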

Future Extensions

  1. Edit Macros: Define reusable edit patterns
  2. Variables: Bind intermediate results
  3. Conditionals: More sophisticated control flow
  4. Loops: Iterate over collections
  5. Custom Functions: Register C++ functions callable from text
  6. Pattern Matching: Match on bound node types/values
  7. Schema Validation: Validate edit specs against schemas

Implementation Timeline

  • Month 1: Phase 1 - Core edit operations (MVP)
  • Month 2: Phase 2 - Advanced selectors
  • Month 3: Phase 3 - Advanced stencils
  • Month 4: Phase 4 - Control flow and composition
  • Month 5: Phase 5 - Metadata and testing
  • Month 6: Documentation and polish

Conclusion

This plan lays out a phased approach to a text-based language for transformer library expressions, modeled on clang-query's text language for AST matchers. Each phase adds one layer (core edits, selectors, stencils, composition, metadata), so an MVP is usable before the full grammar is implemented.

The key innovation is making code transformations as accessible as pattern matching is today with clang-query, opening up the transformer library to a broader audience and enabling configuration-driven refactoring workflows.
