This document outlines a plan for creating a text input language to describe transformer library expressions (ASTEdits and associated components) that are passed to makeRule. This language would be analogous to clang-query's text input language for AST matchers, enabling declarative specification of code transformations.
- clang-query: Provides text-based matcher specification (e.g., match functionDecl())
- Transformer Library: Requires C++ code to specify edits using functions like:
  - changeTo(), insertBefore(), insertAfter(), remove(), addInclude()
  - Range selectors: node(), statement(), name(), member(), etc.
  - Stencils: cat(), text(), node(), describe(), etc.
Create a text language that allows users to write transformation rules declaratively, similar to:
edit changeTo(node("x"), cat("newValue"))
edit addInclude("header.h")
Instead of C++ code:
{changeTo(node("x"), cat("newValue")),
 addInclude("header.h")}
The text language will consist of three primary domains:
- Edit Operations - What to do (changeTo, remove, insert, addInclude)
- Range Selectors - Where to do it (node, statement, member, name, etc.)
- Text Generators (Stencils) - What to generate (cat, text, describe, etc.)
<edit-list> ::= <edit> | <edit> "," <edit-list>
<edit>      ::= <edit-op> "(" <range-sel> "," <text-gen> ")"
              | "addInclude" "(" <string> ["," <format>] ")"
              | "remove" "(" <range-sel> ")"
              | "note" "(" <range-sel> "," <text-gen> ")"
<range-sel> ::= "node" "(" <id> ")"
              | "statement" "(" <id> ")"
              | "name" "(" <id> ")"
              | "member" "(" <id> ")"
              | "callArgs" "(" <id> ")"
              | "before" "(" <range-sel> ")"
              | "after" "(" <range-sel> ")"
              | "enclose" "(" <range-sel> "," <range-sel> ")"
              | "expansion" "(" <range-sel> ")"
<text-gen>  ::= "cat" "(" <text-part> ["," <text-part>]* ")"
              | "text" "(" <string> ")"
              | "node" "(" <id> ")"
              | "describe" "(" <id> ")"
              | "expression" "(" <id> ")"
              | "deref" "(" <id> ")"
              | "addressOf" "(" <id> ")"
              | "maybeDeref" "(" <id> ")"
              | "maybeAddressOf" "(" <id> ")"
              | "access" "(" <id> "," <text-gen> ")"
              | "ifBound" "(" <id> "," <text-gen> "," <text-gen> ")"
<text-part> ::= <string> | <id> | <text-gen>
<id>        ::= identifier (matches bound node names from matcher)
<string>    ::= quoted string literal
<format>    ::= "quoted" | "angled"
<edit-op>   ::= "changeTo" | "insertBefore" | "insertAfter"
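A grammar of this shape maps naturally onto a recursive-descent parser with one function per nonterminal. The following is only a rough sketch of that idea in standard C++, with no LLVM dependencies. The class name EditSpecRecognizer is hypothetical, it covers a subset of the grammar (no note, access, or ifBound), it only checks syntax rather than building Transformer objects, and it assumes node ids are written as quoted strings, as in the examples in this document.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical syntax checker for a subset of the grammar above.
class EditSpecRecognizer {
public:
  explicit EditSpecRecognizer(std::string_view Input) : In(Input) {}

  // <edit-list> ::= <edit> | <edit> "," <edit-list>
  // Returns the number of edits parsed, or -1 on a syntax error.
  int parseEditList() {
    int Count = 0;
    do {
      if (!parseEdit())
        return -1;
      ++Count;
    } while (consume(','));
    skipSpace();
    return Pos == In.size() ? Count : -1;
  }

private:
  bool parseEdit() {
    std::string Op = word();
    if (!consume('('))
      return false;
    bool Ok = false;
    if (Op == "changeTo" || Op == "insertBefore" || Op == "insertAfter") {
      Ok = parseRangeSel() && consume(',') && parseTextGen();
    } else if (Op == "remove") {
      Ok = parseRangeSel();
    } else if (Op == "addInclude") {
      Ok = quoted();
      if (Ok && consume(',')) { // optional <format>
        std::string Format = word();
        Ok = Format == "quoted" || Format == "angled";
      }
    }
    return Ok && consume(')');
  }

  bool parseRangeSel() {
    std::string Sel = word();
    if (!consume('('))
      return false;
    bool Ok = false;
    if (Sel == "before" || Sel == "after" || Sel == "expansion")
      Ok = parseRangeSel();
    else if (Sel == "enclose")
      Ok = parseRangeSel() && consume(',') && parseRangeSel();
    else if (Sel == "node" || Sel == "statement" || Sel == "name" ||
             Sel == "member" || Sel == "callArgs")
      Ok = quoted(); // assumption: ids are quoted, as in the examples
    return Ok && consume(')');
  }

  bool parseTextGen() {
    std::string Gen = word();
    if (!consume('('))
      return false;
    bool Ok = false;
    if (Gen == "cat") {
      Ok = parseTextPart();
      while (Ok && consume(','))
        Ok = parseTextPart();
    } else if (Gen == "text" || Gen == "node" || Gen == "describe" ||
               Gen == "expression") {
      Ok = quoted();
    }
    return Ok && consume(')');
  }

  // <text-part>: a string literal or a nested text generator.
  bool parseTextPart() {
    skipSpace();
    if (Pos < In.size() && In[Pos] == '"')
      return quoted();
    return parseTextGen();
  }

  void skipSpace() {
    while (Pos < In.size() && std::isspace(static_cast<unsigned char>(In[Pos])))
      ++Pos;
  }
  bool consume(char C) {
    skipSpace();
    if (Pos < In.size() && In[Pos] == C) {
      ++Pos;
      return true;
    }
    return false;
  }
  std::string word() {
    skipSpace();
    std::size_t Start = Pos;
    while (Pos < In.size() && std::isalpha(static_cast<unsigned char>(In[Pos])))
      ++Pos;
    return std::string(In.substr(Start, Pos - Start));
  }
  // Consumes a quoted string; escape sequences are not handled here.
  bool quoted() {
    if (!consume('"'))
      return false;
    std::size_t End = In.find('"', Pos);
    if (End == std::string_view::npos)
      return false;
    Pos = End + 1;
    return true;
  }

  std::string_view In;
  std::size_t Pos = 0;
};
```

Because each nonterminal has its own function, passing a text generator where a range selector is required (for example changeTo(cat("a"), ...)) is rejected inside parseRangeSel, which is where a real parser could emit a type-mismatch diagnostic.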
Similar to QueryParser, create an EditParser that tokenizes:
- Keywords: changeTo, insertBefore, insertAfter, remove, addInclude, cat, node, etc.
- Identifiers: bound node names from the matcher
- String literals: header names, text content
- Punctuation: ( ) , "
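The token categories above could be recognized by a small hand-written lexer. The sketch below uses only the C++ standard library. TokenKind, Token, and lexEditSpec are illustrative names and not part of clang-query or the Transformer library, keywords and identifiers are both emitted as Word tokens and left for the parser to tell apart, and escape sequences inside string literals are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token kinds for the edit language.
enum class TokenKind { Word, String, LParen, RParen, Comma, Error };

struct Token {
  TokenKind Kind;
  std::string Text; // Word spelling, or string contents without the quotes.
};

// Splits an edit specification into tokens. Lexing stops at the first error,
// which is reported as a trailing Error token.
std::vector<Token> lexEditSpec(std::string_view Input) {
  std::vector<Token> Tokens;
  std::size_t I = 0;
  while (I < Input.size()) {
    char C = Input[I];
    if (std::isspace(static_cast<unsigned char>(C))) {
      ++I;
    } else if (C == '(') {
      Tokens.push_back({TokenKind::LParen, "("});
      ++I;
    } else if (C == ')') {
      Tokens.push_back({TokenKind::RParen, ")"});
      ++I;
    } else if (C == ',') {
      Tokens.push_back({TokenKind::Comma, ","});
      ++I;
    } else if (C == '"') {
      std::size_t End = Input.find('"', I + 1);
      if (End == std::string_view::npos) {
        Tokens.push_back({TokenKind::Error, "unterminated string literal"});
        break;
      }
      Tokens.push_back(
          {TokenKind::String, std::string(Input.substr(I + 1, End - I - 1))});
      I = End + 1;
    } else if (std::isalpha(static_cast<unsigned char>(C)) || C == '_') {
      std::size_t Start = I;
      while (I < Input.size() &&
             (std::isalnum(static_cast<unsigned char>(Input[I])) ||
              Input[I] == '_'))
        ++I;
      Tokens.push_back(
          {TokenKind::Word, std::string(Input.substr(Start, I - Start))});
    } else {
      Tokens.push_back({TokenKind::Error, std::string(1, C)});
      break;
    }
  }
  return Tokens;
}
```

A production version would more likely work on llvm::StringRef and reuse the lexing helpers that QueryParser already has; the standard-library types here are only to keep the sketch self-contained.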
class EditParser {
public:
  static EditGenerator parse(StringRef Line, const QuerySession &QS);
  static std::vector<llvm::LineEditor::Completion>
  complete(StringRef Line, size_t Pos, const QuerySession &QS);
private:
  EditGenerator parseEditList();
  ASTEdit parseEdit();
  RangeSelector parseRangeSelector();
  TextGenerator parseTextGenerator();
  // Similar to QueryParser's structure
  StringRef lexWord();
  StringRef lexString();
  void skipWhitespace();
};
Add new commands to clang-query:
edit EDIT_SPEC                 Parse and display edit specification
transform MATCHER, EDIT_SPEC   Create complete transformation rule
let editvar EDIT_SPEC          Bind edit specifications to names
Example session:
clang-query> let myEdit changeTo(node("x"), cat("newValue"))
clang-query> transform callExpr(callee(functionDecl(hasName("foo")))), myEdit
Extend TransformerClangTidyCheck to accept text-based edit specifications:
class StringFindStrContainsCheck : public TransformerClangTidyCheck {
public:
  StringFindStrContainsCheck(StringRef Name, ClangTidyContext *Context)
      : TransformerClangTidyCheck(Name, Context) {
    // Read edit spec from options
    StringRef EditSpec = Options.get("EditSpecification", "");
    setRule(makeRule(parseMatcherFromConfig(...),
                     EditParser::parse(EditSpec, ...)));
  }
};
Support in .clang-tidy files:
Checks: 'custom-*'
CheckOptions:
  - key: custom.MyTransform.Matcher
    value: 'callExpr(callee(functionDecl(hasName("foo"))))'
  - key: custom.MyTransform.Edits
    value: 'changeTo(node("root"), cat("bar()")), addInclude("bar.h")'
Implement basic edit operations:
- changeTo(range, text)
- remove(range)
- insertBefore(range, text)
- insertAfter(range, text)
- addInclude(header)
Basic range selectors:
- node(id)
- statement(id)
- name(id)
Basic text generators:
- cat(...)
- text(string)
- node(id)
Add complex range selection:
- before(), after(), enclose()
- member(), callArgs(), constructExprArgs()
- statements(), initListElements()
- expansion()
Implement advanced text generation:
- describe(), expression()
- deref(), addressOf(), maybeDeref(), maybeAddressOf()
- access(base, member)
- ifBound(id, true_case, false_case)
- selectBound(cases, default)
Add edit composition and conditionals:
- ifBound() for conditional edits
- flatten() for edit composition
- rewriteDescendants()
- Variable binding and reuse
Support for diagnostic metadata:
- note() function
- withMetadata() function
- Explanation text
- Syntax errors: "Expected ')' after edit operation"
- Unknown identifiers: "Unknown range selector 'nod'"
- Type mismatches: "Range selector expected, got text generator"
- Unbound identifiers: "Node 'x' not bound in match result"
- Invalid ranges: "Selected range is not valid for editing"
- Include errors: "Invalid include format"
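As a rough illustration of how two of these categories might be told apart, the sketch below checks a name that appears where a range selector is required. The name tables, the function checkRangeSelector, and the "column N:" prefix are all assumptions made for this example; the message texts follow the list above. Note that node is both a range selector and a text generator in the proposed grammar, so it is accepted here.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Hypothetical name tables, taken from the grammar in this plan.
static const std::set<std::string> RangeSelectors = {
    "node",   "statement", "name",    "member",   "callArgs",
    "before", "after",     "enclose", "expansion"};
static const std::set<std::string> TextGenerators = {
    "cat",        "text",       "node",           "describe",
    "expression", "deref",      "addressOf",      "maybeDeref",
    "maybeAddressOf", "access", "ifBound"};

// Returns an error message if Name cannot start a range selector, or
// std::nullopt if it can. Column is the 1-based position of Name in the input.
std::optional<std::string> checkRangeSelector(const std::string &Name,
                                              unsigned Column) {
  if (RangeSelectors.count(Name))
    return std::nullopt;
  std::string Where = "column " + std::to_string(Column) + ": ";
  if (TextGenerators.count(Name))
    return Where + "Range selector expected, got text generator '" + Name + "'";
  return Where + "Unknown range selector '" + Name + "'";
}
```

Checking the text-generator table before falling back to "unknown" gives a more specific message when the user has swapped the two arguments of an edit operation.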
Test each component independently:
- Lexer tests (tokenization)
- Parser tests (grammar rules)
- Evaluation tests (execution semantics)
Test complete transformations:
TEST(EditParserTest, BasicChangeTo) {
  auto Edit = EditParser::parse("changeTo(node(\"x\"), cat(\"new\"))");
  // Verify generated ASTEdit matches expected structure
}
TEST(EditParserTest, MultipleEdits) {
  auto EditGen = EditParser::parse(
      "changeTo(node(\"x\"), cat(\"new\")), "
      "addInclude(\"header.h\")");
  // Verify edit list generation
}
Test in actual clang-tidy checks and transformations.
Complete specification of:
- Grammar
- Built-in functions
- Type system
- Semantics
- Tutorial for common patterns
- Migration guide from C++ API
- Examples for each operation
- Parser API
- Integration points
- Extension mechanisms
Current C++ Code:
makeRule(callExpr(callee(functionDecl(hasName("foo")))),
         changeTo(cat("bar()")))
Text Language:
changeTo(node("root"), cat("bar()"))
Current C++ Code:
makeRule(matcher,
         {changeTo(node("call"), cat("absl::StrContains(", node("str"), ", ",
                                     node("param"), ")")),
          addInclude("absl/strings/match.h")})
Text Language:
changeTo(node("call"), cat("absl::StrContains(", node("str"), ", ",
node("param"), ")")),
addInclude("absl/strings/match.h")
Current C++ Code:
ifBound("opt",
        changeTo(node("x"), cat("value")),
        remove(node("x")))
Text Language:
ifBound("opt",
changeTo(node("x"), cat("value")),
remove(node("x")))
Current C++ Code:
flatten(
    changeTo(name("func"), cat("newName")),
    changeTo(callArgs("call"), cat("newArg1, newArg2")),
    addInclude("new_header.h"))
Text Language:
changeTo(name("func"), cat("newName")),
changeTo(callArgs("call"), cat("newArg1, newArg2")),
addInclude("new_header.h")
- Declarative Specification: Separates transformation logic from C++ code
- Configuration-Friendly: Can be stored in config files (.clang-tidy)
- Interactive Development: Can be tested in clang-query REPL
- Lower Barrier to Entry: No C++ compilation needed for simple transformations
- Composability: Named edit specifications can be reused
- Consistency: Matches clang-query's matcher syntax style
Issue: Text language loses compile-time type checking
Mitigation: Runtime validation with clear error messages; optional schema validation
Issue: Complex C++ logic in stencils hard to express
Mitigation: Support run() function to call registered C++ callbacks
Issue: Parsing overhead at runtime
Mitigation: Cache parsed representations; pre-compile for production
Issue: Harder to debug text specifications
Mitigation:
- Add verbose mode to show parse tree
- Provide dry-run mode to preview edits
- Include source location tracking in errors
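The caching mitigation for parsing overhead could be as simple as memoizing on the specification text. The sketch below is one way to do that, under stated assumptions: ParsedEditSpec is a placeholder for whatever the real parser would produce (for instance an EditGenerator), EditSpecCache is a hypothetical name, and the parser is passed in as a callback so the example stays independent of clang.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Placeholder for the parser's real output type (an assumption of this sketch).
struct ParsedEditSpec {
  std::string Source;
};

// Memoizes parse results, keyed by the exact specification text.
class EditSpecCache {
public:
  using Parser =
      std::function<std::shared_ptr<const ParsedEditSpec>(const std::string &)>;

  explicit EditSpecCache(Parser P) : Parse(std::move(P)) {}

  // Returns the cached result for Spec, parsing it on first use.
  std::shared_ptr<const ParsedEditSpec> get(const std::string &Spec) {
    auto It = Cache.find(Spec);
    if (It != Cache.end())
      return It->second;
    std::shared_ptr<const ParsedEditSpec> Parsed = Parse(Spec);
    Cache.emplace(Spec, Parsed);
    return Parsed;
  }

private:
  Parser Parse;
  std::unordered_map<std::string, std::shared_ptr<const ParsedEditSpec>> Cache;
};
```

This sketch is not thread-safe and never evicts entries, which is probably acceptable for a handful of rules loaded from a .clang-tidy file but would need revisiting for a long-running process.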
- Edit Macros: Define reusable edit patterns
- Variables: Bind intermediate results
- Conditionals: More sophisticated control flow
- Loops: Iterate over collections
- Custom Functions: Register C++ functions callable from text
- Pattern Matching: Match on bound node types/values
- Schema Validation: Validate edit specs against schemas
- Month 1: Phase 1 - Core edit operations (MVP)
- Month 2: Phase 2 - Advanced selectors
- Month 3: Phase 3 - Advanced stencils
- Month 4: Phase 4 - Control flow and composition
- Month 5: Phase 5 - Metadata and testing
- Month 6: Documentation and polish
This plan provides a comprehensive approach to creating a text-based language for transformer library expressions. By following the phased implementation approach and learning from clang-query's design, we can create a powerful, user-friendly tool for declarative code transformation specification.
The key innovation is making code transformations as accessible as pattern matching is today with clang-query, opening up the transformer library to a broader audience and enabling configuration-driven refactoring workflows.