AI Command-Line Agent Tools: Automation & Integration Guide

This guide defines Claude Code CLI and Codex CLI as general-purpose "intelligent compute nodes." They're not just coding assistants—they're advanced reasoning engines that integrate into automation pipelines.


0. Why CLI Agents Instead of Raw APIs?

Directly calling LLM APIs (like chat/completions) is simple, but has clear limitations when handling complex tasks (full-document translation, large-scale refactoring). Using a CLI Agent as an intermediary offers these core advantages:

  1. Anti-Laziness: When tasks are large, direct API calls often result in outputs like (omitted for brevity...) or truncation. CLI Agents (especially Claude Code) have native "loop execution" and "self-correction" capabilities—they can sense file state, operate in chunks or loops, and ensure tasks are actually completed.
  2. Native File Context Support: They automatically handle file reading, encoding, and writing, decoupling "reasoning" from "IO." You just specify the goal; the Agent optimizes the reading details.
  3. Tool-Rich Environment: They come with mature MCP (Model Context Protocol) plugins. For example, in our environment, they can call Tavily for web searches anytime, or write temporary scripts to process data—capabilities that are extremely expensive to simulate via API.
  4. Optimized Context Management: Agent frameworks automatically handle Context Window consumption and long-conversation compression, which is more robust and efficient than hand-coded API logic.

1. Core Task Pattern: File-Based Mode Only

In production, never use pipe patterns like echo | claude for core logic. We exclusively use File-Based Mode.

Why File Mode Only?

  1. Determinism: The AI's mental model when "editing files" is "complete the work and save," while in "conversation" mode it's "answer the question." The former is less prone to truncation.
  2. Auditability: File system changes before and after (git diff) are the single source of truth.
  3. Large Capacity: Bypasses command-line argument length limits, letting the Agent decide how to efficiently read files.

Standard Operating Procedure

# 1. Prepare context (store content to process in a file)
cp raw_data.json task_context.json

# 2. Issue instructions (have the Agent modify that file)
codex exec --full-auto "Read task_context.json, translate Chinese to English, preserve JSON structure. Modify the file in place."

2. Production Case: Batch JSON Translation

In our automated translation scripts, we no longer manually parse text to send to APIs—we hand the entire Tiptap JSON directly to the CLI Agent.

2.1 Task Description Example

# Core logic: Have the Agent modify the target file directly
import subprocess

# target_file / target_dir come from the surrounding script
prompt = f"""You are a professional content translator.
1. Read {target_file}.
2. Translate Tiptap JSON text nodes from Chinese to English.
3. Keep technical terms (like "vibe coding") unchanged.
4. Ensure the file remains valid JSON.
Modify the file in place."""

# Invocation (Codex example, using latest model with reasoning control)
subprocess.run([
    "codex", "exec",
    "--dangerously-bypass-approvals-and-sandbox",
    "-m", "gpt-5.2",                      # Explicitly specify latest model
    "-c", 'model_reasoning_effort="low"', # Adjust reasoning intensity (low/medium/high)
    "-C", str(target_dir),
    prompt.replace('\0', '')              # Important: strip nul bytes to avoid system errors
])

3. Model & Reasoning Control

In the GPT-5.2 era, we can precisely control the AI's "thinking cost" and "reasoning depth" through parameters.

3.1 Core Parameters (Codex)

  • -m, --model: Recommend using gpt-5.2. This version is optimized for Agentic workflows and performs better on long contexts and complex refactoring.
  • -c model_reasoning_effort:
    • low: Suitable for simple format conversion, text translation, README updates. Extremely fast, low cost.
    • medium (default): Suitable for routine bug fixes, single-file refactoring.
    • high: Suitable for cross-file logic migration, deep code audits.

Performance Tip: For batch translation tasks, forcing low can improve response speed by 2x+ with minimal impact on literal translation quality.


4. Observability & Streaming

For long-running tasks (like large-scale translation), simply waiting for process completion (blocking) loses visibility into task progress. We encourage using Streaming JSON mode.

4.1 Key Parameters

  • Claude Code: Use --output-format stream-json.
  • Codex: Use --json. It outputs structured events containing thought (thinking process), call (tool invocations), and response.
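For reference, launching each tool in streaming mode looks roughly like this (the -p print-mode flag for Claude is an assumption here, and exact flag combinations can differ across CLI versions):

# Codex: emits one structured JSON event per line on stdout
codex exec --json --full-auto "Read task_context.json, translate Chinese to English in place."

# Claude Code: non-interactive run with streaming JSON output
claude -p "Read task_context.json, translate Chinese to English in place." --output-format stream-json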

4.2 Best Practice: Show Real-Time Logs by Default

When writing integration scripts, we strongly recommend printing the AI's logs in real time (especially thought and call events); a minimal consumer sketch follows the list below.

  1. Transparency: Developers can instantly see if the AI is reading files correctly or stuck in a loop.
  2. Debugging Efficiency: When tasks fail, the most intuitive error cause is usually in the last few thought or call lines.
  3. Feedback: When processing large files, scrolling real-time logs give a stronger sense of progress than a static progress bar.
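A minimal consumer sketch in Python, assuming codex exec --json emits one JSON object per line with a type field such as thought, call, or response (the event names follow the description above, but may differ across CLI versions):

import json
import subprocess

def run_with_live_logs(prompt: str, cwd: str) -> int:
    """Run Codex in streaming-JSON mode and echo thinking/tool events in real time."""
    proc = subprocess.Popen(
        ["codex", "exec", "--json", "--full-auto", "-C", cwd, prompt.replace("\0", "")],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    for line in proc.stdout:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            print(line, flush=True)  # Non-JSON noise: pass through as-is
            continue
        kind = event.get("type", "")
        # Surface the most useful events; exact names depend on the CLI version.
        if kind in ("thought", "call", "response", "error"):
            print(f"[{kind}] {event}", flush=True)
    return proc.wait()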

4.3 Standard Practice: Structured Streaming Logs (JSON Lines)

For multi-process or long-running tasks, we strongly recommend that applications print intermediate results to stdout in JSON Lines format; even with interleaved multi-process output, this stays easy for machines to parse and monitor.

Standard Log Fields:

{"event": "task_completed", "id": "post_123", "status": "success", "details": "Translated 5 comments"}
  • Required Fields:
    • event: Event type (e.g., start, progress, complete, error).
    • id: Unique task identifier (e.g., post_id, file_path).
    • status: Current state.
  • Purpose:
    • Real-time Observability: Provides instant progress feedback in CI/CD or terminal.
    • Decoupling: Scripts only need to print intermediate state to stdout, not aggregate or parse final disk files. External tools can consume these logs via pipe if needed.
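A small helper along these lines keeps every worker's output on that convention (the optional details and timestamp fields are illustrative additions):

import json
import time

def log_event(event: str, task_id: str, status: str, **details) -> None:
    """Emit one JSON Lines record to stdout; flush so parallel workers interleave cleanly."""
    record = {"event": event, "id": task_id, "status": status, "ts": time.time(), **details}
    print(json.dumps(record, ensure_ascii=False), flush=True)

# Usage:
# log_event("task_completed", "post_123", "success", details="Translated 5 comments")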

4.4 Error Handling: Nul Byte Safety

Warning: If the prompt contains invisible \0 characters (which easily happens when content is piped in via stdin), Codex CLI (Rust core) will throw a nul byte found error and crash. Countermeasures:

  1. Prefer passing Prompts as the last command-line argument rather than via pipe.
  2. Always execute .replace('\0', '') on strings before passing.
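A minimal guard that applies both rules:

import subprocess

def run_codex(prompt: str, cwd: str) -> None:
    safe_prompt = prompt.replace("\0", "")  # Rule 2: strip nul bytes
    subprocess.run(
        ["codex", "exec", "--full-auto", "-C", cwd, safe_prompt],  # Rule 1: argv, not stdin
        check=True,
    )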

5. Privilege Management

5.1 Common Permission Modes

  • Translation / Doc Rewriting: Claude --permission-mode acceptEdits; Codex --full-auto
  • Automated Testing / Deployment: Claude --permission-mode bypassPermissions; Codex --dangerously-bypass-approvals-and-sandbox
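For example, a low-risk documentation rewrite under the recommended modes (the Claude -p print-mode flag is an assumption; the permission flags are as listed above):

# Claude Code: auto-accept file edits only
claude -p "Rewrite README.md for clarity and edit it in place." --permission-mode acceptEdits

# Codex: sandboxed automatic mode
codex exec --full-auto "Rewrite README.md for clarity and edit it in place."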

6. Advanced: How to Have Another AI Call These Tools?

If you're writing an Agent (like Gemini) to call these CLIs, provide these "meta-instructions":

"When facing large-scale text processing or file system operations, call the underlying claude or codex.

  1. Prefer file-based mode—first store content to process in a local temp file.
  2. Use streaming mode (--json) and parse events in real-time to monitor progress.
  3. Set appropriate reasoning_effort (e.g., low for translation).
  4. Clean null characters before passing Prompts, and pass them as command-line arguments."

Note: When Claude quota is limited, prioritize using Codex for testing.

🚀 Advanced Tricks & Performance Tuning

1. Parallel Chunk-based Translation

For very large files (over 2000 lines), the system automatically switches to parallel mode to break through single-thread speed bottlenecks:

  • Auto-splitting: Files are split into independent chunks of 1000 lines each.
  • High Concurrency: Uses ProcessPoolExecutor to launch up to 8 workers in parallel.
  • Context Preservation: Each worker still reads the complete file to maintain context understanding, but Prompts strictly limit them to only modify their assigned line range.
  • Result: A 20k-line giant file translates in ~8 minutes (vs 45+ minutes serial).
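A sketch of that orchestration, assuming a translate_chunk(path, start, end) helper that invokes codex exec on one line range (chunk size, worker count, and helper names are illustrative):

import subprocess
from concurrent.futures import ProcessPoolExecutor

CHUNK_LINES = 1000   # Each worker owns one 1000-line slice
MAX_WORKERS = 8

def translate_chunk(path: str, start: int, end: int) -> tuple:
    """Ask Codex to translate only lines start..end (it may still read the whole file for context)."""
    prompt = (
        f"Read {path}. Translate the Chinese text nodes to English, "
        f"but ONLY modify lines {start}-{end}. Keep all other lines unchanged. "
        "Ensure the file stays valid JSON. Modify the file in place."
    )
    result = subprocess.run(
        ["codex", "exec", "--full-auto", "-m", "gpt-5.2",
         "-c", 'model_reasoning_effort="low"', prompt.replace("\0", "")],
        timeout=45 * 60,
    )
    return start, end, result.returncode

def translate_file(path: str, total_lines: int) -> None:
    ranges = [(s, min(s + CHUNK_LINES - 1, total_lines))
              for s in range(1, total_lines + 1, CHUNK_LINES)]
    with ProcessPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = [pool.submit(translate_chunk, path, s, e) for s, e in ranges]
        for fut in futures:
            start, end, code = fut.result()
            print(f"[L{start}-{end}] exit={code}", flush=True)

On macOS and Windows the pool needs the usual if __name__ == "__main__": guard around the call to translate_file.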

2. Autonomous Prompt Engineering

To handle complex tasks unattended, Prompt design is crucial:

  • Chunked Edit Instructions: Explicitly tell the AI it can edit in chunks (e.g., 1000 lines at a time) if the file is too large.
  • Persistence Instructions: Use strong directive words (like "MUST persist", "complete the ENTIRE file") to prevent the AI from getting lazy or giving up midway.
  • Self-Correction: Require the AI to perform quality checks and JSON format validation before submitting final results.
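A prompt skeleton that bakes in all three points (the wording is illustrative; fill the placeholders with .format()):

PROMPT_TEMPLATE = """You are a professional content translator.
1. Read {path} ({total_lines} lines).
2. Translate the Chinese text nodes to English, keeping technical terms unchanged.
3. The file is large: work through it in chunks of about 1000 lines at a time.
4. You MUST persist until the ENTIRE file is done. Do not stop early or summarize.
5. Before finishing, re-read the file and verify it is still valid JSON; fix any breakage.
Modify the file in place."""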

3. Dynamic Timeouts

To prevent large file processing from being killed unexpectedly, the system uses dynamic timeout strategy:

  • Formula: Base 10 minutes + 10 minutes per 5000 lines.
  • Range Limits: Minimum 10 minutes, maximum 45 minutes.
  • Granularity: Each parallel worker has its own timeout quota, ensuring complex paragraphs have sufficient processing time.
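In code form (limits as above, returned in seconds):

def compute_timeout(total_lines: int) -> int:
    """Base 10 min + 10 min per 5000 lines, clamped to the 10-45 minute range."""
    minutes = 10 + 10 * (total_lines // 5000)
    return min(max(minutes, 10), 45) * 60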

4. Model Optimization

After testing, this is currently the most cost-effective translation configuration:

  • Model: gpt-5.2 (smarter and more stable than older versions).
  • Reasoning Effort: low. For translation tasks, "low" is sufficient and fastest—no need for "high" deep reasoning costs.
  • Invocation Example:
codex exec ... -m gpt-5.2 -c 'model_reasoning_effort="low"' ...

5. Parallel Stream Output Visibility

In concurrent mode, progress output gets stuck in stdio buffers and never reaches the terminal unless it is force-flushed.

  • Trick: Pass flush=True in the Python script's print() calls.
  • Effect: This ensures that even with 8 processes running in parallel, you can see interleaved but real-time progress logs in the terminal (like [L1-1000], [L2001-3000]), which is very important for psychological safety during long-running tasks.
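In its simplest form:

for start, end in [(1, 1000), (1001, 2000)]:
    # Without flush=True, these lines can sit in the buffer and only appear at process exit.
    print(f"[L{start}-{end}] chunk dispatched", flush=True)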