
LiteLLM Anthropic Integration - Extended Thinking Fix

Overview

This document explains the dual-provider configuration for LiteLLM when using Anthropic Claude models with extended thinking enabled.

Problem Statement

When using Anthropic Claude models (Sonnet 4.5, Opus 4.5) with extended thinking enabled through LiteLLM proxy, multi-turn conversations with tool use fail with cryptographic signature validation errors.

Error Messages Observed

Invalid `signature` in `thinking` block
Expected `thinking` or `redacted_thinking`, but found `text`. When `thinking` is enabled, a final `assistant` message must start with a thinking block

Root Cause Analysis

How Extended Thinking Works

  1. When extended thinking is enabled, Anthropic's API returns thinking blocks in the assistant response
  2. Each thinking block contains a cryptographic signature generated by Anthropic's servers
  3. In multi-turn conversations, the previous assistant message (with thinking blocks) must be sent back to the API
  4. Anthropic verifies the signature to ensure thinking blocks weren't tampered with (see the sketch after this list)
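
For reference, a hedged sketch of the assistant turn described in steps 1-3, as it must be sent back on the next request. The field names follow Anthropic's Messages API; the values are placeholders, not real output:

// Illustrative only - the signature is opaque and generated by Anthropic's servers.
const previousAssistantTurn = {
  role: "assistant",
  content: [
    {
      type: "thinking",
      thinking: "Let me work out which tool to call first...",
      signature: "EqQBCgIYAhIM...", // opaque, server-generated; must be echoed back unmodified
    },
    {
      type: "tool_use",
      id: "toolu_01A...", // placeholder ID
      name: "get_weather",
      input: { city: "Cairo" },
    },
  ],
};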

Why LiteLLM's OpenAI-Compatible Endpoint Fails

The original configuration used:

  • @ai-sdk/openai-compatible SDK
  • LiteLLM's /v1/chat/completions endpoint (OpenAI format)

This causes problems because:

  1. Format Translation: LiteLLM translates Anthropic's native format to OpenAI format and back
  2. Signature Loss: The translation process loses or corrupts the cryptographic signatures on thinking blocks (illustrated after this list)
  3. Validation Failure: When the next request is sent, Anthropic rejects it because:
    • Either the signature is missing/invalid
    • Or the thinking block structure doesn't match expectations
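
A hedged illustration of the mismatch (the exact OpenAI-side fields vary by LiteLLM version; this only shows why the signature cannot survive the round trip):

// Anthropic-native assistant content block (what the API returns and expects back):
const nativeBlock = {
  type: "thinking",
  thinking: "Reasoning text...",
  signature: "opaque-server-generated-value",
};

// After translation to OpenAI chat format, the reasoning may surface as plain text
// or a non-standard field, and there is no standard place for the signature,
// so the client cannot echo the original block back intact:
const openAiStyleMessage = {
  role: "assistant",
  content: "Reasoning text... plus the final answer",
  // no `signature`, no `thinking` block structure
};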

Attempted Solutions That Failed

  1. Reasoning Adapter Callback: We tried creating a LiteLLM callback (reasoning_adapter.py) to:

    • Cache thinking blocks from responses
    • Re-inject them into subsequent requests
    • Failed because: You cannot fabricate valid signatures - they are cryptographically verified
  2. Placeholder Thinking Blocks: Attempted to inject placeholder thinking blocks with dummy signatures

    • Failed because: Invalid signature in thinking block - Anthropic validates signatures server-side

Solution: Dual Provider Configuration

Architecture

LiteLLM exposes two different endpoints for Anthropic:

| Endpoint             | Format           | SDK                        | Use Case                                 |
|----------------------|------------------|----------------------------|------------------------------------------|
| /v1/chat/completions | OpenAI           | @ai-sdk/openai-compatible  | Non-Anthropic models (GPT, Gemini, etc.) |
| /v1/messages         | Anthropic Native | @ai-sdk/anthropic          | Anthropic models with extended thinking  |
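
Both routes can be exercised directly with the same LiteLLM virtual key. A minimal sketch, assuming the key is accepted as a Bearer token on both routes and the model names match the proxy config:

// Hypothetical smoke test against both LiteLLM routes (URLs and key handling are assumptions).
const base = "http://litellm:4000";
const headers = {
  Authorization: `Bearer ${process.env.LITELLM_KEY}`,
  "Content-Type": "application/json",
};

// OpenAI-format route (what @ai-sdk/openai-compatible talks to):
await fetch(`${base}/v1/chat/completions`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "ping" }],
  }),
});

// Anthropic-native route (what @ai-sdk/anthropic talks to), which keeps thinking blocks intact:
await fetch(`${base}/v1/messages`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    model: "anthropic/claude-sonnet-4-5-20250929",
    max_tokens: 1024,
    messages: [{ role: "user", content: "ping" }],
  }),
});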

Both endpoints still provide full LiteLLM features:

  • Cost tracking
  • Usage logging
  • Virtual key management
  • Rate limiting

Implementation

We now configure two providers when LiteLLM proxy is detected:

// Provider 1: litellm (OpenAI-compatible)
providerConfig.litellm = {
  npm: "@ai-sdk/openai-compatible",
  options: {
    apiKey,
    baseURL: "http://litellm:4000/v1",
  },
};

// Provider 2: litellm-anthropic (Anthropic-native)
providerConfig["litellm-anthropic"] = {
  npm: "@ai-sdk/anthropic",
  options: {
    apiKey,
    baseURL: "http://litellm:4000",  // Uses /v1/messages endpoint
  },
};

Model Routing

| Model Pattern        | Provider to Use    | Endpoint             |
|----------------------|--------------------|----------------------|
| anthropic/claude-*   | litellm-anthropic  | /v1/messages         |
| openai/gpt-*         | litellm            | /v1/chat/completions |
| gemini/*             | litellm            | /v1/chat/completions |
| xai/grok-*           | litellm            | /v1/chat/completions |

Integration TODO

The following changes need to be made to complete the integration:

1. Model Spec Selection (opencode-backend.ts)

Update the modelSpec construction in execute() to use the correct provider:

// Current (needs update):
if (isLiteLLMProxy) {
  return {
    providerID: "litellm",
    modelID: modelStr,
  };
}

// Should become:
if (isLiteLLMProxy && isAnthropicModel) {
  return {
    providerID: "litellm-anthropic",
    modelID: anthropicModelId,  // Without "anthropic/" prefix
  };
}
if (isLiteLLMProxy) {
  return {
    providerID: "litellm",
    modelID: modelStr,
  };
}
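
isAnthropicModel and anthropicModelId are not defined yet in this snippet; a minimal sketch of how they could be derived from the incoming model string (helper names are placeholders, not existing code):

// Hypothetical helpers - derive routing info from a LiteLLM model string
// such as "anthropic/claude-sonnet-4-5-20250929".
const ANTHROPIC_PREFIX = "anthropic/";

function isAnthropicModelStr(modelStr: string): boolean {
  return modelStr.startsWith(ANTHROPIC_PREFIX);
}

function toAnthropicModelId(modelStr: string): string {
  // Strip the "anthropic/" prefix so @ai-sdk/anthropic receives the bare model ID.
  return modelStr.slice(ANTHROPIC_PREFIX.length);
}

// e.g. toAnthropicModelId("anthropic/claude-sonnet-4-5-20250929")
//   => "claude-sonnet-4-5-20250929"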

2. Server Initialization Model

Update the model string format in initialize() for the server config:

// For Anthropic via LiteLLM:
model: `litellm-anthropic/${anthropicModelId}`

// For other models via LiteLLM:
model: `litellm/${modelStr}`
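
For example, with the model IDs from the LiteLLM config referenced below, the resolved strings would look like this (illustrative only):

// Anthropic model routed through the native provider:
model: "litellm-anthropic/claude-sonnet-4-5-20250929"

// Non-Anthropic model routed through the OpenAI-compatible provider:
model: "litellm/openai/gpt-4o"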

3. Testing Required

  • Test multi-turn conversation with Claude Sonnet 4.5 + extended thinking
  • Test tool use in multi-turn with extended thinking
  • Verify cost tracking still works via LiteLLM dashboard
  • Test non-Anthropic models still work via litellm provider

LiteLLM Configuration Reference

The LiteLLM proxy config (infrastructure/litellm-proxy/config.yaml) already has extended thinking enabled:

- model_name: anthropic/claude-sonnet-4-5-20250929
  litellm_params:
    model: anthropic/claude-sonnet-4-5-20250929
    api_key: os.environ/ANTHROPIC_API_KEY
    thinking:
      type: enabled
      budget_tokens: 8000

The reasoning_adapter callback has been disabled since it cannot solve the signature validation problem.

Postmortem

Timeline

  1. Initial Issue: Multi-turn conversations with Claude + extended thinking failed via LiteLLM
  2. Investigation: Identified that thinking block signatures were being lost in format translation
  3. Attempt 1: Created reasoning_adapter.py callback to cache/restore thinking blocks - failed due to signature validation
  4. Attempt 2: Tried placeholder thinking blocks with fake signatures - failed, signatures are cryptographically verified
  5. Root Cause Identified: The OpenAI-compatible translation path cannot preserve Anthropic's native thinking block format
  6. Solution: Use LiteLLM's /v1/messages endpoint which speaks native Anthropic format

Key Learnings

  1. Anthropic's thinking block signatures are cryptographic and cannot be forged
  2. Any format translation (Anthropic <-> OpenAI) will break extended thinking
  3. LiteLLM provides both OpenAI-compatible AND Anthropic-native endpoints
  4. The Anthropic-native endpoint preserves all features (cost tracking, logging) while avoiding format translation

Files Modified

  • apps/sidecar/src/backends/opencode-backend.ts - Added dual provider configuration
  • infrastructure/litellm-proxy/config.yaml - Disabled reasoning_adapter callback
  • infrastructure/litellm-proxy/reasoning_adapter.py - Attempted fix (now disabled)

Next Steps

  1. Complete the model routing logic in execute() method
  2. Test the integration end-to-end
  3. Consider adding automatic provider selection based on model name
  4. Update any upstream code that specifies the provider to use litellm-anthropic for Claude models