Claude Advisor Tool in 2026: Opus quality, Sonnet cost


You have a coding agent running Sonnet. It handles 90% of turns without breaking a sweat: reading files, running tests, writing boilerplate. But when it hits a gnarly architecture decision or a subtle concurrency bug, you wish it could phone a friend.

That's the Advisor Tool. Anthropic's new beta API feature lets a faster executor model (Sonnet or Haiku) call a higher-intelligence advisor model (Opus) mid-generation. The advisor reads the full transcript, produces a short plan or course correction, and the executor continues with the task. One API request, two models, near-Opus quality at Sonnet pricing.

How the Advisor Tool works

When you add the advisor tool to your tools array, the executor decides when to call it, like any other tool. The flow:

  1. The executor emits a server_tool_use block with name: "advisor" and an empty input.
  2. Anthropic runs a separate inference pass on the advisor model server-side, passing the executor's full transcript (system prompt, tool definitions, all prior turns and results).
  3. The advisor's response returns as an advisor_tool_result block (typically 400 to 700 text tokens).
  4. The executor continues generating, informed by the advice.

All of this happens inside a single /v1/messages request. No extra round trips on your side. The advisor runs without tools and without context management; its thinking blocks are dropped and only the advice text reaches the executor.

Your first advisor call: curl, Python, and TypeScript

The advisor tool is in beta. Include the advisor-tool-2026-03-01 beta header in your requests. Here's the simplest possible call:

curl

curl https://api.anthropic.com/v1/messages \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "anthropic-beta: advisor-tool-2026-03-01" \
  --header "content-type: application/json" \
  --data '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 4096,
    "tools": [
      {
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-6"
      }
    ],
    "messages": [{
      "role": "user",
      "content": "Build a concurrent worker pool in Go with graceful shutdown."
    }]
  }'

Python

import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    betas=["advisor-tool-2026-03-01"],
    tools=[
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-6",
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Build a concurrent worker pool in Go with graceful shutdown.",
        }
    ],
)

print(response)

TypeScript

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.beta.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 4096,
  betas: ["advisor-tool-2026-03-01"],
  tools: [
    {
      type: "advisor_20260301",
      name: "advisor",
      model: "claude-opus-4-6",
    },
  ],
  messages: [
    {
      role: "user",
      content: "Build a concurrent worker pool in Go with graceful shutdown.",
    },
  ],
});

console.log(response);

What the response looks like

A successful advisor call produces four content blocks: the executor's initial text, the server_tool_use block, the advisor_tool_result block, and the executor's final output informed by the advice.

{
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Let me consult the advisor on this."
    },
    {
      "type": "server_tool_use",
      "id": "srvtoolu_abc123",
      "name": "advisor",
      "input": {}
    },
    {
      "type": "advisor_tool_result",
      "tool_use_id": "srvtoolu_abc123",
      "content": {
        "type": "advisor_result",
        "text": "Use a channel-based coordination pattern. Close the input channel first, then wait on a WaitGroup..."
      }
    },
    {
      "type": "text",
      "text": "Here's the implementation using a channel-based coordination pattern..."
    }
  ]
}

The advisor_tool_result content has two variants: advisor_result with plaintext advice, and advisor_redacted_result with encrypted content. In both cases, round-trip the content verbatim on subsequent turns.
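Handling the two variants takes only a few lines. Here's a minimal sketch (the helper name `extract_advice` is my own, not part of the SDK) that reads the plaintext advice where available while leaving the original blocks untouched for round-tripping:

```python
def extract_advice(content_blocks):
    """Collect plaintext advice from advisor_tool_result blocks.

    advisor_redacted_result content is encrypted, so it yields nothing
    here; in both variants the original blocks must still be sent back
    verbatim on the next turn.
    """
    advice = []
    for block in content_blocks:
        if block.get("type") != "advisor_tool_result":
            continue
        result = block.get("content", {})
        if result.get("type") == "advisor_result":
            advice.append(result["text"])
        # advisor_redacted_result: nothing readable client-side
    return advice

# Using the response shape shown above
blocks = [
    {"type": "text", "text": "Let me consult the advisor on this."},
    {"type": "server_tool_use", "id": "srvtoolu_abc123",
     "name": "advisor", "input": {}},
    {"type": "advisor_tool_result", "tool_use_id": "srvtoolu_abc123",
     "content": {"type": "advisor_result",
                 "text": "Use a channel-based coordination pattern."}},
]
print(extract_advice(blocks))  # → ['Use a channel-based coordination pattern.']
```

This is useful for logging and observability; never mutate the blocks themselves before sending them back.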

Valid model pairs

The advisor must be at least as capable as the executor. Invalid pairs return a 400 error.

Executor             Advisor
Claude Haiku 4.5     Claude Opus 4.6
Claude Sonnet 4.6    Claude Opus 4.6
Claude Opus 4.6      Claude Opus 4.6

The sweet spot for most workloads: Sonnet as executor, Opus as advisor. You get a quality lift at similar or lower total cost compared to running Opus for every token.
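If you select models dynamically, a client-side check fails faster than the server's 400. This sketch encodes the "at least as capable" rule from the table; the ranking and the Haiku model ID are my assumptions, not documented constants:

```python
# Capability ranking inferred from the pairing table above.
# (Hypothetical helper -- the API enforces this server-side with a 400.)
CAPABILITY = {
    "claude-haiku-4-5": 0,   # assumed model ID
    "claude-sonnet-4-6": 1,
    "claude-opus-4-6": 2,
}

def valid_pair(executor: str, advisor: str) -> bool:
    """True when the advisor is at least as capable as the executor."""
    return CAPABILITY[advisor] >= CAPABILITY[executor]

print(valid_pair("claude-sonnet-4-6", "claude-opus-4-6"))  # → True
print(valid_pair("claude-opus-4-6", "claude-haiku-4-5"))   # → False
```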

Multi-turn conversations

Pass the full assistant content, including advisor_tool_result blocks, back to the API on subsequent turns. If you omit the advisor tool from tools on a follow-up turn while the message history still contains advisor_tool_result blocks, the API returns a 400.

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-6",
    }
]

messages = [
    {
        "role": "user",
        "content": "Build a concurrent worker pool in Go with graceful shutdown.",
    }
]

# First turn: executor calls advisor, builds the worker pool
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    betas=["advisor-tool-2026-03-01"],
    tools=tools,
    messages=messages,
)

# Pass back the full response content (including advisor_tool_result blocks)
messages.append({"role": "assistant", "content": response.content})
messages.append({"role": "user", "content": "Now add a max-in-flight limit of 10."})

# Second turn: executor has context from first advisor call
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    betas=["advisor-tool-2026-03-01"],
    tools=tools,
    messages=messages,
)

Prompt engineering for coding agents

The advisor tool ships with a built-in description that nudges the executor to call it near the start of complex tasks. For coding and agent workloads, you can improve results with a system prompt that reinforces two timings:

  • An early first advisor call, after a few exploratory reads are in the transcript
  • A final advisor call after file writes and test outputs are in the transcript

Here's the system prompt pattern Anthropic recommends for coding tasks. It produced the highest intelligence at near-Sonnet cost in internal evaluations:

You have access to an `advisor` tool backed by a stronger reviewer model.
It takes NO parameters. When you call advisor(), your entire conversation
history is automatically forwarded.

Call advisor BEFORE substantive work: before writing, before committing
to an interpretation, before building on an assumption.

Also call advisor:
- When you believe the task is complete (save your deliverable first)
- When stuck: errors recurring, approach not converging
- When considering a change of approach

The advisor should respond in under 100 words and use enumerated steps,
not explanations.

Trim output tokens by 35-45%: Adding "The advisor should respond in under 100 words and use enumerated steps, not explanations" to your system prompt cuts advisor output without changing call frequency. Pair it with the timing block for the strongest cost-versus-quality tradeoff.

Combining with other tools

The advisor tool composes with web search, code execution, and your custom tools in the same tools array. The executor can search the web, call the advisor, and use your tools in the same turn. The advisor's plan can inform which tools the executor reaches for next.

tools = [
    {
        "type": "web_search_20250305",
        "name": "web_search",
        "max_uses": 5,
    },
    {
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-6",
    },
    {
        "name": "run_bash",
        "description": "Run a bash command",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
        },
    },
]

Advisor prompt caching

Two independent caching layers are available. Executor-side caching works the same as for any content block: place a cache_control breakpoint after an advisor_tool_result block and subsequent requests reuse the cached prefix as usual.

Advisor-side caching keeps the advisor's transcript cached across calls within the same conversation. Enable it with a caching field on the tool definition:

tools = [
    {
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-6",
        "caching": {"type": "ephemeral", "ttl": "5m"},
    }
]

The cache write costs more than the reads save when the advisor is called two or fewer times. Caching breaks even at roughly three advisor calls and improves from there. Enable it for long agent loops; keep it off for short tasks.
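That break-even point suggests a simple heuristic for tool construction. A sketch, assuming you can estimate call volume up front (the helper name and the `expected_advisor_calls` parameter are illustrative; the three-call threshold comes from the paragraph above):

```python
def advisor_tool(model: str, expected_advisor_calls: int) -> dict:
    """Build the advisor tool definition, enabling the advisor-side
    cache only when the loop is long enough to amortize the cache
    write (roughly three or more advisor calls)."""
    tool = {"type": "advisor_20260301", "name": "advisor", "model": model}
    if expected_advisor_calls >= 3:
        tool["caching"] = {"type": "ephemeral", "ttl": "5m"}
    return tool

print(advisor_tool("claude-opus-4-6", 1))   # short task: no caching field
print(advisor_tool("claude-opus-4-6", 10))  # long agent loop: caching on
```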

Usage and billing breakdown

Advisor calls run as a separate sub-inference billed at the advisor model's rates. The usage.iterations array gives you a per-iteration breakdown:

{
  "usage": {
    "input_tokens": 412,
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 0,
    "output_tokens": 531,
    "iterations": [
      {
        "type": "message",
        "input_tokens": 412,
        "output_tokens": 89
      },
      {
        "type": "advisor_message",
        "model": "claude-opus-4-6",
        "input_tokens": 823,
        "output_tokens": 1612
      },
      {
        "type": "message",
        "input_tokens": 1348,
        "cache_read_input_tokens": 412,
        "output_tokens": 442
      }
    ]
  }
}

Top-level usage fields reflect executor tokens only. Iterations with type: "advisor_message" are billed at the advisor model's rates. Use the iterations array when building cost-tracking logic.
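Cost tracking over the iterations array can look like the following sketch. The per-million-token rates here are placeholders, not real Anthropic pricing, and cache token fields are ignored for brevity:

```python
# Per-million-token rates in dollars: PLACEHOLDERS, not real pricing.
RATES = {
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
    "claude-opus-4-6": {"input": 15.00, "output": 75.00},
}

def request_cost(usage: dict, executor_model: str) -> float:
    """Sum cost across usage.iterations, billing advisor_message
    iterations at the advisor model's rates. Cache read/write token
    fields are ignored in this sketch."""
    total = 0.0
    for it in usage["iterations"]:
        # advisor_message iterations carry an explicit model field
        rate = RATES[it.get("model", executor_model)]
        total += it.get("input_tokens", 0) / 1e6 * rate["input"]
        total += it.get("output_tokens", 0) / 1e6 * rate["output"]
    return total

# The usage payload from the example above
usage = {
    "iterations": [
        {"type": "message", "input_tokens": 412, "output_tokens": 89},
        {"type": "advisor_message", "model": "claude-opus-4-6",
         "input_tokens": 823, "output_tokens": 1612},
        {"type": "message", "input_tokens": 1348,
         "cache_read_input_tokens": 412, "output_tokens": 442},
    ]
}
print(round(request_cost(usage, "claude-sonnet-4-6"), 5))  # → 0.14649
```

Note that under these placeholder rates the single Opus sub-inference dominates the request cost, which is exactly why trimming advisor output tokens (see the system prompt tip above) pays off.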

Cost control: capping advisor calls

The advisor tool has no built-in conversation-level cap. Use max_uses on the tool definition for per-request limits. For conversation-level limits, count calls client-side and strip the advisor when you hit your ceiling:

# Track advisor calls client-side
# (sketch; assumes message history blocks are dict-shaped)
advisor_count = 0
MAX_ADVISOR_CALLS = 5

for turn in conversation:
    response = client.beta.messages.create(...)

    # Count advisor calls in this response
    for block in response.content:
        if block.type == "server_tool_use" and block.name == "advisor":
            advisor_count += 1

    if advisor_count >= MAX_ADVISOR_CALLS:
        # Drop the advisor tool, then strip its server_tool_use and
        # advisor_tool_result blocks from history. Keeping them while
        # the tool is absent returns a 400; other server tools' blocks
        # are left untouched.
        tools = [t for t in tools if t.get("name") != "advisor"]
        for msg in messages:
            if msg["role"] == "assistant":
                msg["content"] = [
                    b for b in msg["content"]
                    if not (
                        (b.get("type") == "server_tool_use"
                         and b.get("name") == "advisor")
                        or b.get("type") == "advisor_tool_result"
                    )
                ]

Error handling

If the advisor call fails, the result carries an advisor_tool_result_error with an error_code. The executor sees the error and continues without advice; the request itself does not fail.

Error code                 Meaning
max_uses_exceeded          Request reached the max_uses cap on the tool definition
too_many_requests          Advisor sub-inference was rate-limited
overloaded                 Advisor hit capacity limits
prompt_too_long            Transcript exceeded the advisor model's context window
execution_time_exceeded    Advisor sub-inference timed out
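Since the request succeeds either way, the main decision is whether a failed advisor call is worth retrying on a later turn. A sketch, assuming the error surfaces as a content block whose nested content has type advisor_tool_result_error with an error_code field (inferred from the description above):

```python
# Transient failures are worth retrying later; the other codes
# (prompt_too_long, max_uses_exceeded) won't improve on retry.
RETRYABLE = {"too_many_requests", "overloaded", "execution_time_exceeded"}

def advisor_errors(content_blocks):
    """Yield (error_code, retryable) for each failed advisor call."""
    for block in content_blocks:
        if block.get("type") != "advisor_tool_result":
            continue
        result = block.get("content", {})
        if result.get("type") == "advisor_tool_result_error":
            code = result.get("error_code")
            yield code, code in RETRYABLE

blocks = [{
    "type": "advisor_tool_result",
    "tool_use_id": "srvtoolu_x",
    "content": {"type": "advisor_tool_result_error",
                "error_code": "overloaded"},
}]
print(list(advisor_errors(blocks)))  # → [('overloaded', True)]
```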

Streaming behavior

The advisor sub-inference does not stream. The executor's stream pauses while the advisor runs, then the full advisor_tool_result arrives in a single content_block_start event. SSE ping keepalives fire every 30 seconds during the pause. Plan for 2 to 5 seconds of silence per advisor call, depending on transcript length.
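A stream consumer therefore needs to tolerate pings and a result block that arrives whole. The sketch below simulates the events as dicts rather than opening a live connection; the event shapes follow the Messages streaming format as described above, and the exact fields are assumptions:

```python
def handle_stream(events):
    """Walk stream events, skipping keepalives and noting when the
    advisor's result lands in a single content_block_start."""
    advice_seen = False
    for event in events:
        if event["type"] == "ping":
            continue  # keepalive fired during the advisor pause
        if event["type"] == "content_block_start":
            block = event["content_block"]
            if block["type"] == "advisor_tool_result":
                advice_seen = True  # arrives whole; the advisor does not stream
    return advice_seen

events = [
    {"type": "content_block_start", "content_block": {"type": "text"}},
    {"type": "ping"},  # ~30s keepalive while the advisor runs
    {"type": "content_block_start",
     "content_block": {"type": "advisor_tool_result",
                       "content": {"type": "advisor_result",
                                   "text": "Plan: ..."}}},
]
print(handle_stream(events))  # → True
```

In a real UI, the pause is the moment to show a "consulting advisor" indicator rather than letting the output appear stalled.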

When the advisor helps (and when it doesn't)

Good fit:

  • Coding agents with multi-step file edits
  • Multi-step research pipelines
  • Computer use agents with branching decisions
  • CI/CD pipelines with complex test analysis

Weak fit:

  • Single-turn Q&A
  • Model-picker UIs where users choose quality
  • Workloads where every turn needs full Opus
  • Short, reactive tasks dictated by tool output

Effort pairing tip: For coding tasks, pair a Sonnet executor at medium effort with an Opus advisor. This achieves intelligence comparable to Sonnet at default effort, at lower cost. For maximum intelligence, keep the executor at default effort.

Limitations to know

  • Advisor output does not stream. Expect a pause during sub-inference.
  • No built-in conversation-level cap on advisor calls. Track and cap them client-side.
  • max_tokens applies to executor output only. It does not bound advisor tokens.
  • Priority Tier on the executor does not extend to the advisor; you need it on both models.
  • The feature is in beta. Include anthropic-beta: advisor-tool-2026-03-01 in every request.

Frequently asked questions

What is the Claude Advisor Tool?
The Advisor Tool is a beta feature in the Claude API that lets a faster executor model (Sonnet or Haiku) consult a higher-intelligence advisor model (Opus) mid-generation. The advisor reads the full conversation, produces a plan or correction in 400 to 700 tokens, and the executor continues with the task. It runs inside a single /v1/messages request with no extra round trips.
How much does the Claude Advisor Tool cost?
Advisor calls run as a separate sub-inference billed at the advisor model rates. The executor tokens are billed at the executor rate. Because the advisor produces 400 to 700 tokens of guidance instead of the full output, most token generation happens at the cheaper executor rate. Pairing Sonnet as executor with Opus as advisor delivers near-Opus quality at similar or lower total cost than running Opus alone.
Which models work with the Advisor Tool?
The advisor must be at least as capable as the executor. Valid pairs: Haiku 4.5 with Opus 4.6, Sonnet 4.6 with Opus 4.6, and Opus 4.6 with Opus 4.6. Invalid pairs return a 400 error.
Does the Advisor Tool support streaming?
The executor stream pauses while the advisor runs its sub-inference. When the advisor finishes, the full advisor_tool_result arrives in a single content_block_start event, and executor output resumes streaming. SSE ping keepalives are sent during the pause.
When should I not use the Advisor Tool?
The advisor adds minimal value for single-turn Q&A where there is nothing to plan, pure model-picker UIs where users choose their own cost and quality tradeoff, or workloads where every turn requires the full capability of the advisor model. It shines on long-horizon agentic workloads: coding agents, multi-step research, and CI pipelines.
