Tool selection in embedded agents

Learn how Emcy narrows MCP and browser-side tools before each turn, and how prompts and tool metadata shape the final choice.

Emcy does not ask the model to scan your full tool catalog on every turn.

Instead, embedded tool use is guided by a few layers that work together:

  1. the agent decides which MCP servers and enabled tools are even eligible
  2. Emcy narrows that set with semantic preselection
  3. the model reasons over the smaller shortlist plus any browser-side clientTools you exposed

That separation is the point.

It keeps prompt size bounded, makes tool choice more reliable as catalogs grow, and gives you real tuning levers instead of hoping a bigger prompt stays stable.

What actually guides selection#

Tool selection quality comes from the full configuration surface, not one magic prompt.

1. Agent boundaries#

The agent is the first filter.

If a server is not attached to the agent, or a tool is disabled, it cannot be selected.

This is the cleanest way to reduce noise.
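Conceptually, this first filter is simple set logic: only tools from servers attached to the agent, minus anything disabled, can reach the later stages at all. A minimal sketch of that filtering step (the types and names here are illustrative, not the SDK's actual API):

```typescript
// Illustrative types; not the actual Emcy SDK surface.
interface CatalogTool {
  server: string;   // MCP server the tool comes from
  name: string;
  enabled: boolean; // per-tool toggle on the agent
}

// Stage 1: agent boundaries. Anything removed here can never be selected.
function eligibleTools(
  catalog: CatalogTool[],
  attachedServers: Set<string>
): CatalogTool[] {
  return catalog.filter((t) => attachedServers.has(t.server) && t.enabled);
}

const catalog: CatalogTool[] = [
  { server: "todos", name: "createTodo", enabled: true },
  { server: "todos", name: "purgeAllTodos", enabled: false }, // disabled
  { server: "billing", name: "createInvoice", enabled: true }, // server not attached
];

// Only createTodo survives the first filter.
const shortlist = eligibleTools(catalog, new Set(["todos"]));
```

Everything downstream, including semantic preselection, only ever sees this reduced set.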

2. Tool metadata#

For MCP tools generated from OpenAPI, the best signals are:

  • operationId
  • summary and description
  • HTTP method
  • path segments
  • parameter names

Those fields give Emcy better semantic signal before the main model turn.

If two tools look nearly identical, selection gets worse no matter how good the model is.
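To make that concrete, compare two list endpoints with generic metadata against the same endpoints with distinct operationIds and summaries. The operations below are hypothetical and written as plain objects rather than a full OpenAPI document:

```typescript
// Weak metadata: the two tools are nearly indistinguishable, so neither
// preselection nor the model has much signal to work with.
const weak = [
  { operationId: "get1", method: "GET", path: "/todos", summary: "Get data" },
  { operationId: "get2", method: "GET", path: "/todos/archived", summary: "Get data" },
];

// Strong metadata: operationId, path, and summary each carry distinct signal.
const strong = [
  {
    operationId: "listOpenTodos",
    method: "GET",
    path: "/todos",
    summary: "List todos that are not yet completed",
  },
  {
    operationId: "listArchivedTodos",
    method: "GET",
    path: "/todos/archived",
    summary: "List todos that were archived after completion",
  },
];
```

A query like "show me my finished todos" can match the second set cleanly; against the first set it is close to a coin flip.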

3. Semantic preselection#

Before the main LLM call, Emcy narrows the agent catalog to a smaller relevant set.

That means the model is not mixing retrieval and reasoning in the same step.

Instead of scanning hundreds of tools, it sees a shortlist that already matches the user's intent.

This improves:

  • latency
  • prompt size
  • consistency
  • observability
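The mechanics can be sketched as retrieve-then-reason: score every eligible tool against the user message, keep only the top k, and hand that shortlist to the model. The scorer below uses naive token overlap purely as a stand-in; it does not attempt to reproduce Emcy's actual semantic scoring:

```typescript
interface ToolDoc {
  name: string;
  description: string;
}

// Stand-in scorer: count query tokens that appear in the tool's text.
// Real semantic preselection would use embeddings, not substring matching.
function score(query: string, tool: ToolDoc): number {
  const text = `${tool.name} ${tool.description}`.toLowerCase();
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter((token) => token.length > 2 && text.includes(token)).length;
}

// Keep the top-k tools; only this shortlist reaches the main model turn.
function preselect(query: string, tools: ToolDoc[], k: number): ToolDoc[] {
  return [...tools]
    .sort((a, b) => score(query, b) - score(query, a))
    .slice(0, k);
}

const tools: ToolDoc[] = [
  { name: "createTodo", description: "Create a new todo item" },
  { name: "createInvoice", description: "Create a draft invoice" },
  { name: "listArchivedTodos", description: "List archived todo items" },
];

const shortlist = preselect("add a todo for tomorrow", tools, 2);
```

The key property is that the main model call only pays for the shortlist, so prompt size stays flat even as the catalog grows.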

4. Agent prompt#

The agent System Prompt does not replace tool metadata, but it does guide behavior.

Use it for policy and sequencing rules such as:

  • when the agent should ask before destructive actions
  • which workflows require confirmation
  • which client tool should run after a server mutation
  • how the final answer should summarize what happened

Good prompt example:

TEXT
If you create, delete, or update a todo through MCP, call refreshTodoData before you answer so the host page reflects the latest state.

That kind of instruction helps the agent choose the right next step after it has already found the right tools.

5. Per-turn host context#

The Agent SDK context field is useful when the host app wants to add lightweight runtime guidance.

In the Todo sample, the host app passes:

TSX
<EmcyChat
  context={{
    hostRefreshInstruction:
      "After any server-side todo mutation, call the refreshTodoData client tool before you answer so the host page reflects the latest data.",
  }}
/>

Use context for dynamic hints tied to the current screen or workflow.

Use the agent prompt for steady rules that should apply to every conversation.

6. Browser-side clientTools#

clientTools are also part of the available action set.

They are not discovered from OpenAPI. You define them explicitly in the host app, so their names, descriptions, and parameter shapes matter just as much.

Examples:

  • refreshTodoData
  • fillQuickAddForm
  • navigateToInvoice
  • selectCustomerTab

If a tool name or description is vague, the model is less likely to use it correctly.
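As a sketch of that difference, assuming a plain object shape for tool definitions (the actual clientTools registration API may differ):

```typescript
// Vague: the model cannot tell when to call this or what it touches.
const vagueTool = {
  name: "doUpdate",
  description: "Updates things",
  parameters: {},
};

// Specific: name, description, and parameters all say when and how to use it.
const refreshTodoData = {
  name: "refreshTodoData",
  description:
    "Re-fetch the todo list in the host page. Call after any server-side " +
    "todo mutation so the UI shows the latest state.",
  parameters: { type: "object", properties: {} },
  handler: async () => {
    // In a real host app this would re-fetch and re-render the todo list.
    return { refreshed: true };
  },
};
```

The specific version also pairs naturally with the prompt rule shown earlier, since the instruction and the tool description reinforce each other.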

Why this is better than dumping the full catalog into the prompt#

The prompt-all-tools approach looks simple at first, but it scales badly.

As more endpoints and more MCP servers get added, every request pays for that growth even when most tools are irrelevant.

Emcy's approach keeps the model focused on the user's task instead of on first-pass search over a warehouse of tools.

That is why the improvement happens before the main reasoning step.

Practical tuning advice#

If you want better tool selection, start here:

  • keep each agent tight around one product surface or job to be done
  • write clear operationId values and endpoint summaries
  • disable tools that should not be callable yet
  • keep clientTools small and specific
  • use the agent prompt for ordering and safety rules
  • use context for runtime hints that depend on the current page

For the deeper architecture behind semantic preselection, see The Best Tool-Selection Improvement Happens Before the LLM and Token-Efficient Tool Selection: How We Scaled From 10 Tools to 300.