Engineering · 2026-03-12 · 6 min read

The Best Tool-Selection Improvement Happens Before the LLM

Emcy improves tool selection before the model ever reasons: semantic preselection narrows the agent to a small, relevant tool set instead of sending the full catalog into the prompt. That makes agents faster, cheaper, more reliable, and easier to tune.

Emcy Team
Most teams assume tool selection is an LLM problem.

Better prompt. Better tool descriptions. Better model.

Emcy's biggest improvement happens one layer earlier.

Before the main LLM call, Emcy does semantic preselection. The model does not see the entire agent catalog. It sees a small, relevant subset of tools that already matches the user's intent.

That sounds like a small architectural choice. It is not. It changes the economics, reliability, and operability of the whole system.

NAIVE
user request
  -> full agent catalog
  -> LLM must search and reason at once
  -> more noise, more tokens, less control

EMCY
user request
  -> semantic preselection
  -> small relevant shortlist
  -> LLM reasons over the right options

Why This Matters

The LLM is best at reasoning between plausible options. It is not the best place to do first-pass retrieval over an ever-growing catalog.

When you force the model to scan every tool in the agent, you combine two different jobs in one expensive step:

  • retrieval
  • reasoning

That can look fine in an early demo. It gets worse as the agent grows.

Semantic preselection separates those jobs. Retrieval happens first. Reasoning happens second. The result is a cleaner prompt, a tighter candidate set, and a model that spends its attention on the user's problem instead of on catalog cleanup.
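As a minimal sketch of that separation (the bag-of-words "embedding," the tool catalog, and the function names here are illustrative stand-ins, not Emcy's actual implementation), retrieval-before-reasoning can be as simple as ranking tool descriptions against the request and keeping only the top-k:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector standing in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def preselect(request: str, catalog: dict[str, str], k: int = 3) -> list[str]:
    """Retrieval first: rank every tool description against the request,
    then hand only the top-k shortlist to the reasoning step."""
    q = embed(request)
    ranked = sorted(catalog, key=lambda name: cosine(q, embed(catalog[name])),
                    reverse=True)
    return ranked[:k]

catalog = {
    "create_invoice": "create a new invoice for a customer account",
    "send_email": "send an email message to a recipient",
    "list_servers": "list all registered MCP servers",
    "refund_payment": "refund a payment on a customer invoice",
}
# Only the shortlist reaches the main LLM call, not the full catalog.
shortlist = preselect("refund the customer's last invoice", catalog, k=2)
```

The main prompt then contains two candidates instead of the whole warehouse, and the model's job collapses to choosing between plausible options.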

Why The Naive Approach Is So Common

The naive version is easy to ship:

  1. Serialize every tool name and description.
  2. Put the full catalog into the prompt.
  3. Ask the model to pick.
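The three steps above really are just string concatenation, which is why they get shipped. A sketch (the tool records are hypothetical) makes the hidden cost visible: the prompt grows with the catalog, whether or not a tool is relevant to the request:

```python
def build_naive_prompt(request: str, catalog: dict[str, str]) -> str:
    # Steps 1-2: serialize every tool and put the full catalog in the prompt.
    tool_lines = "\n".join(f"- {name}: {desc}" for name, desc in catalog.items())
    # Step 3: ask the model to pick.
    return (f"Available tools:\n{tool_lines}\n\n"
            f"User request: {request}\nPick the best tool.")

small = {f"tool_{i}": "does one specific thing" for i in range(10)}
large = {f"tool_{i}": "does one specific thing" for i in range(500)}

# Every new tool taxes every request: prompt size tracks catalog size,
# not how many tools actually matter for this request.
assert len(build_naive_prompt("refund an invoice", large)) > \
       30 * len(build_naive_prompt("refund an invoice", small))
```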

For ten tools, this feels perfectly reasonable. For a real product, it becomes a trap.

Catalogs do not stay small. They accumulate:

  • more endpoints
  • more MCP servers
  • more teams
  • more environments
  • more historical surface area

Eventually the model is not choosing between a few strong options. It is filtering noise inside the same context window it needs for the actual task.

Why It Is Better Than Prompting The Full Catalog

This is not just about saving prompt tokens. It changes the quality of the decision.

| What matters | Full-catalog prompt | Emcy preselection |
| --- | --- | --- |
| Context budget | Grows with every tool added | Stays bounded around the shortlist |
| Model job | Search and reasoning mixed together | Reasoning over relevant candidates |
| Behavior as catalogs grow | Noisier and less predictable | More stable and focused |
| Operating model | Hidden inside a prompt | Explicit system layer |
| Scaling cost | Every new tool taxes every request | New tools matter only when relevant |

The important shift is simple:

The model should solve the user's problem, not search a warehouse of tools.

That is why Emcy does preselection before the main call. It improves:

  • signal-to-noise ratio
  • latency
  • token efficiency
  • consistency under growth
  • confidence that the model is choosing from the right neighborhood

If you want the deeper architecture behind this, we covered that in our technical write-up on scaling tool selection. This post is about the higher-level point: where the improvement belongs.

Why This Is Also An Operations Advantage

Once tool selection becomes an explicit layer, it becomes observable.

That matters more than most teams realize.

If tool selection only exists as "some prompt text we prepend before the call," it is hard to inspect, hard to compare, and hard to improve. When something goes wrong, you do not know whether the issue was:

  • poor retrieval
  • bad tool descriptions
  • too many candidates
  • weak ranking
  • prompt crowding
  • model drift

With semantic preselection, Emcy can treat selection as a system with measurable signals.

How We Monitor And Tune Success

We look at selection as an ongoing feedback loop, not a one-time implementation detail.

| Signal | Why it matters |
| --- | --- |
| Shortlist size | Tells us whether we are sending too much or too little context |
| Selection latency | Keeps the preselection step fast enough to stay invisible to the user |
| Prompt-size reduction | Shows whether the layer is actually protecting the main model's context window |
| Selection-to-use alignment | Helps us see whether the tools retrieved are the ones the model actually uses |
| Fallback or no-match cases | Surfaces where the shortlist was too weak or too narrow |
| Miss patterns by agent or server | Reveals catalogs that need better descriptions, metadata, or ranking inputs |
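Selection-to-use alignment is the signal teams most often skip, so here is a sketch of how it can be computed from selection logs (this log shape is made up for illustration; it is not Emcy's actual schema):

```python
# Each record: which tools preselection shortlisted, and which tool the
# model actually called (None = no tool was used at all).
logs = [
    {"shortlist": ["refund_payment", "create_invoice"], "used": "refund_payment"},
    {"shortlist": ["send_email"], "used": "send_email"},
    {"shortlist": ["list_servers"], "used": None},           # fallback / no-match
    {"shortlist": ["create_invoice"], "used": "send_email"}, # miss: used tool
]                                                            # was not shortlisted

tool_calls = [r for r in logs if r["used"] is not None]
aligned = sum(r["used"] in r["shortlist"] for r in tool_calls)

# Share of tool calls where the model picked from the retrieved shortlist.
alignment_rate = aligned / len(tool_calls)
# Share of requests where no tool was used at all.
no_match_rate = sum(r["used"] is None for r in logs) / len(logs)
```

A falling alignment rate is an early warning that retrieval, not the model, is where the next fix belongs.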

Those signals give us practical tuning levers:

  • adjust candidate count
  • refine similarity thresholds
  • improve tool descriptions and metadata
  • add better filtering by agent, server, or enabled state
  • compare ranking behavior over time instead of guessing
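Those levers translate naturally into explicit configuration rather than prompt edits. A hypothetical sketch of what that surface might look like (all names and defaults here are illustrative, not Emcy's API):

```python
from dataclasses import dataclass, field

@dataclass
class PreselectionConfig:
    candidate_count: int = 5       # shortlist size sent to the model
    min_similarity: float = 0.25   # drop candidates scoring below this
    enabled_only: bool = True      # filter out disabled tools
    allowed_servers: set = field(default_factory=set)  # empty = all servers

def apply_filters(ranked, cfg):
    """ranked: list of (tool, server, enabled, score), best first."""
    out = []
    for tool, server, enabled, score in ranked:
        if cfg.enabled_only and not enabled:
            continue
        if cfg.allowed_servers and server not in cfg.allowed_servers:
            continue
        if score < cfg.min_similarity:
            continue
        out.append(tool)
    return out[:cfg.candidate_count]

ranked = [
    ("refund_payment", "billing", True, 0.81),
    ("create_invoice", "billing", False, 0.62),  # filtered: disabled
    ("send_email", "comms", True, 0.12),         # filtered: below threshold
]
shortlist = apply_filters(ranked, PreselectionConfig(candidate_count=2))
```

Because each lever is a field rather than a sentence in a prompt, changes can be diffed, A/B compared, and rolled back like any other config.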

Why This Is Better Than Rolling Your Own

Most dev teams begin with the naive prompt-all-tools version because it is the shortest path to a demo.

Then production happens.

Now they need to build the missing system around it:

  • ranking telemetry
  • evaluation datasets
  • replay tooling
  • dashboards
  • tuning controls
  • rollout safety for selection changes

In other words, they do not just need a better prompt. They need retrieval, observability, and tuning infrastructure.

Emcy ships that layer from day one.

So the value is not only that the model sees fewer, better tools. The value is that the selection system itself is already structured to be measured, improved, and operated as your catalog grows.

The Real Win

Semantic preselection is not just a performance trick.

It is a better boundary.

It puts retrieval before reasoning. It keeps the LLM focused on decision-making instead of catalog scanning. And it gives Emcy a concrete system we can monitor and tune, instead of a prompt hack that gets harder to manage every quarter.

That is why Emcy's tool-selection improvement happens before the LLM call.


If you want agents to stay reliable as your tool catalog grows, Emcy can help.

Tags
MCP
tool selection
semantic search
AI infrastructure
observability
agent reliability