Most teams assume tool selection is an LLM problem.
Better prompt. Better tool descriptions. Better model.
Emcy's biggest improvement happens one layer earlier.
Before the main LLM call, Emcy does semantic preselection. The model does not see the entire agent catalog. It sees a small, relevant subset of tools that already matches the user's intent.
That sounds like a small architectural choice. It is not. It changes the economics, reliability, and operability of the whole system.
```
NAIVE

user request
  -> full agent catalog
  -> LLM must search and reason at once
  -> more noise, more tokens, less control

EMCY

user request
  -> semantic preselection
  -> small relevant shortlist
  -> LLM reasons over the right options
```
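The preselection step above can be sketched as a small retrieval function, assuming precomputed description embeddings and a cosine-similarity shortlist (the function names, `k`, and threshold here are illustrative, not Emcy's actual API):

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def preselect(request_embedding, catalog, k=5, min_score=0.3):
    """Score every tool against the request and keep a small shortlist.

    `catalog` maps tool name -> precomputed description embedding.
    Only the shortlist, not the full catalog, reaches the main LLM call.
    """
    scored = [(cosine(request_embedding, emb), name) for name, emb in catalog.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:k] if score >= min_score]
```

The expensive part, embedding the catalog, happens offline when tools are registered; per-request cost is one query embedding plus a similarity scan.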
Why This Matters
The LLM is best at reasoning between plausible options. It is not the best place to do first-pass retrieval over an ever-growing catalog.
When you force the model to scan every tool in the agent's catalog, you combine two different jobs in one expensive step:
- retrieval
- reasoning
That can look fine in an early demo. It gets worse as the agent grows.
Semantic preselection separates those jobs. Retrieval happens first. Reasoning happens second. The result is a cleaner prompt, a tighter candidate set, and a model that spends its attention on the user's problem instead of on catalog cleanup.
Why The Naive Approach Is So Common
The naive version is easy to ship:
- Serialize every tool name and description.
- Put the full catalog into the prompt.
- Ask the model to pick.
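That naive version really is only a few lines, which is part of why it spreads (a sketch with an illustrative prompt format, not any particular framework's API):

```python
def naive_tool_prompt(user_request, catalog):
    """Serialize the full catalog into the prompt and ask the model to pick.

    Every tool appears in every request, relevant or not, so prompt size
    grows linearly with catalog size.
    """
    tool_lines = "\n".join(f"- {name}: {desc}" for name, desc in catalog.items())
    return (
        "Available tools:\n"
        f"{tool_lines}\n\n"
        f"User request: {user_request}\n"
        "Pick the single best tool."
    )
```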
For ten tools, this feels perfectly reasonable. For a real product, it becomes a trap.
Catalogs do not stay small. They accumulate:
- more endpoints
- more MCP servers
- more teams
- more environments
- more historical surface area
Eventually the model is not choosing between a few strong options. It is filtering noise inside the same context window it needs for the actual task.
Why It Is Better Than Prompting The Full Catalog
This is not just about saving prompt tokens. It changes the quality of the decision.
| What matters | Full-catalog prompt | Emcy preselection |
|---|---|---|
| Context budget | Grows with every tool added | Stays bounded around the shortlist |
| Model job | Search and reasoning mixed together | Reasoning over relevant candidates |
| Behavior as catalogs grow | Noisier and less predictable | More stable and focused |
| Operating model | Hidden inside a prompt | Explicit system layer |
| Scaling cost | Every new tool taxes every request | New tools matter only when relevant |
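To make the scaling-cost row concrete, here is a back-of-the-envelope comparison, assuming a flat thirty tokens per serialized tool description (illustrative numbers, not measurements):

```python
def full_catalog_tokens(num_tools, tokens_per_tool=30):
    # Naive: every tool is serialized into every request.
    return num_tools * tokens_per_tool

def shortlist_tokens(num_tools, k=5, tokens_per_tool=30):
    # Preselection: at most k tools reach the prompt, however big the catalog.
    return min(num_tools, k) * tokens_per_tool

# At 1,000 tools the naive prompt spends 30,000 tokens on the catalog
# alone, while the shortlist stays at 150.
```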
The important shift is simple:
The model should solve the user's problem, not search a warehouse of tools.
That is why Emcy does preselection before the main call. It improves:
- signal-to-noise ratio
- latency
- token efficiency
- consistency under growth
- confidence that the model is choosing from the right neighborhood
If you want the deeper architecture behind this, we covered that in our technical write-up on scaling tool selection. This post is about the higher-level point: where the improvement belongs.
Why This Is Also An Operations Advantage
Once tool selection becomes an explicit layer, it becomes observable.
That matters more than most teams realize.
If tool selection only exists as "some prompt text we prepend before the call," it is hard to inspect, hard to compare, and hard to improve. When something goes wrong, you do not know whether the issue was:
- poor retrieval
- bad tool descriptions
- too many candidates
- weak ranking
- prompt crowding
- model drift
With semantic preselection, Emcy can treat selection as a system with measurable signals.
How We Monitor And Tune Success
We look at selection as an ongoing feedback loop, not a one-time implementation detail.
| Signal | Why it matters |
|---|---|
| Shortlist size | Tells us whether we are sending too much or too little context |
| Selection latency | Keeps the preselection step fast enough to stay invisible to the user |
| Prompt-size reduction | Shows whether the layer is actually protecting the main model's context window |
| Selection-to-use alignment | Helps us see whether the tools retrieved are the ones the model actually uses |
| Fallback or no-match cases | Surfaces where the shortlist was too weak or too narrow |
| Miss patterns by agent or server | Reveals catalogs that need better descriptions, metadata, or ranking inputs |
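One of these signals, selection-to-use alignment, reduces to a simple ratio over logged requests (a sketch with hypothetical log field names, not Emcy's telemetry schema):

```python
def selection_alignment(logs):
    """Fraction of requests where the tool the model used was in the shortlist.

    Each log entry records the preselected shortlist and the tool the model
    ultimately called. A low ratio means retrieval and reasoning disagree,
    which points at ranking or tool-description problems.
    """
    if not logs:
        return 0.0
    hits = sum(1 for entry in logs if entry["tool_used"] in entry["shortlist"])
    return hits / len(logs)
```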
Those signals give us practical tuning levers:
- adjust candidate count
- refine similarity thresholds
- improve tool descriptions and metadata
- add better filtering by agent, server, or enabled state
- compare ranking behavior over time instead of guessing
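Those levers map naturally onto an explicit, versionable configuration rather than edits buried in a prompt (the field names are hypothetical, for illustration):

```python
from dataclasses import dataclass

@dataclass
class PreselectionConfig:
    # Candidate count: how many tools survive into the shortlist.
    max_candidates: int = 5
    # Similarity threshold: below this score a tool is never shortlisted.
    min_similarity: float = 0.3
    # Filtering: restrict retrieval to specific servers or agents.
    allowed_servers: tuple = ()
    # Enabled state: whether disabled tools can be retrieved at all.
    include_disabled: bool = False
```

Because the knobs live in a config object, a threshold change becomes a reviewable, comparable diff instead of an untracked prompt edit.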
Why This Is Better Than Rolling Your Own
Most dev teams begin with the naive prompt-all-tools version because it is the shortest path to a demo.
Then production happens.
Now they need to build the missing system around it:
- ranking telemetry
- evaluation datasets
- replay tooling
- dashboards
- tuning controls
- rollout safety for selection changes
In other words, they do not just need a better prompt. They need retrieval, observability, and tuning infrastructure.
Emcy ships that layer from day one.
So the value is not only that the model sees fewer, better tools. The value is that the selection system itself is already structured to be measured, improved, and operated as your catalog grows.
The Real Win
Semantic preselection is not just a performance trick.
It is a better boundary.
It puts retrieval before reasoning. It keeps the LLM focused on decision-making instead of catalog scanning. And it gives Emcy a concrete system we can monitor and tune, instead of a prompt hack that gets harder to manage every quarter.
That is why Emcy's tool-selection improvement happens before the LLM call.
If you want agents to stay reliable as your tool catalog grows, Emcy can help.