Most teams assume tool selection is an LLM problem.
Better prompt. Better tool descriptions. Better model.
Emcy's biggest improvement happens one layer earlier.
Before the main LLM call, Emcy does semantic preselection. The model does not see the entire agent catalog. It sees a small, relevant subset of tools that already matches the user's intent.
That sounds like a small architectural choice. It is not. It changes the economics, reliability, and operability of the whole system.
```
NAIVE

user request
  -> full agent catalog
  -> LLM must search and reason at once
  -> more noise, more tokens, less control

EMCY

user request
  -> semantic preselection
  -> small relevant shortlist
  -> LLM reasons over the right options
```
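The preselection step above can be sketched as a small retrieval function, assuming precomputed description embeddings and a cosine-similarity shortlist (the function names, `k`, and threshold here are illustrative, not Emcy's actual API):

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def preselect(request_embedding, catalog, k=5, min_score=0.3):
    """Score every tool against the request and keep a small shortlist.

    `catalog` maps tool name -> precomputed description embedding.
    Only the shortlist, not the full catalog, reaches the main LLM call.
    """
    scored = [(cosine(request_embedding, emb), name) for name, emb in catalog.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:k] if score >= min_score]
```

The expensive part, embedding the catalog, happens offline when tools are registered; per-request cost is one query embedding plus a similarity scan.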
Why This Matters
The LLM is best at reasoning between plausible options. It is not the best place to do first-pass retrieval over an ever-growing catalog.
When you force the model to scan every tool in the agent's catalog, you combine two different jobs in one expensive step:
- retrieval
- reasoning
That can look fine in an early demo. It gets worse as the agent grows.
Semantic preselection separates those jobs. Retrieval happens first. Reasoning happens second. The result is a cleaner prompt, a tighter candidate set, and a model that spends its attention on the user's problem instead of on catalog cleanup.
Why The Naive Approach Is So Common
The naive version is easy to ship:
- Serialize every tool name and description.
- Put the full catalog into the prompt.
- Ask the model to pick.
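That naive version really is only a few lines, which is part of why it spreads (a sketch with an illustrative prompt format, not any particular framework's API):

```python
def naive_tool_prompt(user_request, catalog):
    """Serialize the full catalog into the prompt and ask the model to pick.

    Every tool appears in every request, relevant or not, so prompt size
    grows linearly with catalog size.
    """
    tool_lines = "\n".join(f"- {name}: {desc}" for name, desc in catalog.items())
    return (
        "Available tools:\n"
        f"{tool_lines}\n\n"
        f"User request: {user_request}\n"
        "Pick the single best tool."
    )
```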
For ten tools, this feels perfectly reasonable. For a real product, it becomes a trap.
Catalogs do not stay small. They accumulate:
- more endpoints
- more MCP servers
- more teams
- more environments
- more historical surface area
Eventually the model is not choosing between a few strong options. It is filtering noise inside the same context window it needs for the actual task.
Why It Is Better Than Prompting The Full Catalog
This is not just about saving prompt tokens. It changes the quality of the decision.
| What matters | Full-catalog prompt | Emcy preselection |
|---|---|---|
| Context budget | Grows with every tool added | Stays bounded around the shortlist |
| Model job | Search and reasoning mixed together | Reasoning over relevant candidates |
| Behavior as catalogs grow | Noisier and less predictable | More stable and focused |
| Operating model | Hidden inside a prompt | Explicit system layer |
| Scaling cost | Every new tool taxes every request | New tools matter only when relevant |
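To make the scaling-cost row concrete, here is a back-of-the-envelope comparison, assuming a flat thirty tokens per serialized tool description (illustrative numbers, not measurements):

```python
def full_catalog_tokens(num_tools, tokens_per_tool=30):
    # Naive: every tool is serialized into every request.
    return num_tools * tokens_per_tool

def shortlist_tokens(num_tools, k=5, tokens_per_tool=30):
    # Preselection: at most k tools reach the prompt, however big the catalog.
    return min(num_tools, k) * tokens_per_tool

# At 1,000 tools the naive prompt spends 30,000 tokens on the catalog
# alone, while the shortlist stays at 150.
```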
The important shift is simple:
The model should solve the user's problem, not search a warehouse of tools.
That is why Emcy does preselection before the main call. It improves:
- signal-to-noise ratio
- latency
- token efficiency
- consistency under growth
- confidence that the model is choosing from the right neighborhood
If you want the deeper architecture behind this, we covered that in our technical write-up on scaling tool selection. This post is about the higher-level point: where the improvement belongs.
Why This Is Also An Operations Advantage
Once tool selection becomes an explicit layer, it becomes observable.
That matters more than most teams realize.
If tool selection only exists as "some prompt text we prepend before the call," it is hard to inspect, hard to compare, and hard to improve. When something goes wrong, you do not know whether the issue was:
- poor retrieval
- bad tool descriptions
- too many candidates
- weak ranking
- prompt crowding
- model drift
With semantic preselection, Emcy can treat selection as a system with measurable signals.
How We Monitor And Tune Success
We look at selection as an ongoing feedback loop, not a one-time implementation detail.
| Signal | Why it matters |
|---|---|
| Shortlist size | Tells us whether we are sending too much or too little context |
| Selection latency | Keeps the preselection step fast enough to stay invisible to the user |
| Prompt-size reduction | Shows whether the layer is actually protecting the main model's context window |
| Selection-to-use alignment | Helps us see whether the tools retrieved are the ones the model actually uses |
| Fallback or no-match cases | Surfaces where the shortlist was too weak or too narrow |
| Miss patterns by agent or server | Reveals catalogs that need better descriptions, metadata, or ranking inputs |
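One of these signals, selection-to-use alignment, reduces to a simple ratio over logged requests (a sketch with hypothetical log field names, not Emcy's telemetry schema):

```python
def selection_alignment(logs):
    """Fraction of requests where the tool the model used was in the shortlist.

    Each log entry records the preselected shortlist and the tool the model
    ultimately called. A low ratio means retrieval and reasoning disagree,
    which points at ranking or tool-description problems.
    """
    if not logs:
        return 0.0
    hits = sum(1 for entry in logs if entry["tool_used"] in entry["shortlist"])
    return hits / len(logs)
```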
Those signals give us practical tuning levers:
- adjust candidate count
- refine similarity thresholds
- improve tool descriptions and metadata
- add better filtering by agent, server, or enabled state
- compare ranking behavior over time instead of guessing
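Those levers map naturally onto an explicit, versionable configuration rather than edits buried in a prompt (the field names are hypothetical, for illustration):

```python
from dataclasses import dataclass

@dataclass
class PreselectionConfig:
    # Candidate count: how many tools survive into the shortlist.
    max_candidates: int = 5
    # Similarity threshold: below this score a tool is never shortlisted.
    min_similarity: float = 0.3
    # Filtering: restrict retrieval to specific servers or agents.
    allowed_servers: tuple = ()
    # Enabled state: whether disabled tools can be retrieved at all.
    include_disabled: bool = False
```

Because the knobs live in a config object, a threshold change becomes a reviewable, comparable diff instead of an untracked prompt edit.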
Why This Is Better Than Rolling Your Own
Most dev teams begin with the naive prompt-all-tools version because it is the shortest path to a demo.
Then production happens.
Now they need to build the missing system around it:
- ranking telemetry
- evaluation datasets
- replay tooling
- dashboards
- tuning controls
- rollout safety for selection changes
In other words, they do not just need a better prompt. They need retrieval, observability, and tuning infrastructure.
Emcy ships that layer from day one.
So the value is not only that the model sees fewer, better tools. The value is that the selection system itself is already structured to be measured, improved, and operated as your catalog grows.
The Real Win
Semantic preselection is not just a performance trick.
It is a better boundary.
It puts retrieval before reasoning. It keeps the LLM focused on decision-making instead of catalog scanning. And it gives Emcy a concrete system we can monitor and tune, instead of a prompt hack that gets harder to manage every quarter.
That is why Emcy's tool-selection improvement happens before the LLM call.
If you want agents to stay reliable as your tool catalog grows, Emcy can help.