
Every team building on top of LLMs eventually hits the same wall: the more your agent can do, the worse it gets at doing any one thing.
At Slite, we built Super (now Slite agent), the hub for accurate company answers, for humans and for agents.
Slite agent connects to your team's knowledge wherever it lives, and increasingly it does that through MCP. That's how we ran straight into the wall.
This is the story of how we went from around 10 built-in tools to an effectively unbounded set and why the answer wasn't "add more tools," but "stop loading them all upfront."
In late 2024, Super's toolset was small and hand-picked. A handful of data sources, plus a couple of agent capabilities:
| Tool | Type |
|---|---|
| Slite | Data source |
| Google Drive | Data source |
| Notion | Data source |
| Confluence | Data source |
| Jira | Data source |
| Linear | Data source |
| Search | Semantic search |
| Charts | Visualization |
At that size, the naive approach works fine: hand the model every tool definition upfront, on every request, and let it choose.
Two things changed. We added far more native integrations and then MCP let any team plug in their own tools. The number stopped being a number and became unbounded.
A good problem to have, except the naive approach falls apart at this scale.
We want Slite agent to be able to answer anything, because that's what makes it a daily habit. But every tool you add upfront makes the agent worse on three axes at once:
Concretely, it came down to two hard problems.
Tool definitions eat context before the agent even starts working. This isn't a rounding error:
Everyone shipping agents has hit this ceiling, which is why those caps exist.
We didn't want a cap. We wanted infinity.
This one is subtler and, for us, more expensive. Modern LLM APIs let you cache the prefix of a prompt (the tool definitions, the system prompt, the conversation so far) so repeated turns are cheap and fast.
But the cache is a chain: change anything early in it, and everything after is invalidated.
{
"tools": [
"searchTool",
"readContext",
"chartTool" // ← added mid-conversation
],
"system": "...", // ⚠ cache miss
"messages": [
{ "turn 1..." },
{ "turn 2..." },
{ "turn 3..." } // ⚠ all re-processed
]
}So "just load the tool when you need it" naively means: mutate the tool list mid-conversation → blow away the entire prompt cache → re-process every prior turn.
The thing that should make the agent cheaper makes it dramatically more expensive.
Our first move was to stop loading every source's full toolset upfront, and instead let the agent discover the right tool on demand.
The agent calls a lightweight disambiguation tool:
{
"tool": "disambiguateGithub",
"parameters": {
"dateRange": { "from": "2026-02-23", "to": "2026-03-02" }
}
}…and gets back a full, team-specific tool definition:
{
"name": "githubSearch",
"parameters": {
"properties": {
"owner": { "enum": ["sliteteam"] },
"assignees": { "enum": ["Antoine Duban", "Charley DAVID", "Florian", "..."] },
"labels": { "enum": ["Fix", "Priority-high", "Priority-urgent"] }
}
}
}The enum values are populated dynamically from this team's setup, so the model gets a precise, grounded tool instead of a generic one.
This helped. But it didn't scale, because disambiguation was per-source:
disambiguateSliteTools
disambiguateGoogleDriveTools
disambiguateNotionTools
disambiguateConfluenceTools
disambiguateGitHubTools
disambiguateLinearTools
disambiguateJiraTools
disambiguateIntercomTools
disambiguateSalesforceTools
disambiguateBigQueryTools
...One disambiguation tool per source, so the upfront list just kept growing again.
We'd moved the problem, not solved it.
Then it clicked. The disambiguation pattern (call a tool, get back another tool) was already a one-step special case of something we were doing elsewhere.
It looked exactly like the way the agent queries a database:
findTables() → inspectTable() → queryTable()Each step reveals the next. That's progressive loading. We just needed to generalize it from a one-step special case into a proper, multi-level structure.
We replaced the flat list of N disambiguation tools with a small number of themed entrypoints.
Each node in the tree loads its children on demand — just in time — and a single call can return everything the agent needs for a task in one round trip:
getDevAndProjectTools(
for: "last month",
kinds: git, github, linear
)That one call returns all the relevant tool definitions, scoped to the team's sources and timeframe, replacing what used to be many separate disambiguation tools.
The full tree looks like this:
progressiveSearch
├── readContext
├── expandDocument
└── getMoreResults
devAndProjectTools
├── github
│ ├── search ──→ getMore
│ └── aggregate ──→ chart ✦
├── linear
│ ├── search ──→ getMore
│ └── aggregate ──→ chart ✦
└── jira, gitlab, asana…
codeExplorer
├── listRepos
├── listDirectory
├── findFiles
├── readFile
└── grep
bigquery
├── findTables
├── inspectTables
└── queryTables ──→ chart ✦
crmTools
├── attio
│ ├── search ──→ getMore
│ └── aggregate ──→ chart ✦
└── hubspot, salesforce…The agent only ever sees the entrypoints upfront.
It walks deeper into a branch only when the task requires it and tools like chart (the ✦ nodes) are injected the moment they become relevant, not before.
Here's a real path through the tree with the agent answering "I need an MRR chart" against BigQuery.
Each step dynamically loads the next tool:
findTables - discover what tables exist in the dataset.inspectTable - check the schema of the most relevant table.queryTable - run the SQL, get the grouped data back.chartTool ← dynamically injected - a grouping query is detected, so chartTool is loaded for the first time and the result gets visualized.At no point was the full menu of dozens of tools sitting in context. The agent loaded five definitions, in order, exactly when each was useful.
We benchmarked the unified agent against our old v2 engine on 63 complex questions (an automated benchmark):
Cheaper, faster, and able to reach far more tools than before — at the same time.
The context budget and cache-invalidation problems both came from the same root cause: loading everything upfront.
Fix the root cause and both problems go away together.
A fair question, since Claude now exposes a tool-search feature (behind beta headers) that helps a model discover tools without loading every definition upfront.
It's genuinely useful and we build on top of it.
We wrap the deferred-tools mechanism to fully control dynamic tool loading, rather than handing the model everything upfront. It's one piece of the puzzle that helped us solve the cache-invalidation problem, but you need real orchestration built around it to get the rest of the way.
We're not claiming this is the final answer. It's where we are today, and the ground keeps moving.
Our next problem is already in view: managing context memory — what gets stored where, and the retrieval skills the agent needs to use it well.
The direction we're most excited about is going from many tools and MCPs to fewer, smarter sub-agents. One clean way to generalize this optimization across every provider: expose fewer tools, and let purpose-built agents manage their own internal complexity.
Take editing a Slite document over MCP. Today it's N separate operations the orchestrator has to sequence correctly:
append_block(...)
modify_block(...)
remove_block(...)
get_note(...)
update_note(...)Tomorrow, it's one agent that owns that complexity:
edit_document("Update the Q1 section with the latest metrics")The orchestrator stops juggling block-level primitives and starts delegating intent. Fewer tools at the top, more capability underneath.
Building agents and hit the same wall? I'd love to compare notes!

Escrito por Christophe Pasquier
Chris founded Slite in 2017 and has spent the decade since thinking about how teams actually keep track of what they know. He writes about where the category is going next — agentic knowledge management, context graphs, and the parts of knowledge work AI is quietly rewriting. He's been wrong about the future before. Mostly he's been early. Find him @Christophepas on Twitter!