Beyond the Fixed Toolkit: DeepAgent’s Dynamic Discovery

Imagine a truly intelligent assistant: not one that simply executes predefined commands, but one that can genuinely think, discover the right tools for an unfamiliar job, and then skillfully use them, all while remembering critical details from a sprawling, complex task. For years, such an agent has danced just out of reach, hobbled by frameworks that require us to anticipate every single tool it might need, or that simply forget what they were doing midway through a long project.
Most AI agent frameworks, for all their advancements, still operate on a somewhat rigid “Reason, Act, Observe” loop. This approach works well for straightforward tasks where the agent’s toolkit is neatly handed to it. But what happens when the required tools are vast, unknown, or change dynamically? What if the task stretches over hours, days, or even weeks, demanding shifts in strategy and an ever-expanding mental notebook? This is precisely where most systems hit a wall, turning an “intelligent” agent into a digital tool-follower.
Enter DeepAgent. Researchers from Renmin University of China and Xiaohongshu have unveiled an end-to-end deep reasoning agent that’s designed to overcome these fundamental limitations. DeepAgent isn’t just another incremental update; it represents a significant leap forward by unifying autonomous thinking, on-demand tool discovery, and action execution within a single, coherent reasoning process. It’s an AI that doesn’t just perform; it reasons, adapts, and learns in a way that feels genuinely more human.
One of the most striking aspects of DeepAgent is its ability to break free from the shackles of predefined tool lists. Traditional agents are often limited to the tools “injected” into their initial prompt, a bit like handing a mechanic a single wrench and expecting them to fix any car problem. This works for simple tasks, but it quickly crumbles when facing the vast, ever-changing landscape of real-world problems.
DeepAgent elegantly sidesteps this constraint by integrating tool search and discovery directly into its reasoning process. The model can output four distinct action types: internal thought, tool search, tool call, and memory fold. When it decides a tool is needed, but not immediately available, it doesn’t wait for human intervention. Instead, it emits a “tool search” token and queries a dense index containing descriptions from massive registries – we’re talking over 16,000 RapidAPI tools and nearly 4,000 ToolHop tools. Imagine an apprentice who doesn’t just know the tools you hand them, but can intuitively find *any* tool in a massive, dynamically changing workshop.
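To make this concrete, here is a minimal sketch of how such a unified loop might be wired up. The tag format, the action names, and the `registry` and `memory` objects are illustrative assumptions for exposition, not DeepAgent's actual output protocol or interface.

```python
# Illustrative sketch only: the tag format and handler objects are assumptions,
# not DeepAgent's actual output protocol.
import re

ACTION_PATTERN = re.compile(
    r"<(thought|tool_search|tool_call|memory_fold)>(.*?)</\1>", re.DOTALL
)

def parse_action(model_output: str):
    """Extract the next action type and its payload from the model's output."""
    match = ACTION_PATTERN.search(model_output)
    if match is None:
        return "thought", model_output  # plain text is treated as internal thought
    return match.group(1), match.group(2).strip()

def step(model_output: str, registry, memory):
    """Route a single model output to the matching handler."""
    action, payload = parse_action(model_output)
    if action == "tool_search":
        return registry.search(payload, top_k=5)   # retrieve candidate tools on demand
    if action == "tool_call":
        return registry.call(payload)              # execute the selected tool
    if action == "memory_fold":
        return memory.fold(payload)                # compress the interaction history
    return None                                    # internal thought: no environment step
```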
This on-demand retrieval means DeepAgent isn’t constrained by what was front-loaded in its prompt. It receives only the top-ranked, most relevant tools back in its context, making tool access dynamic and aligned with the fluid nature of real-world environments. Tools can change, new APIs can emerge, and DeepAgent can adapt without a complete system overhaul. It’s a fundamental shift from static, pre-configured intelligence to adaptable, context-aware problem-solving.
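As a rough illustration of what the `registry.search` step above could look like, the sketch below embeds tool descriptions with an off-the-shelf sentence encoder and returns the top-k matches by cosine similarity. The encoder choice and index layout are assumptions; the paper's actual retriever may differ.

```python
# Sketch of on-demand tool retrieval over a dense index of tool descriptions.
import numpy as np
from sentence_transformers import SentenceTransformer

class DenseToolIndex:
    def __init__(self, tool_descriptions: list[str]):
        self.encoder = SentenceTransformer("all-MiniLM-L6-v2")
        self.descriptions = tool_descriptions
        # Embed every registry entry once (e.g. 16,000+ RapidAPI descriptions).
        self.matrix = self.encoder.encode(tool_descriptions, normalize_embeddings=True)

    def search(self, query: str, top_k: int = 5) -> list[str]:
        """Return the top-k tool descriptions most similar to the agent's query."""
        q = self.encoder.encode([query], normalize_embeddings=True)[0]
        scores = self.matrix @ q                 # cosine similarity on unit vectors
        best = np.argsort(-scores)[:top_k]
        return [self.descriptions[i] for i in best]
```

Only the retrieved descriptions are appended to the agent's context, so the prompt never has to carry the full registry.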
The Art of Remembering and Learning: Memory Folding and ToolPO
Any seasoned technologist knows that one of the biggest headaches for AI agents tackling long, complex tasks is context overflow. A long sequence of tool calls, web results, and code responses will inevitably exceed the finite context window of even the largest language models. It’s like trying to remember every single detail of a year-long project without ever summarizing your notes – eventually, your brain (or the LLM’s context) just gets overwhelmed.
Autonomous Memory Folding: Taming Context Overload
DeepAgent tackles this head-on with an ingenious mechanism called autonomous memory folding. When the model determines its context is getting unwieldy, it emits a “fold” token. An auxiliary LLM then kicks in, compressing the full interaction history into three critical memory types:
- Episodic Memory: Records major task events and milestones.
- Working Memory: Keeps track of the current sub-goal and any recent issues or observations.
- Tool Memory: Stores details of tools used, their arguments, and their outcomes.
These structured memories are then fed back into the agent as compact, information-rich text. This allows DeepAgent to continue its reasoning from a concise yet comprehensive state, preventing context overflow and ensuring stability during incredibly long-horizon tasks. It’s like an expert craftsman who not only remembers every step of a complex project but also continuously refines their understanding and summaries of key facts, always staying focused without losing sight of the bigger picture.
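Here is a minimal sketch of what a fold step might look like, assuming a generic `summarize` callable stands in for the auxiliary LLM; the prompts and schema below simply paraphrase the three memory types described above.

```python
# Sketch of autonomous memory folding; `summarize` is a placeholder for the auxiliary LLM.
from dataclasses import dataclass

@dataclass
class FoldedMemory:
    episodic: str   # major task events and milestones
    working: str    # current sub-goal, recent issues and observations
    tool: str       # tools used, their arguments, and their outcomes

def fold_memory(history: list[str], summarize) -> str:
    """Compress the full interaction history into a compact, structured state."""
    transcript = "\n".join(history)
    memory = FoldedMemory(
        episodic=summarize(f"List the major events and milestones so far:\n{transcript}"),
        working=summarize(f"State the current sub-goal and any open issues:\n{transcript}"),
        tool=summarize(f"Summarize the tools called, their arguments, and outcomes:\n{transcript}"),
    )
    # The folded text replaces the raw history in the agent's context.
    return (
        f"[Episodic memory]\n{memory.episodic}\n\n"
        f"[Working memory]\n{memory.working}\n\n"
        f"[Tool memory]\n{memory.tool}"
    )
```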
Tool Policy Optimization (ToolPO): Beyond Supervised Learning
But having the right tools and remembering the context isn’t enough; the agent needs to *learn* how to use them effectively and, crucially, *when* to use them. Supervised learning, while valuable, often falls short here: a correct tool call may account for only a handful of tokens within a long generated sequence, so a uniform next-token loss gives the model a weak, diluted signal for learning robust tool use.
To address this, DeepAgent introduces Tool Policy Optimization (ToolPO), a novel reinforcement learning (RL) approach designed specifically for tool use. ToolPO runs “rollouts” on LLM-simulated APIs, which provides a stable and inexpensive training environment. What’s brilliant is that it attributes reward directly to the *exact tool call tokens* – a concept known as tool call advantage attribution. This precision helps the agent understand the causal link between its decision to call a tool and the subsequent success or failure. By training with a clipped PPO-style objective, DeepAgent learns not only *how* to call tools correctly but also *when* to initiate a tool search and *when* to fold its memory, creating a truly end-to-end learning process for complex interactions.
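In spirit, the objective resembles a standard clipped PPO loss in which the advantage is zeroed out everywhere except on the tool-call tokens. The sketch below is a paraphrase under that assumption; the tensor shapes, masking scheme, and normalization are not the authors' exact implementation.

```python
# Sketch of a ToolPO-style update: clipped PPO with advantage credited only to tool-call tokens.
import torch

def toolpo_loss(logp_new, logp_old, advantages, tool_call_mask, clip_eps=0.2):
    """
    logp_new, logp_old : (batch, seq) log-probs of the sampled tokens under the new/old policy
    advantages         : (batch, seq) per-token advantage estimates
    tool_call_mask     : (batch, seq) 1.0 on tool-call tokens, 0.0 elsewhere
    """
    ratio = torch.exp(logp_new - logp_old)
    adv = advantages * tool_call_mask                  # attribute credit to the call tokens only
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    per_token = torch.min(unclipped, clipped)          # PPO's pessimistic surrogate
    denom = tool_call_mask.sum().clamp_min(1.0)
    return -(per_token.sum() / denom)
```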
Putting It to the Test: DeepAgent’s Impressive Performance
The proof, as they say, is in the pudding. The research team rigorously evaluated DeepAgent across a wide array of benchmarks, including 5 general tool-use datasets (ToolBench, API Bank, TMDB, Spotify, ToolHop) and 4 downstream environments (ALFWorld, WebShop, GAIA, HLE). The results speak volumes.
In the “labeled tool” setting, where every method is given the exact tools it needs, DeepAgent 32B RL (built on a QwQ 32B backbone) consistently outperformed competitors. For instance, it achieved 69.0 on ToolBench and 75.3 on API Bank, the strongest 32B-level result across all five datasets. While workflow baselines like ReAct or CodeAct may top an individual dataset, DeepAgent delivered uniformly strong scores across the board, evidence of its robustness across diverse challenges.
However, the real test of a dynamic agent lies in the “open-set retrieval” setting – the most realistic scenario, where the agent must first *find* the right tools before it can use them. Here, DeepAgent truly shone, reaching 64.0 on ToolBench and 40.6 on ToolHop, consistently holding a lead over even the strongest workflow baselines. This confirms that DeepAgent’s architecture and training are well suited to navigating and exploiting large, unknown toolsets. In the grand arena of AI capabilities, consistency and adaptability are often more telling than a single, dazzling feat.
On longer, noisier downstream tasks like ALFWorld, WebShop, GAIA, and HLE, DeepAgent continued to impress. With a 91.8% success rate on ALFWorld and strong showings on WebShop and GAIA, it generally achieved higher scores than workflow agents. This superior performance on complex, extended tasks further validates the synergy between its autonomous memory folding and its robust ToolPO training, which are critical for maintaining coherence and effectiveness over time.
The Path Forward: Truly Autonomous AI Agents
DeepAgent marks a genuinely practical and exciting step towards the next generation of AI agents. By unifying autonomous thinking, on-demand tool discovery across massive registries (like 16,000+ RapidAPIs and 3,900+ ToolHop tools), structured tool calling, and intelligent memory management all within a single reasoning loop, it liberates agents from the constraints of fixed prompts and limited toolsets. The engineering choice to use LLM-simulated APIs in ToolPO is particularly clever, solving the latency and instability issues that have plagued prior tool-using agents and making robust training feasible.
The consistent gains across diverse benchmarks, in both labeled and open-set scenarios, indicate that DeepAgent isn’t a one-off peak on a single leaderboard but a foundational improvement: it makes large tool spaces not just theoretically accessible but practically usable for LLM agents. End-to-end tool agents, equipped with structured memory and reinforcement learning, look increasingly like the default pattern for future AI systems that can truly reason, adapt, and act with autonomy.




