| name | fuzzy-tool-retrieval |
| description | Use this skill when the user wants examples where the request sounds casual, messy, or business-like rather than technical, such as “connect the dots and pick the right tools,” “make it figure out which app to use,” or “don’t tell it the API names.” Trigger it for requests about vague phrasing, hidden tool choice, distractor tools, or semantically similar tools that should not be selected. Example triggers: “give me tasks where the right tool is implied, not stated,” “test if it can pick the right app from a big toolbox,” “make the instructions fuzzy,” and “I want realistic requests, not API-style prompts.” |
Skill: fuzzy-tool-retrieval
1. Capability Definition & Real Case
- Professional Definition: Fuzzy tool retrieval is the capability to infer the correct worker or tool from an underspecified user request that does not name the API, the server, or the exact execution steps. The challenge is semantic disambiguation under noise: multiple tools may look plausible, but only a small subset actually matches the latent intent and downstream constraints of the task. For an orchestration agent, this is the front door of coordination, because poor worker routing contaminates every later phase.
- Dimension Hierarchy: Tool Invocation Fidelity->Invocation Specification Handling->fuzzy-tool-retrieval
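Semantic disambiguation under noise can be made concrete with a small example. The Python sketch below assumes a hypothetical five-tool catalog (park_search, weather_forecast, route_planner, email_send, file_read). It uses bag-of-words cosine similarity as a simple stand-in for the embedding or model-based retrieval a real agent would use. Every tool description is scored against a casual request, and only the top matches are kept, so distractors with no semantic overlap drop out.

```python
import math
import re
from collections import Counter

# Hypothetical catalog: tool names and descriptions are illustrative, not a real MCP server.
TOOLS = {
    "park_search": "find national parks and trails near a city for hiking and camping",
    "weather_forecast": "daily weather forecast with rain and storm probability",
    "route_planner": "driving directions and drive time between places",
    "email_send": "send an email message to a contact",
    "file_read": "read a local file from disk",
}

STOPWORDS = {"a", "an", "and", "the", "to", "of", "for", "with", "from", "on", "in"}


def tokens(text: str) -> Counter:
    """Lowercased bag of words minus stopwords; a crude stand-in for an embedding."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_tools(request: str, tools: dict, k: int = 3) -> list:
    """Return up to k (tool, score) pairs with nonzero similarity, best first."""
    query = tokens(request)
    scored = [(name, cosine(query, tokens(desc))) for name, desc in tools.items()]
    scored = [pair for pair in scored if pair[1] > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


request = "Plan a hiking and camping week near Denver and flag rain risk and drive time"
for name, score in rank_tools(request, TOOLS):
    print(f"{name}: {score:.2f}")
```

On this request, the sketch ranks park_search, route_planner, and weather_forecast in that order and drops the email and file distractors. Its limit is instructive. It matches "rain" literally but would miss a request phrased as "will it pour?", which is exactly the kind of paraphrase fuzzy retrieval must handle. Treat the word-overlap scorer as a placeholder for a semantic one.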
Real Case
Extract 1 to 3 concrete cases from the actual benchmark papers to demonstrate this capability. CRITICAL HANDLING FOR TEMPLATES: If the paper provides concrete examples, extract them directly. However, if the paper provides ONLY abstract templates, you MUST NOT output the template as-is. Instead, you MUST instantiate it into a highly specific, realistic, and logically coherent example consistent with the papers: invent a specific entity, a specific environment, a concrete trajectory, and a final answer that faithfully reflects the benchmark's exact intent and difficulty.
Do NOT state that they are extracted from papers or benchmarks, and absolutely do NOT mention or reference the source papers.
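The case layout used below can be captured in a small record type. The following Python sketch is a hypothetical helper, not part of the pipeline. The Case fields mirror the case headings, and provenance_leaks is a crude keyword check that assumes the words "benchmark" and "source paper" are the main giveaways of the kind of reference the rule above forbids. A human review is still needed because the regex will miss paraphrases.

```python
import re
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Case:
    """Hypothetical record mirroring the case headings used in this skill."""
    initial_environment: str
    real_question: str
    real_trajectory: str
    why_it_demonstrates: str
    real_answer: Optional[str] = None  # not every case has a single checkable answer


# Crude heuristic: flag wording that reveals where a case came from.
# Plain "papers" is deliberately not matched, since a question may legitimately be about papers.
PROVENANCE = re.compile(r"\b(benchmarks?|source papers?)\b", re.IGNORECASE)


def provenance_leaks(case: Case) -> List[str]:
    """Return the names of fields whose text mentions benchmarks or source papers."""
    return [
        f.name
        for f in fields(case)
        if getattr(case, f.name) and PROVENANCE.search(getattr(case, f.name))
    ]


draft = Case(
    initial_environment="A benchmark environment contains web-search and fetch tools.",
    real_question="Calculate how many papers would incorrectly claim significance.",
    real_trajectory="The agent routes to search and fetch, then computes the count.",
    why_it_demonstrates="The right tool family is implied, not named.",
    real_answer="41",
)
print(provenance_leaks(draft))
```

Here only initial_environment is flagged. The question's legitimate use of "papers" passes.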
[Case 1]
- Initial Environment: A live MCP environment exposes park data, weather data, maps, place details, alerts, visitor-center hours, campgrounds, and local search. The user does not mention any tool names and instead asks for a realistic travel-planning deliverable that implicitly spans several domains. Many additional unrelated tools are also available, so the agent must discover the correct subset before doing any useful work.
- Real Question: Plan a week-long hiking and camping loop that starts and ends in Denver, narrow the candidates by drive time, flag rain risk and active alerts, include visitor-center hours and campgrounds, and add nearby hotel fallbacks with concrete travel details.
- Real Trajectory: The agent interprets the request as a multi-domain routing problem. It retrieves park-discovery tools for candidate generation, then uses map and weather tools rather than unrelated travel or generic search tools, and only later uses place-detail tools for fallback lodging. It resolves the request into the correct tool families without being told the names of the park, map, or weather APIs.
- Why this demonstrates the capability: This is a canonical fuzzy-retrieval case because the user expresses goals in plain travel language rather than interface language. The coordinator must infer which categories of tools are relevant before it can even begin the workflow. The difficulty comes from semantic ambiguity, not from a hard formula or a long dependency chain.
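The category-first routing in Case 1 can be sketched as a cue lexicon that maps plain travel language to tool families before any individual tool is chosen. In this minimal Python example, the family names, the cue phrases, and the two distractor families (email, files) are all illustrative assumptions. Matching is plain substring search, so a real router would need a more robust approach.

```python
# Hypothetical cue lexicon mapping plain travel language to the tool family it implies.
# Families and cue phrases are illustrative assumptions, not a real tool registry.
CUES = {
    "parks": ["hiking", "camping", "campground", "visitor-center", "alert"],
    "maps": ["drive time", "loop", "starts and ends", "travel details"],
    "weather": ["rain", "forecast", "storm"],
    "places": ["hotel", "fallback", "lodging"],
    "email": ["email", "inbox"],   # distractor family
    "files": ["file", "folder"],   # distractor family
}


def route_families(request: str) -> dict:
    """Return each tool family whose cues appear in the request, with the matched cues."""
    text = request.lower()
    routed = {}
    for family, cues in CUES.items():
        hits = [cue for cue in cues if cue in text]  # plain substring match
        if hits:
            routed[family] = hits
    return routed


request = (
    "Plan a week-long hiking and camping loop that starts and ends in Denver, "
    "narrow the candidates by drive time, flag rain risk and active alerts, "
    "include visitor-center hours and campgrounds, and add nearby hotel "
    "fallbacks with concrete travel details."
)
for family, hits in route_families(request).items():
    print(family, hits)
```

On the Case 1 request, it selects the parks, maps, weather, and places families and leaves the email and files distractors untouched. This mirrors the family-level decision described in the trajectory.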
[Case 2]
- Initial Environment: The environment exposes web-search and fetch tools plus a broad set of unrelated productivity tools. The prompt asks a quantitative question that cannot be answered from a single search result, but it does not explicitly say "use search" or "use fetch." The agent must infer that the correct path is evidence retrieval followed by reasoning.
- Real Question: Assuming all research articles published by Nature in 2020 relied on statistical significance, with an average p-value of 0.04, calculate how many of those papers would incorrectly claim significance, rounding up to the next integer.
- Real Trajectory: The agent recognizes that it first needs a retrieval tool to establish how many research articles Nature published in 2020, and then a calculation step to produce the final number, rather than choosing unrelated file, email, or terminal tools. It routes to the search/fetch path because the wording implies external evidence gathering, even though no tool name is stated.
- Real Answer: 41
- Why this demonstrates the capability: This case demonstrates fuzzy tool retrieval because the orchestration challenge lies in mapping an ordinary-language question to the correct retrieval-and-computation tool family. The agent must not be fooled by superficially plausible alternatives such as generic note or file tools. The final arithmetic only matters after the coordinator has chosen the right retrieval route.
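Once the article count is known, the arithmetic in Case 2 is a single ceiling operation. The Python sketch below adopts the reading the question invites: an average p-value of 0.04 is treated as a 4% chance that each significance claim is a false positive. It does not supply the article count, since that is the job of the retrieval step. Instead, it checks which counts are consistent with the answer of 41. Fraction keeps the product exact, which avoids floating-point surprises at the rounding boundary.

```python
import math
from fractions import Fraction

# Assumption from the question's framing: p = 0.04 acts as a per-paper false-positive rate.
P_VALUE = Fraction(4, 100)


def false_claims(n_articles: int, p: Fraction = P_VALUE) -> int:
    """Expected number of incorrect significance claims, rounded up to the next integer."""
    return math.ceil(n_articles * p)


# The real count must come from retrieval; here we only see which counts yield 41.
consistent = [n for n in range(900, 1100) if false_claims(n) == 41]
print(min(consistent), max(consistent))
```

Any count from 1001 to 1025 rounds up to 41. The answer therefore depends on retrieving the actual count rather than guessing a round number such as 1000, which would give 40.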
Pipeline Execution Instructions
To synthesize data for this capability, you must strictly follow a 3-phase pipeline. Do not hallucinate steps. Read the corresponding reference file for each phase sequentially:
- Phase 1: Environment Exploration
  Read the exploration guidelines to discover raw knowledge seeds:
  references/EXPLORATION.md
- Phase 2: Trajectory Selection
  Once Phase 1 is complete, read the selection criteria to evaluate the trajectory:
  references/SELECTION.md
- Phase 3: Data Synthesis
  Once a trajectory passes Phase 2, read the synthesis instructions to generate the final data:
  references/SYNTHESIS.md
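The sequential gating in this pipeline can be sketched as a tiny runner. This Python example is hypothetical. The phase names and reference paths come from the instructions, but run_pipeline and the stub explore, select, and synthesize steps are illustrative placeholders for what the reference files actually specify. The key behavior is that a phase returning None, such as a trajectory that fails selection, stops the pipeline before synthesis.

```python
from typing import Callable, List, Optional, Tuple

# Phase names and paths follow the instructions; the runner itself is a hypothetical sketch.
Step = Callable[[object], Optional[object]]


def run_pipeline(phases: List[Tuple[str, str, Step]], seed: object) -> Tuple[List[str], Optional[object]]:
    """Run phases strictly in order; stop at the first phase that returns None."""
    state = seed
    completed: List[str] = []
    for name, _reference, step in phases:
        state = step(state)
        if state is None:  # e.g. a trajectory that fails selection
            return completed, None
        completed.append(name)
    return completed, state


# Stub steps standing in for the real guidelines in each reference file.
def explore(env):
    return {"env": env, "seeds": ["park data", "weather data", "maps"]}


def select(found):
    return found if len(found["seeds"]) >= 2 else None


def synthesize(chosen):
    return {"task": "fuzzy request spanning " + ", ".join(chosen["seeds"])}


PHASES = [
    ("Environment Exploration", "references/EXPLORATION.md", explore),
    ("Trajectory Selection", "references/SELECTION.md", select),
    ("Data Synthesis", "references/SYNTHESIS.md", synthesize),
]

completed, result = run_pipeline(PHASES, seed="live MCP environment")
print(completed)
print(result)
```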