create-eval

Name: Create Eval
Author: dlt-hub


Stars: 41

Forks: 3

Updated: 22 May 2026 at 20:33

SKILL.md


name	create-eval
description	Create trigger evaluation setup for a toolkit skill. Use when the user wants to test whether a skill's description triggers correctly, set up eval workspaces, or generate trigger test queries for a skill. Use when user says 'create eval', 'test triggers', 'eval skill', or wants to measure skill triggering accuracy.
argument-hint	[toolkit] [skill]

Create trigger eval for a skill

Scaffold a trigger eval for $ARGUMENTS (format: toolkit skill or toolkit/skill).

Step 1: Locate the skill

Parse $ARGUMENTS into toolkit and skill name. Find the skill at workbench/<toolkit>/skills/<skill>/SKILL.md. Read the skill's frontmatter (name, description) and body to understand what it does and when it should trigger.
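A minimal Python sketch of this step, assuming SKILL.md opens with a YAML frontmatter block delimited by --- lines whose top-level fields are single-line key: value pairs. The names parse_arguments, skill_path and split_frontmatter are hypothetical helpers for illustration, not part of the workbench tooling; nested or multi-line fields would need a real YAML parser.

```python
from pathlib import Path


def parse_arguments(arguments: str) -> tuple[str, str]:
    """Split "toolkit skill" or "toolkit/skill" into (toolkit, skill)."""
    parts = arguments.replace("/", " ").split()
    if len(parts) != 2:
        raise ValueError(f"expected 'toolkit skill' or 'toolkit/skill', got {arguments!r}")
    return parts[0], parts[1]


def skill_path(toolkit: str, skill: str, root: Path = Path(".")) -> Path:
    """Location of the skill file: workbench/<toolkit>/skills/<skill>/SKILL.md."""
    return root / "workbench" / toolkit / "skills" / skill / "SKILL.md"


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Return (frontmatter fields, body).

    Hypothetical helper: only handles top-level single-line 'key: value'
    pairs between two '---' lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: the whole file is body
    fields: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return fields, "\n".join(lines[i + 1:])
        if line[:1] in (" ", "\t"):
            continue  # indented continuation lines are ignored in this sketch
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        fields[key.strip()] = value
    return fields, ""  # unterminated frontmatter: no body
```

Chained together: fields, body = split_frontmatter(skill_path(*parse_arguments(arguments)).read_text()), after which fields["name"] and fields["description"] are what this step reads.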

Step 2: Create eval directory

Create evals/<toolkit>/<skill>/ if it doesn't exist.
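In Python this is one idempotent call. ensure_eval_dir is a hypothetical helper name, and resolving paths relative to the repository root is an assumption.

```python
from pathlib import Path


def ensure_eval_dir(toolkit: str, skill: str, root: Path = Path(".")) -> Path:
    """Create evals/<toolkit>/<skill>/ (and any missing parents) if it does not exist."""
    eval_dir = root / "evals" / toolkit / skill
    # exist_ok=True makes re-running the skill harmless when the directory is already there.
    eval_dir.mkdir(parents=True, exist_ok=True)
    return eval_dir
```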

Step 3: Determine eval workspaces

Ask the user which workspace configurations to test. Each workspace represents a different set of installed toolkits — this tests how the skill behaves when competing with other skills.

Common patterns:

init-only — just dlthub ai init (minimum skills: setup-secrets, toolkit-dispatch). Tests cold-start triggering.
with-<toolkit> — init + the skill's own toolkit installed. Tests triggering with competing sibling skills.

Write config.json in the eval directory:

{
  ".eval-workspaces": {
    "init-only": {"toolkits": []},
    "with-rest-api": {"toolkits": ["rest-api-pipeline"]}
  }
}

Ask the user if they want additional workspace configurations. Each entry adds a workspace with different toolkit combinations.
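A sketch of writing config.json from the chosen workspaces, assuming the layout of the example above: a .eval-workspaces object mapping each workspace name to a toolkits list. build_config and write_config are hypothetical names; the type check only guards against the easy mistake of passing a bare string where a list of toolkits is expected.

```python
import json
from pathlib import Path


def build_config(workspaces: dict[str, list[str]]) -> dict:
    """Map workspace name -> toolkit list into the config.json shape."""
    for name, toolkits in workspaces.items():
        if not isinstance(toolkits, list) or not all(isinstance(t, str) for t in toolkits):
            raise TypeError(f"workspace {name!r}: toolkits must be a list of strings")
    return {".eval-workspaces": {name: {"toolkits": list(tks)} for name, tks in workspaces.items()}}


def write_config(eval_dir: Path, workspaces: dict[str, list[str]]) -> Path:
    """Write <eval_dir>/config.json and return its path."""
    path = eval_dir / "config.json"
    path.write_text(json.dumps(build_config(workspaces), indent=2) + "\n")
    return path
```

Each extra workspace the user asks for is one more name-to-toolkits entry in the dictionary passed to write_config.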

Step 4: Generate trigger eval queries

Read the skill's SKILL.md description carefully. Then read all competing skill descriptions from the selected toolkits:

uv run python tools/list_skill_descriptions.py workbench/<toolkit1> workbench/<toolkit2> ...

Use the competing descriptions to understand clash surfaces — which skills have overlapping vocabulary or intent.
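One rough way to shortlist clash candidates before writing queries is to rank competing skills by keyword overlap with the target description. This Python sketch uses Jaccard similarity over lowercase word tokens with a small hand-picked stopword list. It is an illustrative heuristic, not something the workbench tools compute, and it only catches shared vocabulary, not shared intent. Descriptions are passed as a name-to-text dictionary because the output format of list_skill_descriptions.py is not specified here.

```python
import re

# Illustrative stopword list; tune it for the descriptions at hand.
STOPWORDS = {"the", "and", "for", "use", "when", "user", "users", "wants", "with",
             "this", "that", "not", "any", "after", "from", "what", "how", "like", "but"}


def keywords(text: str) -> set[str]:
    """Lowercase word tokens (hyphen/underscore compounds kept whole), minus stopwords."""
    tokens = re.findall(r"[a-z0-9]+(?:[-_][a-z0-9]+)*", text.lower())
    return {t for t in tokens if len(t) > 2 and t not in STOPWORDS}


def clash_report(target: str, competitors: dict[str, str]) -> list[tuple[str, float, set[str]]]:
    """Rank competing skills by Jaccard overlap between their keywords and the target's."""
    target_kw = keywords(target)
    rows = []
    for name, description in competitors.items():
        other_kw = keywords(description)
        union = target_kw | other_kw
        shared = target_kw & other_kw
        score = len(shared) / len(union) if union else 0.0
        rows.append((name, score, shared))
    # Highest overlap first: the likeliest sources of near-miss queries.
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

The shared-keyword set in each row is often more useful than the score, since those words are natural material for near-miss negatives.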

Generate 20 eval queries — a mix of should-trigger (10) and should-not-trigger (10).

Query quality rules

Queries must be realistic — what a real user would actually type. Include personal context, specific details, file paths, API names, error messages, casual phrasing. Mix formal and informal, long and short.

Bad: "Format this data", "Build a pipeline", "Deploy something"

Good: "ok so my boss just sent me this xlsx file (its in my downloads, called something like 'Q4 sales final FINAL v2.xlsx') and she wants me to add a column that shows the profit margin as a percentage. The revenue is in column C and costs are in column D i think"

Should-trigger queries (10)

Think about coverage — different phrasings of the same intent:

Some formal, some casual
Cases where the user doesn't name the skill explicitly but clearly needs it
Uncommon use cases at the edges of the skill's scope
Cases where this skill competes with another but should win

Should-not-trigger queries (10)

The most valuable negatives are near-misses — queries that share keywords or concepts with the skill but actually need something different:

Adjacent domains or overlapping vocabulary
Ambiguous phrasing where a keyword match would trigger but shouldn't
Queries that touch on the skill's domain but in a context where another tool is better
Specific in-progress tasks that belong to sibling skills

Avoid obviously irrelevant negatives — "write a fibonacci function" as a negative for a pipeline skill doesn't test anything. The negatives should be genuinely tricky.
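A cheap sanity check for this rule, sketched in Python: flag any negative that shares no keywords with the skill's description, since such a query is unlikely to be a near-miss. The tokenizer, stopword list and min_shared threshold are illustrative assumptions. Passing the check does not make a negative tricky; it only catches the obviously irrelevant ones.

```python
import re

# Illustrative stopword list, not taken from the workbench.
STOPWORDS = {"the", "and", "for", "use", "when", "user", "wants", "with", "this",
             "that", "not", "any", "after", "from", "what", "how", "but", "want"}


def _keywords(text: str) -> set[str]:
    tokens = re.findall(r"[a-z0-9]+(?:[-_][a-z0-9]+)*", text.lower())
    return {t for t in tokens if len(t) > 2 and t not in STOPWORDS}


def irrelevant_negatives(description: str, negatives: list[str], min_shared: int = 1) -> list[str]:
    """Return should-not-trigger queries sharing fewer than min_shared keywords with the description."""
    description_kw = _keywords(description)
    return [q for q in negatives if len(_keywords(q) & description_kw) < min_shared]
```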

Disabled queries

If, during analysis, a query turns out to be an undertrigger (Claude handles it directly without invoking any skill), mark it as disabled instead of removing it:

{"query": "...", "should_trigger": true, "disabled": true, "reason": "undertrigger — Claude uses MCP directly"}
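A Python sketch of disabling in place, assuming the queries are held as a list of dictionaries with the fields shown above; disable_query is a hypothetical helper. Keeping the entry rather than deleting it preserves the record of why the query stopped counting.

```python
def disable_query(entries: list[dict], query: str, reason: str) -> dict:
    """Mark the entry whose text equals `query` as disabled, keeping it in the list."""
    if not reason.strip():
        raise ValueError("a disabled query needs a reason")
    for entry in entries:
        if entry["query"] == query:
            entry["disabled"] = True
            entry["reason"] = reason
            return entry
    raise KeyError(query)  # no entry with that exact query text
```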

Write the queries to trigger-eval.json in the eval directory.
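Before writing the file it can help to check the 10/10 split and the required fields. This Python sketch assumes trigger-eval.json is a top-level JSON array of query objects like the one above, stored in the eval directory; both points, and the helper names, are assumptions. Disabled entries count toward the split because they stay in the file.

```python
import json
from pathlib import Path


def validate_queries(entries: list[dict], per_side: int = 10) -> None:
    """Raise ValueError unless entries are well-formed and split per_side/per_side."""
    seen: set[str] = set()
    for i, entry in enumerate(entries):
        query = entry.get("query")
        if not isinstance(query, str) or not query.strip():
            raise ValueError(f"entry {i}: missing query text")
        if query in seen:
            raise ValueError(f"entry {i}: duplicate query {query!r}")
        seen.add(query)
        if not isinstance(entry.get("should_trigger"), bool):
            raise ValueError(f"entry {i}: should_trigger must be true or false")
        if entry.get("disabled") and not entry.get("reason"):
            raise ValueError(f"entry {i}: disabled entries need a reason")
    positives = sum(1 for entry in entries if entry["should_trigger"])
    negatives = len(entries) - positives
    if (positives, negatives) != (per_side, per_side):
        raise ValueError(f"expected {per_side}/{per_side} split, got {positives}/{negatives}")


def write_trigger_eval(eval_dir: Path, entries: list[dict]) -> Path:
    """Validate, then write <eval_dir>/trigger-eval.json."""
    validate_queries(entries)
    path = eval_dir / "trigger-eval.json"
    # ensure_ascii=False keeps non-ASCII characters in queries and reasons readable.
    path.write_text(json.dumps(entries, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return path
```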

Step 5: Review with user

Present the generated queries grouped by should-trigger/should-not-trigger. Explain the reasoning for tricky cases. Let the user edit, add, or remove queries before finalizing.
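A plain-text grouping is usually enough for the review. A hypothetical Python formatter, assuming the same list-of-dictionaries shape; disabled entries are shown with their reason rather than hidden.

```python
def format_for_review(entries: list[dict]) -> str:
    """Render queries as two numbered groups: should trigger, then should not trigger."""
    sections = []
    for label, flag in (("Should trigger", True), ("Should not trigger", False)):
        lines = [f"{label}:"]
        matching = [entry for entry in entries if entry["should_trigger"] is flag]
        for n, entry in enumerate(matching, start=1):
            suffix = f"  [disabled: {entry['reason']}]" if entry.get("disabled") else ""
            lines.append(f"  {n}. {entry['query']}{suffix}")
        sections.append("\n".join(lines))
    return "\n\n".join(sections)
```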

Step 6: Build workspaces

Run:

uv run python tools/create_eval_workspace.py evals/<toolkit>/<skill>

This creates all workspaces defined in config.json.
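If this step is driven from Python rather than a shell, a thin wrapper around the same command might look like the sketch below. The command is the one given above; create_workspaces_command, run_create_workspaces and expected_workspaces are hypothetical helpers. The last one only lists the workspace names declared in config.json, since where the tool places the built workspaces is not described here.

```python
import json
import subprocess
from pathlib import Path


def create_workspaces_command(eval_dir: Path) -> list[str]:
    """The Step 6 command as an argument list (no shell quoting needed)."""
    return ["uv", "run", "python", "tools/create_eval_workspace.py", eval_dir.as_posix()]


def run_create_workspaces(eval_dir: Path, repo_root: Path = Path(".")) -> None:
    # check=True raises CalledProcessError if the tool exits non-zero.
    subprocess.run(create_workspaces_command(eval_dir), cwd=repo_root, check=True)


def expected_workspaces(eval_dir: Path) -> list[str]:
    """Workspace names declared under ".eval-workspaces" in config.json."""
    config = json.loads((eval_dir / "config.json").read_text())
    return sorted(config.get(".eval-workspaces", {}))
```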

Step 7: Continue to run-eval

Ask the user whether to run the eval now. If yes, hand off to /run-eval <toolkit> <skill>. Do not duplicate the eval-running and analysis logic here.

Related skills (same repository)

validate-toolkits.md

from "dlt-hub/dlthub-ai-workbench"

Validate toolkit components and project docs — check external doc URLs, cross-references between skills/commands/rules, and verify README.md and CLAUDE.md are in sync with actual toolkit state. Use when the user asks to validate, review, or check toolkit quality.

2026-05-22 · 41 stars

toolkit-dispatch.md

from "dlt-hub/dlthub-ai-workbench"

Helps users figure out what they can build with dlt and which workflow to start. MUST use this skill when the user asks questions like 'what can you do', 'how do I build a pipeline', 'how do I make reports', 'how do I deploy', 'what are toolkits', 'what's available', 'I'm new to dlt', 'where do I start', or seems confused about what to do next after initial setup. Also use when the user asks broad capability questions about data engineering with dlt. Do NOT use when the user has a specific task in progress like debugging a pipeline, validating data, or adding endpoints. Do NOT use when the user explicitly wants a guided end-to-end demo — use **quick-start** for that.

2026-05-22 · 41 stars

quick-start.md

from "dlt-hub/dlthub-ai-workbench"

Use when the user wants a guided end-to-end run from data to dashboard in a few prompts: 'show me a demo', 'give me a quick start', 'take me through the full workflow', 'how do I go from data to dashboard', 'walk me through ingestion to visualization', 'I want to try everything end-to-end'. Do NOT use when the user is asking what's available or where to start in general — use the `toolkit-dispatch` skill (in init) for capability-discovery questions ('what can you do', 'what toolkits are there', 'I'm new to dlt'). Do NOT use when the user already has a specific task underway (debugging, adding an endpoint, deploying).

2026-05-22 · 41 stars

adjust-endpoint.md

from "dlt-hub/dlthub-ai-workbench"

Adjust a working dlt pipeline for production — remove dev limits, verify pagination, configure incremental loading, expand date ranges. Use when the user wants to remove .add_limit(), load more data, fix pagination, or set up incremental loading.

2026-05-22 · 41 stars

create-rest-api-pipeline.md

from "dlt-hub/dlthub-ai-workbench"

Create a dlt REST API pipeline. Use for the rest_api core source, or any generic REST/HTTP API source. Not for sql_database or filesystem sources.

2026-05-22 · 41 stars

debug-pipeline.md

from "dlt-hub/dlthub-ai-workbench"

Debug and inspect a dlt pipeline after running it. Use after a pipeline run (success or failure) to inspect traces, load packages, schema, data, and diagnose errors like missing credentials or failed jobs.

2026-05-22 · 41 stars

package.json

"author": "dlt-hub"

"repository": "dlt-hub/dlthub-ai-workbench"


Useful for (SOC): Software Quality Assurance Analysts and Testers · Computer and Mathematical Occupations · 15-1253 · L4
