instruction-tuning
Use sub-agents as test subjects to iteratively improve .claude/ instruction files. Run agent → inspect → fix instructions → re-run until agents follow the protocol correctly without hints.
| name | instruction-tuning |
| description | Use sub-agents as test subjects to iteratively improve .claude/ instruction files. Run agent → inspect → fix instructions → re-run until agents follow the protocol correctly without hints. |
DO NOT write memory files. All learnings go into .claude/rules/, .claude/agents/, test-server code, or boardshop reference routes, NOT into memory.
Before doing ANYTHING, check if .claude/user-consent.md exists and contains ACCEPTED: true.
If the file exists and has ACCEPTED: true, display:
✅ Prior consent on file (DATE). Proceeding. Review .claude/user-consent.md to revoke.
Then skip to "Before Starting — Ask the User".
If the file does NOT exist or does not contain ACCEPTED: true, you MUST present these 3 warnings and get explicit "yes" to each before proceeding:
╔══════════════════════════════════════════════════════════════╗
║ 💣💣💣 WARNING 1: TERMS OF SERVICE 💣💣💣 ║
║ ║
║ This tool intercepts website APIs via browser automation. ║
║ This MAY VIOLATE the Terms of Service of target websites. ║
║ Unauthorized scraping can result in IP bans, legal action, ║
║ or account termination. ║
║ ║
║ YOU are responsible for ensuring you have legal authority ║
║ to intercept traffic on any website you target. ║
║ ║
║ Do you confirm you have legal authority to use this tool ║
║ on your intended targets? (yes/no) ║
╚══════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════╗
║ ⚠️⚠️⚠️ WARNING 2: AUTONOMOUS AGENTS ⚠️⚠️⚠️ ║
║ ║
║ This skill launches AUTONOMOUS coding agents that: ║
║ • Write and execute arbitrary code in isolated worktrees ║
║ • Connect to real websites via headless browsers ║
║ • Make HTTP requests to external services ║
║ • Modify files, start servers, and spawn processes ║
║ ║
║ Agent behavior is UNPREDICTABLE. They may take unexpected ║
║ actions, navigate to unintended pages, or trigger security ║
║ systems on target websites. ║
║ ║
║ Do you understand and accept the risks of running ║
║ autonomous coding agents? (yes/no) ║
╚══════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════╗
║ 🔥🔥🔥 WARNING 3: RESOURCE CONSUMPTION 🔥🔥🔥 ║
║ ║
║ This skill BURNS THROUGH API TOKENS. Each agent uses ║
║ 50K-170K tokens. 8 parallel agents = 400K-1.3M tokens. ║
║ ║
║ ZOMBIE PROCESSES: Sub-agents can become detached and run ║
║ indefinitely. Chrome/Chromium instances can be orphaned. ║
║ Node.js and Python processes may leak. ║
║ ║
║ Run `bash .claude/hooks/cleanup-agents.sh` to kill zombies. ║
║ Run `pkill -f chromium` if Chrome instances are orphaned. ║
║ ║
║ Do you understand the token cost and the risk of zombie ║
║ processes? (yes/no) ║
╚══════════════════════════════════════════════════════════════╝
If ALL THREE answers are "yes", write .claude/user-consent.md (gitignored):
# User Consent Record
ACCEPTED: true
DATE: [current date]
WARNINGS_ACKNOWLEDGED: tos, autonomous-agents, resource-consumption
If ANY answer is "no", STOP. Do not proceed. Tell the user what they declined.
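The consent gate above can be sketched in a few lines. This is a minimal illustration, not the skill's actual implementation: the function names `has_consent` and `record_consent` and the shape of the `answers` dictionary are assumptions; only the file path and the record fields come from the text above.

```python
from datetime import date
from pathlib import Path

CONSENT_PATH = Path(".claude/user-consent.md")
# Keys match the WARNINGS_ACKNOWLEDGED line of the consent record.
WARNINGS = ["tos", "autonomous-agents", "resource-consumption"]


def has_consent(path: Path = CONSENT_PATH) -> bool:
    """Hypothetical helper: True only if the file exists and has an 'ACCEPTED: true' line."""
    if not path.is_file():
        return False
    lines = path.read_text(encoding="utf-8").splitlines()
    return any(line.strip() == "ACCEPTED: true" for line in lines)


def record_consent(answers: dict[str, str], path: Path = CONSENT_PATH) -> bool:
    """Hypothetical helper: write the record only when every warning was answered 'yes'."""
    declined = [w for w in WARNINGS if answers.get(w, "").strip().lower() != "yes"]
    if declined:
        # Any "no" (or missing answer) stops the run and names what was declined.
        print("Stopping. Declined:", ", ".join(declined))
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        "# User Consent Record\n"
        "ACCEPTED: true\n"
        f"DATE: {date.today().isoformat()}\n"
        f"WARNINGS_ACKNOWLEDGED: {', '.join(WARNINGS)}\n",
        encoding="utf-8",
    )
    return True
```

A caller would check `has_consent()` first and only show the three warnings when it returns `False`.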
You are not writing code. You are writing instructions that make other agents write correct code.
.claude/ files are the only lever. You can't hint, coach, or correct the agent mid-run. The instructions must be clear enough that a fresh agent, reading them for the first time, makes the same choices you would make.
The sub-agent's code is throwaway. The instruction improvements are the product. Every iteration should close the gap between "what the agent does" and "what you would do."
Turn 1: Ask how many passes — 1 or 2? Default to 1.
Turn 2: Ask which websites to test. The user picks the sites. Assign ports starting at 3011. Max 8 per batch.
Turn 3: Ask what to do when agents finish. Pick one:
Do NOT launch agents until the user answers all questions.
1. Clean: bash .claude/hooks/cleanup-agents.sh
2. Verify commit: ensure worktrees will branch from the latest committed instructions
3. Launch sub-agents with worktree isolation (run_in_background: true)
4. LIVE MONITOR every 60 seconds (see Live Monitoring below)
5. Stop agents when you have enough data or they're stuck
6. Inspect: did they follow the pipeline? Full elimination table?
7. Diagnose: which instruction was too soft, missing, or contradictory?
8. Fix the instruction (generalized, not site-specific)
9. CONSISTENCY CHECK — before committing, search ALL .claude/ files for contradictions:
- Grep for the concept you changed (pagination, testing, session harvest, etc.)
- Verify every mention says the same thing across rules/, agents/, skills/
- Fix any file that contradicts or uses softer language than your change
10. PRUNE .claude/ — delete redundant lines, reduce noise, keep instructions tight. Shorter files are followed better. If a rule can be said in 1 line instead of 3, use 1 line.
11. PROCESS CLEANUP — kill all zombie processes before and after every iteration:
```bash
bash .claude/hooks/cleanup-agents.sh
```
12. FIX INFRASTRUCTURE — if agents waste calls on browser crashes, server restarts, package linking, or port conflicts, fix the underlying infrastructure, not just the instructions.
13. BUILD UTILITIES — if agents repeat the same multi-step operation, create a script or endpoint that does it in one call.
14. Add any new patterns to test-server + boardshop reference routes
15. Commit fixes (ensure worktrees will get them on next iteration)
16. Write handoff doc (.claude/tuning-handoff.md) — gitignored, not committed
17. Start fresh Claude Code session to clear stale context, repeat
When you change an instruction, the same concept appears in multiple files. A fix in discovery.md means nothing if discovery-agent.md or a SKILL.md still uses the old soft language.
Before committing any instruction change, search all .claude/ for the concept:
```bash
# Example: if you tightened pagination language
grep -rn "pagina\|totalCount\|hasMore\|complete" .claude/rules/ .claude/agents/ .claude/skills/
```
Every hit must be consistent with your change. Files to check:
- .claude/rules/discovery.md: the protocol (agents read this)
- .claude/agents/discovery-agent.md: agent instructions (agents inherit this)
- .claude/skills/instruction-tuning/SKILL.md: prompt template + scorecard (you control this)
- .claude/skills/api-discovery/SKILL.md: discovery skill entry point
- .claude/skills/api-discovery/reference/*.md: reference files agents may read
- .claude/skills/app/SKILL.md: app builder that launches discovery agents
- .claude/CLAUDE.md: top-level project instructions

A single soft "check for" in any file undoes a hard "MUST" in another.
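The consistency check could also be scripted so that soft wording stands out next to each hit. The sketch below is one possible approach under stated assumptions: `find_mentions` is a hypothetical helper, and the list of "soft" phrases is illustrative (only "check for" is named in the text above).

```python
import re
from pathlib import Path

# Illustrative list of soft phrasings; extend it to match your own instruction style.
SOFT_WORDS = re.compile(r"\b(check for|should|consider|try to|if possible)\b", re.IGNORECASE)


def find_mentions(root: Path, concept: str) -> list[tuple[str, int, str, bool]]:
    """Return (file, line_number, text, is_soft) for every line matching the concept regex."""
    pattern = re.compile(concept, re.IGNORECASE)
    hits = []
    for path in sorted(root.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, text in enumerate(lines, start=1):
            if pattern.search(text):
                is_soft = bool(SOFT_WORDS.search(text))
                hits.append((path.relative_to(root).as_posix(), number, text.strip(), is_soft))
    return hits


if __name__ == "__main__":
    # Same concept as the grep example, expressed as a Python regex.
    for file, number, text, is_soft in find_mentions(Path(".claude"), r"pagina|totalCount|hasMore"):
        flag = "SOFT" if is_soft else "ok"
        print(f"[{flag}] {file}:{number}: {text}")
```

Lines flagged `SOFT` are candidates for review; the flag is a heuristic and does not replace reading every hit.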
Do NOT wait for agents to complete. Monitor every 60 seconds by reading their output JSONL files:
```bash
FILE="/Users/...subagents/agent-XXXX.jsonl"
cat "$FILE" | python3 -c "
import sys, json
for line in sys.stdin:
    try:
        d = json.loads(line)
        # Only assistant turns carry the agent's text and tool calls.
        if d.get('type') == 'assistant':
            for c in d.get('message',{}).get('content',[]):
                if isinstance(c,dict):
                    if c.get('type')=='text' and len(c.get('text',''))>20:
                        print('TEXT:',c['text'][:400]); print('---')
                    elif c.get('type')=='tool_use':
                        n=c.get('name','');inp=c.get('input',{})
                        if n=='Bash': print(f'BASH: {inp.get(\"command\",\"\")[:140]}')
                        elif n in('Read','Write','Edit'): print(f'{n}: ...{inp.get(\"file_path\",\"\")[-55:]}')
                        else: print(f'{n}')
    except Exception: pass  # skip partial or malformed lines
" | tail -25
```
At each check, write a brief analysis:
| Agent | Calls | Phase | Notes |
|---|---|---|---|
| SH | ~35 | GATHER | Found XHR POST, navigating to event page |
| TM | ~40 | GATHER | Clicked "Show More", 0 new traffic — not testing ?page=2 |
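The "Calls" column can be filled from the same JSONL files. This is a rough sketch that assumes the record layout the monitoring script above reads (`type == 'assistant'`, then `message.content[]` items of type `tool_use`); `count_tool_calls` is a hypothetical helper name.

```python
import json
from collections import Counter
from pathlib import Path


def count_tool_calls(jsonl_path: Path) -> Counter:
    """Count tool_use entries by tool name in one agent transcript."""
    counts: Counter = Counter()
    with jsonl_path.open(encoding="utf-8") as handle:
        for line in handle:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # the agent may be mid-write on the last line
            if not isinstance(record, dict) or record.get("type") != "assistant":
                continue
            content = (record.get("message") or {}).get("content", [])
            if not isinstance(content, list):
                continue
            for item in content:
                if isinstance(item, dict) and item.get("type") == "tool_use":
                    counts[item.get("name", "unknown")] += 1
    return counts
```

`sum(count_tool_calls(path).values())` gives the total to compare against the tool-call budget.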
For each agent, ask: "What would I do differently right now?"
When to stop early: If an agent is burning calls on the same pattern that failed in prior iterations (e.g., re-clicking "Show More" with 0 new traffic, trying curl /browser/fetch), stop it. You already know the failure — fix the instruction instead of watching it fail again.
Guided agents: When you identify a specific failure, launch a targeted agent with explicit instructions about what to do differently. Compare the guided agent's path to the unguided one — the delta reveals what the instructions are missing.
```bash
bash .claude/hooks/cleanup-agents.sh
```
Kills all agent processes, removes all worktrees, cleans untracked domains, reverts contaminated shared files.
Subagents inherit the parent session's system-reminder context, which may contain OLD file contents from deleted/modified rules files. This causes agents to reference files that no longer exist.
Fix: start a fresh Claude Code session before each iteration batch. Write a handoff doc so the new session can pick up where you left off.
After committing all fixes, write .claude/tuning-handoff.md (gitignored) with:
The new session reads this file to pick up context. Subagents get clean system-reminders because the fresh session reads .claude/ files from disk.
Launch agents with worktree isolation. Each gets a unique port (3010 + N). Concurrency limit: 8 agents max. Beyond 8, browser instances compete for resources and connections drop. If you need more sites, run in batches of 8.
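Port assignment and batching can be planned before anything is launched. The sketch below assumes that N keeps counting across batches so every port stays unique for the whole run; the text only specifies 3010 + N and the limit of 8, so that continuation is an assumption, as is the helper name `plan_batches`.

```python
BASE_PORT = 3010      # first agent gets 3011 (3010 + 1)
MAX_PARALLEL = 8      # concurrency limit per batch


def plan_batches(sites: list[str]) -> list[list[tuple[str, int]]]:
    """Assign port 3010 + N to the N-th site, then split into batches of at most 8."""
    assigned = [(site, BASE_PORT + n) for n, site in enumerate(sites, start=1)]
    return [assigned[i:i + MAX_PARALLEL] for i in range(0, len(assigned), MAX_PARALLEL)]


if __name__ == "__main__":
    # Placeholder site labels; the user picks the real targets.
    for index, batch in enumerate(plan_batches([f"site-{i}" for i in range(1, 11)]), start=1):
        print(f"batch {index}: {batch}")
```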
Agent prompt template:
Discover ALL transport types that [site] uses. Build a route for EVERY transport found.
Target: [url]
Follow .claude/rules/discovery.md — GATHER→SCAN→CLASSIFY→BUILD.
In GATHER: connect to HOMEPAGE first and browse naturally (scroll, click) to warm up cookies before navigating to target pages. Intercept pagination traffic. If you see an API endpoint with pagination params (e.g., ?page=1) in traffic, test it directly via page.evaluate("fetch('/api/path?page=2', {credentials:'include'}).then(r=>r.json())..."). For cross-origin APIs, credentials:'include' forwards browser cookies. Use page.evaluate for interaction and fetch() testing — do NOT read __NEXT_DATA__ or DOM data during GATHER. browserFetch is NOT an HTTP endpoint — use page.evaluate("fetch(...)") during discovery.
In CLASSIFY: name the site's core data and verify your transports cover it.
In BUILD: auth-gated endpoints (Gap=Y) go directly to session harvest. Read the session harvest reference file BEFORE writing any harvest code.
In BUILD: after each route, fill the mandatory completeness check. If totalCount > items returned, the route is NOT DONE — paginate before moving to the next route.
Fill ALL 8 elimination rows before writing code.
After building routes, register your domain and test EVERY route through the API server proxy.
Before finishing: run `pnpm biome check --write --unsafe .` and fix any remaining lint or type errors. CI must be clean — you are responsible for leaving the worktree in a buildable state.
Budget: ~150 tool calls. Plan: ~30 GATHER, ~10 SCAN/CLASSIFY, ~80 BUILD, ~30 testing. Do not retry failed requests — unexpected output is information, not failure.
Your port is XXXX.
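The completeness rule in the template (a route is not done while totalCount exceeds the items returned) amounts to a loop like the one below. It is a sketch under assumptions: the response shape with an `items` list is hypothetical, `fetch_page` stands in for whatever call the route makes, and `max_pages` is an arbitrary safety cap.

```python
from typing import Callable


def collect_all(fetch_page: Callable[[int], dict], max_pages: int = 100) -> tuple[list, bool]:
    """Fetch pages until items collected reaches totalCount. Returns (items, complete)."""
    items: list = []
    for page in range(1, max_pages + 1):
        body = fetch_page(page)
        batch = body.get("items", [])          # assumed field name
        items.extend(batch)
        total = body.get("totalCount")         # field name from the template
        if total is not None and len(items) >= total:
            return items, True                 # total == items returned: route is done
        if not batch:
            break                              # no more data, but total was never reached
    return items, False
```

A `False` in the second position means the route is not done, which matches the scorecard's "Pagination COMPLETE" check.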
| Check | Pass/Fail |
|---|---|
| Browser connected to HOMEPAGE first, warmed up before deep navigation | |
| Pagination intercepted in GATHER (new traffic captured OR discovered endpoint confirmed via browser fetch()) | |
| 2a-CHECK: traffic scanned for pagination params, page 2 tested if found | |
| page.evaluate used for interaction + fetch() testing only in GATHER (not for reading NEXT_DATA or DOM data) | |
| Used {credentials:"include"} for cross-origin fetch() calls | |
| Used page.evaluate("fetch(...)") for discovery-phase API testing (did NOT call browserFetch or /browser/fetch as an HTTP endpoint) | |
| Full elimination table (all 8 rows ✓ or ✗) | |
| Route built for EVERY ✓ transport | |
| Used GATHER→SCAN→CLASSIFY→BUILD pipeline | |
| Did NOT search for public APIs | |
| Detail page visited in browser (URL recorded) | |
| Access Gap table produced (Step 2e) | |
| Core data identified in CLASSIFY | |
| Session harvest COMPLETED for Gap=Y endpoints (routes return data, not errors) | |
| Pagination COMPLETE: every route's completeness check shows total == items returned | |
| Routes tested through API server proxy AND return real data (not empty/error) | |
| Wrote files to worktree, not main repo | |
| Stayed under 150 tool calls | |
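Once several agents have been scored, tallying failures per check can help with the diagnosis step: a check that many agents fail is more likely an instruction gap than a site quirk (that reading is an inference, not a rule from this skill). `tally_failures` and the dictionary layout are hypothetical.

```python
from collections import Counter


def tally_failures(scorecards: dict[str, dict[str, bool]]) -> list[tuple[str, int]]:
    """Return (check, number_of_agents_that_failed_it), most-failed first."""
    failures: Counter = Counter()
    for checks in scorecards.values():
        for check, passed in checks.items():
            if not passed:
                failures[check] += 1
    return failures.most_common()


if __name__ == "__main__":
    # Agent labels and results here are made-up placeholders.
    example = {
        "agent-1": {"Full elimination table": True, "Pagination COMPLETE": False},
        "agent-2": {"Full elimination table": True, "Pagination COMPLETE": False},
    }
    print(tally_failures(example))
```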
On crash: Check the error message. If it's a fixable infrastructure issue (binary data in traffic, missing dependency, port conflict), apply the code fix and re-launch that single agent. If it's an API-level error (Claude "Could not process image"), check that static resource blocking is working and re-launch.
On timeout (>20 min): Kill the agent. Check if it's stuck in a retry loop (429s, WAF challenges, sleep). If so, fix the instruction that allowed the loop and re-launch.
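A retry loop usually shows up as the same command repeated back to back in the agent's transcript. The following sketch illustrates one way to spot that; both helper names are hypothetical, and only the 20-minute limit comes from the text above.

```python
from itertools import groupby

TIMEOUT_MINUTES = 20  # limit stated in the timeout rule


def timed_out(elapsed_minutes: float) -> bool:
    """True once an agent has run longer than the 20-minute limit."""
    return elapsed_minutes > TIMEOUT_MINUTES


def longest_repeat(commands: list[str]) -> tuple[str, int]:
    """Return the command with the longest run of consecutive repeats and the run length."""
    best = ("", 0)
    for command, run in groupby(commands):
        length = sum(1 for _ in run)
        if length > best[1]:
            best = (command, length)
    return best
```

The list of commands could come from the `Bash` tool calls in the agent's JSONL output; a long run of one command is a hint to look for the instruction that permitted the loop.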
On success but regression: Compare route count and transport coverage to the previous iteration. If an agent found fewer transports, check:
Flag regressions in the handoff doc. Do not accept fewer transports without explanation.
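Comparing transport coverage between iterations is a set difference per site. This is a minimal sketch; `find_regressions`, the dictionary layout, and the transport labels in the example are all illustrative assumptions.

```python
def find_regressions(
    previous: dict[str, set[str]],
    current: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Map each site to the transports found last iteration but missing now."""
    regressions = {}
    for site, before in previous.items():
        missing = before - current.get(site, set())
        if missing:
            regressions[site] = missing
    return regressions


if __name__ == "__main__":
    # Placeholder labels only; real transport names come from the elimination table.
    last_run = {"site-a": {"xhr", "websocket"}, "site-b": {"xhr"}}
    this_run = {"site-a": {"xhr"}, "site-b": {"xhr", "sse"}}
    print(find_regressions(last_run, this_run))
```

Anything this returns belongs in the handoff doc with an explanation.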
For each agent, analyze:
Add any new patterns to test-server endpoints + boardshop reference routes before the next iteration.
Every instruction change must work for ANY website. If a fix only helps for a specific site, it's overfitting. Never put specific website names, URLs, or transport classifications in instruction files (test-server, boardshop, and CLAUDE.md are the only places for working code examples).
The loop converges when fresh agents (clean session, no hints):