一键导入
worker-benchmarks
Run comprehensive worker system benchmarks and performance analysis
菜单
Run comprehensive worker system benchmarks and performance analysis
Spawn nested sub-agents (agents that spawn sub-agents, up to depth=5) via Claude Code's native Task tool — for context-managed deep delegation
Author a workflow — either an MCP workflow template (persisted, lifecycle) or a native .claude/workflows/*.js orchestration script (agent/parallel/pipeline fan-out)
Run a workflow — drive an MCP workflow lifecycle (execute/pause/resume/cancel) or invoke + resume a native .claude/workflows/*.js orchestration via the Workflow tool
Side-by-side comparison of ruflo vs HAL vs other GAIA harnesses — capability gaps, design decisions, and improvement roadmap
Diagnose why a GAIA question failed — extract trace, classify failure mode, and propose a fix
Walk through a complete GAIA benchmark→submit flow — from key resolution through HAL-compatible package generation
| name | worker-benchmarks |
| description | Run comprehensive worker system benchmarks and performance analysis |
| user-invocable | true |
Run comprehensive performance benchmarks for the agentic-flow worker system.
# Run full benchmark suite
npx agentic-flow workers benchmark
# Run specific benchmark
npx agentic-flow workers benchmark --type trigger-detection
npx agentic-flow workers benchmark --type registry
npx agentic-flow workers benchmark --type agent-selection
npx agentic-flow workers benchmark --type concurrent
trigger-detection)Tests keyword detection speed across 12 worker triggers.
registry)Tests CRUD operations on worker entries.
agent-selection)Tests performance-based agent selection.
cache)Tests model caching performance.
concurrent)Tests parallel worker creation and updates.
memory-keys)Tests memory pattern key generation.
═══════════════════════════════════════════════════════════
📈 BENCHMARK RESULTS
═══════════════════════════════════════════════════════════
✅ Trigger Detection
Operation: detect
Count: 1,000
Avg: 0.045ms | p95: 0.120ms (target: 5ms)
Throughput: 22,222 ops/s
Memory Δ: 0.12MB
✅ Worker Registry
Operation: crud
Count: 1,500
Avg: 1.234ms | p95: 3.456ms (target: 10ms)
Throughput: 810 ops/s
Memory Δ: 2.34MB
───────────────────────────────────────────────────────────
📊 SUMMARY
───────────────────────────────────────────────────────────
Total Tests: 6
Passed: 6 | Failed: 0
Avg Latency: 0.567ms
Total Duration: 2345ms
Peak Memory: 8.90MB
═══════════════════════════════════════════════════════════
Benchmark thresholds are configured in .claude/settings.json:
{
"performance": {
"benchmarkThresholds": {
"triggerDetection": { "p95Ms": 5 },
"workerRegistry": { "p95Ms": 10 },
"agentSelection": { "p95Ms": 1 },
"memoryKeyGeneration": { "p95Ms": 0.1 },
"concurrentWorkers": { "totalMs": 1000 }
}
}
}
import { workerBenchmarks, runBenchmarks } from 'agentic-flow/workers/worker-benchmarks';
// Run full suite
const suite = await runBenchmarks();
console.log(suite.summary);
// Run individual benchmarks
const triggerResult = await workerBenchmarks.benchmarkTriggerDetection(1000);
const registryResult = await workerBenchmarks.benchmarkRegistryOperations(500);
CLAUDE_FLOW_MODEL_CACHE_MB=512CLAUDE_FLOW_WORKER_PARALLEL=trueCLAUDE_FLOW_SUPPRESS_WARNINGS=true