improve

Updated June 9, 2026 at 11:51

Use when producing agent/LLM evals, synthetic simulation data, or self-improvement pipelines for prompts, code, skills, agents, harnesses, and workflows. Covers AgentEvals/AgentV, Agent Skills evals, ASSERT, GEPA, Trace, VISTA, Agent Lightning, SkillOpt, Simula-style data design, progressive disclosure, deterministic workspaces, and release evidence. USE FOR: eval creation, EVAL.yaml, AgentEvals, AgentV, evals.json, ASSERT, judge-traces, behavior taxonomy, judges, graders, rubrics, synthetic data, simulation data, Simula, QDC, source-grounded generation, prompt optimization, agent improvement, skill improvement, harness hardening, progressive disclosure, deterministic workflows, GEPA, Trace, VISTA, Agent Lightning, SkillOpt DO NOT USE FOR: ordinary unit/integration tests without AI quality criteria (use testing), refactoring without eval or trace feedback (use refactor), generic Agent Skills packaging without eval or improvement work (use agent-skills)

Source

Tyler-R-Kendrick

Tyler-R-Kendrick/agent-skills

name	improve
description	Use when producing agent/LLM evals, synthetic simulation data, or self-improvement pipelines for prompts, code, skills, agents, harnesses, and workflows. Covers AgentEvals/AgentV, Agent Skills evals, ASSERT, GEPA, Trace, VISTA, Agent Lightning, SkillOpt, Simula-style data design, progressive disclosure, deterministic workspaces, and release evidence. USE FOR: eval creation, EVAL.yaml, AgentEvals, AgentV, evals.json, ASSERT, judge-traces, behavior taxonomy, judges, graders, rubrics, synthetic data, simulation data, Simula, QDC, source-grounded generation, prompt optimization, agent improvement, skill improvement, harness hardening, progressive disclosure, deterministic workflows, GEPA, Trace, VISTA, Agent Lightning, SkillOpt DO NOT USE FOR: ordinary unit/integration tests without AI quality criteria (use testing), refactoring without eval or trace feedback (use refactor), generic Agent Skills packaging without eval or improvement work (use agent-skills)
license	MIT
metadata	{"displayName":"Self-Improvement Pipelines","author":"Tyler-R-Kendrick","version":"1.0.0","tags":["ai","evals","improvement","optimization","progressive-disclosure","deterministic-workflows","agentevals","agentv","assert","gepa","trace","vista","agent-lightning","skillopt","simula","synthetic-data","rl"]}
compatibility	claude, copilot, cursor

Evals and Self-Improvement Pipelines

Operating Rule

Default new agent and LLM evals to AgentEvals EVAL.yaml with AgentV. Improve only against evidence: eval failures, trace observations, benchmark deltas, human review notes, or explicit user goals. Keep each loop narrow, reproducible, and auditable.

Progressive Disclosure

Load only the reference needed for the requested eval or improvement surface:

If the task says...	Then read...
"install", "setup", "environment", "venv", "dependencies", "API keys", "Node", "Python", or missing native tools	`references/environment-setup.md`
"install AgentV", "install ASSERT", "setup eval tools", "eval runner install", or native eval validation setup	`references/install-eval-tools.md`
"install GEPA", "install Trace", "install Agent Lightning", "install SkillOpt", "setup optimizer", or improvement library dependencies	`references/install-improvement-libs.md`
"create an eval", "judge", "grader", "rubric", "EVAL.yaml", or no eval standard	`references/agentevals.md` and `references/agentv.md`
"which eval standard", "convert eval", "compare standards", or mixed eval formats	`references/eval-standards-guide.md`
"Agent Skills eval", `evals.json`, "skill quality", "with_skill", or "without_skill"	`references/agent-skills-evals.md`
"ASSERT", `assert-ai`, "judge-traces", "spec-driven", "behavior taxonomy", "trace-aware", "policy failure modes", or `eval_config.yaml`	`references/assert.md`
"eval starter", "eval lint", "eval workspace contract", or expected eval artifacts	`references/eval-workspace-contracts.md`
"optimize a skill", "progressive disclosure", "Table of Contents", "Index Page", "conditional access", "top-level links", "scripted workflow", or "deterministic workflow generation"	`references/skill-optimization-strategy.md`
"which technique", "optimize this", "improvement plan", or mixed artifacts	`references/techniques-guide.md`
"GEPA", "Pareto", "reflective mutation", "prompt evolution", or "optimize anything"	`references/gepa.md`
"Trace", "OptoPrime", "computation graph", "node", "bundle", or end-to-end generative optimization	`references/microsoft-trace.md`
"VISTA", "interpretable APO", "hypothesis agent", "random restart", or "epsilon-greedy"	`references/vista.md`
"Agent Lightning", "RL", "reward", "policy reward", "governed training", or skill improvement with policy constraints	`references/agent-lightning.md`
"SkillOpt", "SkillOpts", "skill evolution", `best_skill.md`, "held-out gate", "bounded edits", "textual learning rate", or "SkillOpt-Sleep"	`references/skillopt.md`
"eval failures", "agent traces", "span logs", "benchmark deltas", or "release evidence"	`references/eval-trace-improvement.md`
"synthetic data", "simulation data", "Simula", "QDC", "Source2Synth", "MAG-V", "MetaSynth", "BARE", "Condor", "data auditor", "generate data", or "simulate"	`references/simulation-data.md`
"CLI", "init", "improve", "eval", "simulate", "lint", "workspace", or "deterministic improvement artifacts"	`references/workspace-contracts.md`
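The table above is effectively a routing rule from task wording to reference files. A minimal sketch of that idea follows, assuming case-insensitive substring matching and copying only three rows; `Route`, `ROUTES`, and `selectReferences` are hypothetical names and are not part of the skill.

```typescript
// Hypothetical routing table: three rows copied from the table above.
// The matching rule (case-insensitive substring) is an assumption.
type Route = { triggers: string[]; references: string[] };

const ROUTES: Route[] = [
  {
    triggers: ["create an eval", "judge", "grader", "rubric", "EVAL.yaml"],
    references: ["references/agentevals.md", "references/agentv.md"],
  },
  {
    triggers: ["GEPA", "Pareto", "reflective mutation", "prompt evolution"],
    references: ["references/gepa.md"],
  },
  {
    triggers: ["synthetic data", "simulation data", "Simula", "QDC"],
    references: ["references/simulation-data.md"],
  },
];

// Return only the references whose triggers appear in the task text.
// An empty result means no reference needs to be loaded.
function selectReferences(task: string): string[] {
  const text = task.toLowerCase();
  const selected = new Set<string>();
  for (const route of ROUTES) {
    if (route.triggers.some((t) => text.includes(t.toLowerCase()))) {
      route.references.forEach((r) => selected.add(r));
    }
  }
  return Array.from(selected);
}
```

Substring matching is crude: a short trigger such as "judge" also matches "judge-traces", which belongs to a different row, so a real router would likely need phrase boundaries or an ordering rule.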

Workflow

1. Identify whether the user needs an eval artifact, an improvement loop, or both.
2. For eval artifacts, select the standard from explicit language, existing repo artifacts, or the default AgentEvals rule.
3. For improvement loops, identify the artifact type and evidence: eval cases, traces, logs, cost/latency metrics, human review, or explicit constraints.
4. Load only the matching reference docs, then use `scripts/improve-cli.ts` with `init`, `improve`, `eval`, `simulate`, or `lint` when deterministic artifacts help.
5. Prefer deterministic graders and structural checks before subjective LLM review.
6. Run the smallest useful loop, compare against the baseline, and preserve selected candidates plus rejected hypotheses.
7. Report the evidence delta and any residual risk before claiming the artifact is evaluated or improved.
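The last three workflow steps (deterministic grading, baseline comparison, and preserving rejected hypotheses) can be sketched as one small loop. This is an illustration under stated assumptions, not the bundled CLI's implementation: `Candidate`, `LoopResult`, `runLoop`, and `structuralScore` are hypothetical names, and the score is a simple structural check chosen for the example.

```typescript
// All names here are hypothetical; the caller supplies the scoring function.
type Candidate = { id: string; hypothesis: string; artifact: string };
type ScoredCandidate = Candidate & { score: number };
type LoopResult = {
  baselineScore: number;
  selected: ScoredCandidate | null;
  rejected: ScoredCandidate[];
  delta: number; // selected score minus baseline score; 0 when nothing is selected
};

function runLoop(
  baseline: string,
  candidates: Candidate[],
  score: (artifact: string) => number,
): LoopResult {
  const baselineScore = score(baseline);
  const scored = candidates.map((c) => ({ ...c, score: score(c.artifact) }));

  // Select the best candidate only if it strictly beats the baseline.
  let selected: ScoredCandidate | null = null;
  for (const c of scored) {
    if (c.score > baselineScore && (selected === null || c.score > selected.score)) {
      selected = c;
    }
  }

  // Every other candidate is preserved as a rejected hypothesis.
  const rejected = scored.filter((c) => c !== selected);
  const delta = selected ? selected.score - baselineScore : 0;
  return { baselineScore, selected, rejected, delta };
}

// Example deterministic grader: the fraction of required section names present.
const required = ["Inputs", "Steps", "Output"];
const structuralScore = (artifact: string): number =>
  required.filter((s) => artifact.includes(s)).length / required.length;
```

Returning the rejected candidates alongside the selected one keeps the evidence delta and the dead ends in the same record, so a later iteration can check both.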

Script

Use the bundled TypeScript CLI for deterministic planning, eval artifact generation, technique-specific local implementations, simulation data generation, improvement workspaces, and structural linting:

```bash
node skills/ai/improve/scripts/improve-cli.ts --help
node skills/ai/improve/scripts/improve-cli.ts init improve/support-skill --json
node skills/ai/improve/scripts/improve-cli.ts improve . --gepa --json
node skills/ai/improve/scripts/improve-cli.ts eval --agent-skills --json
node skills/ai/improve/scripts/improve-cli.ts simulate . --simula --json
node skills/ai/improve/scripts/improve-cli.ts lint improve/support-skill --json
```

For the CLI contract and generated workspace structure, read `references/workspace-contracts.md`. The script has no external dependencies, calls the bundled implementation libraries in `scripts/`, and requires Node 24 or later so that Node's built-in TypeScript type stripping can run the `.ts` file directly.
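A calling agent or harness may want to invoke the CLI programmatically rather than by hand. The sketch below assumes that `--json` makes the script print a single JSON document on stdout; the output schema is not described here, so the result is typed as `unknown`. `Exec`, `buildArgs`, and `runCli` are hypothetical names, and only the subcommands and flags shown above are used.

```typescript
// Hypothetical wrapper around the bundled CLI. The executor is injected so the
// sketch can be exercised without spawning a process.
type Exec = (file: string, args: string[]) => string;
type Subcommand = "init" | "improve" | "eval" | "simulate" | "lint";

const CLI_PATH = "skills/ai/improve/scripts/improve-cli.ts";

// Build the argument list, always appending --json for machine-readable output.
function buildArgs(subcommand: Subcommand, rest: string[] = []): string[] {
  return [CLI_PATH, subcommand, ...rest, "--json"];
}

// Run the CLI through the injected executor and parse stdout as JSON.
// Assumption: --json output is one JSON document; its shape is not specified here.
function runCli(exec: Exec, subcommand: Subcommand, rest: string[] = []): unknown {
  const stdout = exec("node", buildArgs(subcommand, rest));
  return JSON.parse(stdout);
}

// In a real Node program the executor could wrap execFileSync:
//   import { execFileSync } from "node:child_process";
//   const report = runCli(
//     (file, args) => execFileSync(file, args, { encoding: "utf8" }),
//     "lint",
//     ["improve/support-skill"],
//   );
```

Injecting the executor keeps the argument-building logic deterministic and testable, in line with the guidance to codify order-of-operations in scripts.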

Best Practices

Use AgentEvals by default: default new agent and LLM evals to AgentEvals EVAL.yaml with AgentV unless the user or repo clearly specifies another standard.
Use progressive disclosure by default: treat SKILL.md as a table-of-contents/index page with conditional top-level links; put deeper links inside references.
Codify workflows in scripts: make order-of-operations and workspace generation deterministic in scripts; let calling agents provide generated inputs and handle inference operations.
Improve from evidence: require eval failures, trace observations, benchmark deltas, or explicit human feedback before changing an artifact.
Keep loops narrow: optimize one prompt, skill behavior, agent step, code path, or workflow contract at a time.
Preserve baselines: save the original artifact, eval cases, trace inputs, and metrics before generating candidates.
Wire explicit assertions: wire explicit test cases and assertions; do not ship one anonymous catch-all judge.
Prefer deterministic checks: use exact graders, structural checks, schema checks, and replayable traces before subjective LLM review.
Use GEPA for text evolution: use reflective mutation and Pareto selection when the artifact is textual and measurable.
Use Trace for trainable workflows: use computation-graph optimization when code, prompts, and agent steps need end-to-end feedback propagation.
Use VISTA for interpretability: decouple hypotheses from rewrites when the improvement loop needs auditable reasoning and local-optimum escape.
Design synthetic data before sampling: use dataset-level taxonomies, local diversity, complexity schedules, quality gates, and lineage before asking an agent or model to generate simulation records.
Validate the candidate: accept a candidate only after it beats the baseline on held-out evals or trace-backed acceptance criteria.
Validate natively first: validate eval artifacts with the native standard tool where possible, then use the bundled linter as a structural fallback.
Record rejected paths: keep failed hypotheses and candidates so future iterations do not rediscover the same dead ends.
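Several practices above rely on comparing candidates across multiple eval cases, and the GEPA entry names Pareto selection explicitly. The following is a generic Pareto-dominance filter, offered as an illustration of the concept only; it is not GEPA's API, and `Scored`, `dominates`, and `paretoFront` are hypothetical names. It assumes every candidate has one score per eval case, in the same order, with higher being better.

```typescript
// One score per eval case; higher is better. Names are hypothetical.
type Scored = { id: string; scores: number[] };

// a dominates b when a is at least as good on every case
// and strictly better on at least one.
function dominates(a: number[], b: number[]): boolean {
  let strictlyBetter = false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return false;
    if (a[i] > b[i]) strictlyBetter = true;
  }
  return strictlyBetter;
}

// Keep every candidate that no other candidate dominates.
function paretoFront(candidates: Scored[]): Scored[] {
  return candidates.filter(
    (c) => !candidates.some((other) => other !== c && dominates(other.scores, c.scores)),
  );
}
```

Keeping the whole front, rather than the single best average, preserves candidates that win on different cases, which is the property that makes Pareto selection useful when eval cases pull in different directions.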