agent-harness-engineering

Name: Agent Harness Engineering
Author: s-kostyaev

Description: Build, audit, or improve agentic coding/research systems and their harnesses. Use when designing agent workflows, small-model scaffolds, tool schemas, progressive-disclosure context, repository knowledge maps, guardrails, validation loops, subagents, evals, recovery mechanisms, or autonomous-agent safety policies.

stars:7

forks:1

updated: 25 May 2026 at 10:01

Agent Harness Engineering

Use this skill to design the environment around an agent: context surfaces, tools, guardrails, validation, recovery, and feedback loops. The default move is not "write a bigger prompt"; it is "make the right action legible and mechanically checkable."

Workflow

Define the job and risk.
- Goal: coding, research, operations, UI validation, data work, review, or cleanup.
- Model class: small/local, frontier, reasoning, tool-weak, or no-tool.
- Autonomy: advisory, edit-with-confirmation, autonomous in sandbox, or PR producer.
- Failure cost: low, medium, high, irreversible, or data-exfiltration risk.
Build a context map before loading details.
- Keep the always-loaded entry point short: a table of contents, not a manual.
- Point to repo-local sources of truth: README.org, docs/, AGENTS.md, skills, blueprints, eval fixtures, generated schemas, and runbooks.
- Preserve an existing canonical source of truth. Do not create mirrored docs unless there is an enforcement/update path.
- Load details only when the current step needs them.
Shape the tool surface.
- Expose the fewest tools that can complete the current step.
- Prefer deterministic tools and structured outputs.
- Use compound tools when a small model repeatedly needs the same chain (search_and_read, read_and_edit, write_and_test).
- Add repair hints to tool failures: why it failed, what to try next, and an exact minimal example.
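The repair-hint bullet above can be sketched as a structured failure record. This is a minimal sketch, not part of the skill: the `ToolFailure` shape and the `read_file_range` tool are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ToolFailure:
    """Hypothetical failure record returned to the model instead of a bare error."""
    tool: str
    why: str        # why the call failed
    try_next: str   # what to try next
    example: str    # exact minimal example of a valid call


def read_file_range(lines: list[str], start: int, end: int):
    """Hypothetical tool: return lines start..end (1-indexed, inclusive) or a ToolFailure."""
    if start < 1 or end < start or end > len(lines):
        return ToolFailure(
            tool="read_file_range",
            why=f"range {start}-{end} is outside 1-{len(lines)}",
            try_next="request a range inside the file bounds",
            example=json.dumps({"start": 1, "end": min(len(lines), 20)}),
        )
    return lines[start - 1:end]


result = read_file_range(["a", "b", "c"], 2, 10)
if isinstance(result, ToolFailure):
    # The serialized record is what the model would see on the repair turn.
    print(json.dumps(asdict(result), indent=2))
```

Returning the failure as data rather than raising keeps the message short and lets the harness decide how much of it to show the model.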
Add mechanical feedback loops.
- Run tests, linters, type checks, UI probes, log queries, or metrics checks as tools or hooks.
- Treat failures as prompt surface: concise, actionable, and scoped.
- Encode recurring review comments as docs, lints, checks, or skills.
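One way to treat failures as prompt surface is to wrap each check in a function that returns only the command, the exit status, and a short output tail. A minimal sketch under stated assumptions: `run_check` and its `tail_lines` default are hypothetical, and the failing command is a stand-in for a real test or lint command.

```python
import subprocess
import sys


def run_check(command: list[str], tail_lines: int = 5) -> dict:
    """Run a validation command and return a concise, scoped report."""
    proc = subprocess.run(command, capture_output=True, text=True)
    output = (proc.stdout + proc.stderr).splitlines()
    return {
        "command": " ".join(command),
        "ok": proc.returncode == 0,
        "exit_code": proc.returncode,
        # Only the tail is kept so the report stays small in context.
        "output_tail": output[-tail_lines:],
    }


# Stand-in for a failing test run: prints one line, then exits with an error message.
failing = [
    sys.executable,
    "-c",
    "import sys; print('step 1'); sys.exit('assertion failed: 2 != 3')",
]
report = run_check(failing, tail_lines=2)
print(report["ok"], report["output_tail"])
```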
Add safety and state controls.
- Use read-before-write, sandboxed filesystem policy, DLP, irreversible action checks, and confirmation only for high-signal cases.
- Keep session-local state for plans, files read, cached pure tool results, evidence, and model/tool failures.
- For multi-file edits, prefer checkpoints or rollbackable patches.
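Read-before-write can be enforced mechanically with session-local state. The sketch below is illustrative only: the `Session` class and `WriteBlocked` exception are hypothetical, and an in-memory dict stands in for a sandboxed filesystem.

```python
class WriteBlocked(Exception):
    """Raised when a write targets an existing file the agent has not read."""


class Session:
    def __init__(self, files: dict[str, str]):
        self.files = files                  # in-memory stand-in for a sandboxed filesystem
        self.files_read: set[str] = set()   # session-local state

    def read(self, path: str) -> str:
        content = self.files[path]
        self.files_read.add(path)
        return content

    def write(self, path: str, content: str) -> None:
        # New files may be created; existing files must be read first.
        if path in self.files and path not in self.files_read:
            raise WriteBlocked(f"read {path} before writing it")
        self.files[path] = content
        self.files_read.add(path)


session = Session({"app.py": "print('v1')\n"})
try:
    session.write("app.py", "print('v2')\n")
except WriteBlocked as err:
    print(f"blocked: {err}")

session.read("app.py")
session.write("app.py", "print('v2')\n")  # allowed after the read
```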
Evaluate and iterate.
- Add a small eval before claiming a harness improvement works.
- Compare profiles such as baseline, limited-tools, compound-tools, strict-write, selective-skills, and quality-monitor.
- Track pass rate, tool-call count, recovery count, validation failures, and context size.
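The metrics above can be aggregated per profile with a few lines of code. A minimal sketch: the `RunResult` fields and the `summarize` helper are hypothetical, and the sample runs are made-up placeholder values, not measured results.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    profile: str
    passed: bool
    tool_calls: int
    recoveries: int
    validation_failures: int
    context_tokens: int


def summarize(runs: list[RunResult]) -> dict[str, dict[str, float]]:
    """Group runs by profile and average each tracked metric."""
    groups: dict[str, list[RunResult]] = {}
    for run in runs:
        groups.setdefault(run.profile, []).append(run)
    summary = {}
    for profile, group in groups.items():
        n = len(group)
        summary[profile] = {
            "pass_rate": sum(r.passed for r in group) / n,
            "avg_tool_calls": sum(r.tool_calls for r in group) / n,
            "avg_recoveries": sum(r.recoveries for r in group) / n,
            "avg_validation_failures": sum(r.validation_failures for r in group) / n,
            "avg_context_tokens": sum(r.context_tokens for r in group) / n,
        }
    return summary


# Placeholder data for illustration only.
runs = [
    RunResult("baseline", True, 12, 1, 2, 9000),
    RunResult("baseline", False, 20, 3, 4, 15000),
    RunResult("compound-tools", True, 6, 0, 1, 7000),
    RunResult("compound-tools", True, 8, 1, 1, 8000),
]
for profile, metrics in summarize(runs).items():
    print(profile, metrics)
```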

Small-Model Biases

For small or local models, optimize for fewer choices and shorter turns:

- Inject one relevant instruction block, not all instructions.
- Show fewer tool schemas; route by task phase when possible.
- Keep the active plan visible and current.
- Prefer exact file ranges and diffs over full files.
- Cache repeated read/search results.
- Detect loops: repeated tool call, repeated edit miss, empty answer, malformed tool args, or repeated shell failure.
- On repair turns, give the model the failing command, the smallest relevant output tail, and the next action.
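Of the loop signals listed above, the repeated tool call is the simplest to detect mechanically. The sketch below covers only that case; `detect_loop`, the `window` size, and the `threshold` are hypothetical choices, and tool calls are assumed to be JSON-serializable dicts.

```python
import json
from collections import Counter
from typing import Optional


def detect_loop(calls: list[dict], window: int = 6, threshold: int = 3) -> Optional[str]:
    """Return the canonical form of a call repeated `threshold` or more times
    within the last `window` calls, or None if no such call exists."""
    recent = calls[-window:]
    # sort_keys makes calls with the same arguments compare equal
    # regardless of key order.
    counts = Counter(json.dumps(call, sort_keys=True) for call in recent)
    for key, count in counts.items():
        if count >= threshold:
            return key
    return None


history = [
    {"tool": "search", "args": {"query": "parse_config"}},
    {"tool": "edit", "args": {"path": "config.py", "find": "foo"}},
    {"tool": "edit", "args": {"find": "foo", "path": "config.py"}},
    {"tool": "edit", "args": {"path": "config.py", "find": "foo"}},
]
repeated = detect_loop(history)
if repeated is not None:
    print(f"loop detected: {repeated}")
```

When a loop is detected, the harness can switch to a repair turn instead of letting the model retry the same call.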

Repository Harness Surfaces

Use this minimal checklist when auditing a project:

- Entry map: short AGENTS.md or agent index with links to deeper docs.
- Knowledge: one canonical source or well-indexed docs; large Org sources should have a structure-first query tool such as oq.
- Tools: read/search/edit/shell split by role; MCP tools classified by risk.
- Validation: fast test command, lint/type command, formatting command.
- Guardrails: sandbox, secret handling, read-before-write, destructive checks.
- Feedback: evals, traces, review agents, quality sweeps, stale-doc cleanup.
- Legibility: logs, UI/browser state, screenshots, metrics, traces, schemas.

Run the bundled audit for a quick first pass:

python3 <skill-dir>/scripts/harness_audit.py /path/to/repo

Progressive References

Read only the reference needed for the current task:

- references/principles.md: principles and tradeoffs for agent harness design.
- references/patterns.md: reusable implementation patterns and when to use them.
- references/examples.md: concrete examples for coding agents, research agents, UI agents, and small-model profiles.

Output Shape

When designing or auditing a harness, return:

Goal:
Model and autonomy assumptions:
Current harness inventory:
Recommended changes:
1. Highest-leverage change
2. Next change
3. Later change
Validation plan:
Risks and guardrails:

Keep recommendations mechanical. Prefer "add this check/tool/state machine" over "tell the model to be careful."

related-skills.json

same repository

code-review.md

from "s-kostyaev/.emacs.d"

Review code changes with a focused multi-agent workflow. Use when auditing a diff, branch, pull request, patch, commit range, or local worktree for bugs, regressions, missing tests, security issues, API breakage, maintainability risks, or review readiness.

2026-05-26, stars:7

web-browse-context.md

from "s-kostyaev/.emacs.d"

Efficiently capture web pages as clean markdown and query them with `mq` to avoid loading full content into context. Use when browsing web pages, extracting specific sections, or summarizing sources with minimal context usage.

2026-05-22, stars:7

deep-research.md

from "s-kostyaev/.emacs.d"

Use this skill for academic-level research via a multi-agent swarm.

2026-04-29, stars:7

textweb.md

from "s-kostyaev/.emacs.d"

Use this skill to browse the web textually.

2026-02-21, stars:7

mq.md

from "s-kostyaev/.emacs.d"

Query markdown files efficiently with mq CLI. Use when exploring documentation structure, extracting specific sections, or reducing token usage when reading .md files.

2026-02-11, stars:7

ddgr-web-search.md

from "s-kostyaev/.emacs.d"

Run DuckDuckGo web searches from the terminal using `ddgr` with JSON output. Use when the task calls for lightweight web search, quick result triage, or filtering by region/time/site.

2026-02-11, stars:7

package.json

"author": "s-kostyaev"

"repository": "s-kostyaev/.emacs.d"

Useful for (SOC): Software Developers, Computer and Mathematical Occupations, 15-1252, L4
