n-agentic-harnesses

Name: N Agentic Harnesses
Author: NateBJones-Projects

Design, evaluate, and improve agentic harnesses for developer tools, assistants, workflow runtimes, copilots, and AI-powered products. Use when work involves tool-use architecture, permissions, approval gates, workflow state, durability, context and memory systems, evaluation strategy, observability, operator visibility, or phased implementation plans for an AI system. Trigger when symptoms imply harness gaps too: stale context, surprising tool calls, sessions that die on crash, missing approval controls, or costs spiraling without clear visibility.

stars: 3,491

forks: 664

updated: April 3, 2026, 00:31

SKILL.md

N Agentic Harnesses

Problem

Most AI products do not break because the model is too weak. They break at the harness layer: unclear tool boundaries, missing approval policy, brittle state, sloppy context assembly, no evaluation loop, and weak operator visibility. This skill turns those vague issues into concrete primitives, boundaries, phases, and checks.

Trigger Conditions

- The user is designing or rebuilding an agent, assistant, copilot, or AI workflow
- The request mentions harness architecture, tool-use architecture, tool registries, permission layers, approval gates, workflow state, session persistence, retries, resumability, memory, evals, observability, or multi-agent design
- The user wants to evaluate an existing harness for risks, missing primitives, UX gaps, or operational weakness
- The symptoms point to harness problems even if the word "harness" never appears:
  - tools fire without clear permission
  - sessions fail on crash or long waits
  - context gets stale or bloated
  - operators cannot see what happened or why
  - costs, retries, or handoffs are drifting out of control
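As a rough illustration of how these symptoms map onto harness primitives, the sketch below matches a request against symptom phrases and names the reference file that covers each gap. The keyword lists, the `SymptomRule` type, and the `detect_harness_gaps` / `should_trigger` helpers are hypothetical; in practice the model decides whether to trigger by reading the request, not by substring matching.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SymptomRule:
    primitive: str             # harness primitive the symptom usually points at
    keywords: Tuple[str, ...]  # hypothetical lowercase trigger phrases
    reference: str             # reference file that covers that primitive


# Illustrative rules only; the keyword lists are assumptions, not part of the skill.
RULES = (
    SymptomRule("permission policy",
                ("without permission", "surprising tool call", "approval"),
                "references/03-tools-execution-and-permissions.md"),
    SymptomRule("durable workflow state",
                ("crash", "long wait", "resume"),
                "references/04-state-sessions-and-durability.md"),
    SymptomRule("context assembly",
                ("stale context", "bloated", "context window"),
                "references/05-context-memory-and-evaluation.md"),
    SymptomRule("operator visibility",
                ("what happened", "no logs", "observability"),
                "references/07-ux-observability-and-operations.md"),
    SymptomRule("budgets and retry policy",
                ("cost", "retries", "handoff"),
                "references/07-ux-observability-and-operations.md"),
)


def detect_harness_gaps(request: str) -> List[SymptomRule]:
    """Return every rule with at least one keyword in the request (case-insensitive)."""
    text = request.lower()
    return [rule for rule in RULES if any(k in text for k in rule.keywords)]


def should_trigger(request: str) -> bool:
    """Trigger on the word "harness" or on any matched symptom."""
    return "harness" in request.lower() or bool(detect_harness_gaps(request))
```

For example, "Our agent sessions die on crash and costs keep rising" never says "harness", yet it matches both the durable-state rule and the budget rule.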

Default Posture

- Bias toward lean, solo-maintainable architecture.
- Start with a single-agent design unless clear constraints justify more.
- Require an evaluation plan even for greenfield builds.
- Prefer explicit system boundaries, permission policy, and workflow state over prompt cleverness.
- Translate ideas into implementation phases, success criteria, and failure tests.

Step 0: Gather Context

Before routing, make sure you have enough to work with.

For design work, confirm:

- what product or system the harness serves
- what actions the agent will take
- who the users are
- any known constraints, such as solo maintenance, an existing stack, or a timeline

For evaluation work, inspect the harness itself:

- read the codebase, agent config, skills, hooks, or architecture docs
- if evidence is missing, ask for the narrowest missing input and keep moving
- do not evaluate from vibes alone
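A minimal sketch of the design-work checklist, assuming the context is collected into a hypothetical `DesignContext` record. It asks for one missing item at a time, which is one way to follow "ask for the narrowest missing input and keep moving". Field names and question wording are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical question for each required field, in the order they are asked.
QUESTIONS = {
    "product": "What product or system will this harness serve?",
    "agent_actions": "Which actions will the agent take (read, write, call APIs, run code)?",
    "users": "Who are the users?",
}


@dataclass
class DesignContext:
    product: Optional[str] = None
    agent_actions: List[str] = field(default_factory=list)
    users: Optional[str] = None
    # Known constraints help but are optional: an empty list is acceptable.
    constraints: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Names of required fields that are still empty."""
        return [name for name in QUESTIONS if not getattr(self, name)]

    def next_question(self) -> Optional[str]:
        """Ask only for the narrowest missing input; None means enough context."""
        gaps = self.missing()
        return QUESTIONS[gaps[0]] if gaps else None
```

With a product and users already known, the only follow-up the sketch produces is the question about agent actions.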

Step 1: Classify The Request

Choose one mode before reading reference files.

`design`

Use when the user is creating a new harness, planning a major rebuild, or asking for architecture, MVP shape, or implementation sequencing.

Default reads:

- references/01-principles-and-solo-dev-defaults.md
- references/02-harness-shapes-and-architecture.md
- references/08-design-and-build-playbook.md

`evaluation`

Use when the user already has a harness and wants gaps, risks, missing primitives, UX upgrades, or architectural cleanup.

Default reads:

- references/01-principles-and-solo-dev-defaults.md
- references/09-evaluation-and-improvement-playbook.md

`design + evaluation`

Use when the user wants a target architecture and a way to verify it, compare it with an existing system, or define acceptance criteria before building.

Default reads:

- references/01-principles-and-solo-dev-defaults.md
- references/02-harness-shapes-and-architecture.md
- references/08-design-and-build-playbook.md
- references/09-evaluation-and-improvement-playbook.md
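The mode rules above can be encoded as a lookup so the default reads are fixed as soon as one mode is chosen. `Mode`, `classify_mode`, and its two boolean flags are hypothetical simplifications of the prose rules; the file paths are the ones listed above.

```python
from enum import Enum
from typing import Dict, List

PRINCIPLES = "references/01-principles-and-solo-dev-defaults.md"
ARCHITECTURE = "references/02-harness-shapes-and-architecture.md"
BUILD = "references/08-design-and-build-playbook.md"
EVALUATE = "references/09-evaluation-and-improvement-playbook.md"


class Mode(Enum):
    DESIGN = "design"
    EVALUATION = "evaluation"
    DESIGN_AND_EVALUATION = "design + evaluation"


# Default reads per mode, copied from the lists above.
DEFAULT_READS: Dict[Mode, List[str]] = {
    Mode.DESIGN: [PRINCIPLES, ARCHITECTURE, BUILD],
    Mode.EVALUATION: [PRINCIPLES, EVALUATE],
    Mode.DESIGN_AND_EVALUATION: [PRINCIPLES, ARCHITECTURE, BUILD, EVALUATE],
}


def classify_mode(builds_new: bool, assesses: bool) -> Mode:
    """Pick exactly one mode before any reference file is read.

    builds_new: creating, rebuilding, or sequencing a harness (hypothetical flag)
    assesses:   reviewing an existing harness or defining acceptance criteria
    """
    if builds_new and assesses:
        return Mode.DESIGN_AND_EVALUATION
    if assesses:
        return Mode.EVALUATION
    return Mode.DESIGN
```

When neither flag is set, the sketch falls back to `design`; that default is an assumption, not a rule stated by the skill.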

Step 2: Classify The Product Shape

Determine the closest product shape before going deeper:

- code agent
- chat assistant
- workflow orchestrator
- internal copilot
- embedded AI product feature
- hybrid system

If the request is ambiguous, pick the closest shape and state the assumption.
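One hedged way to implement "pick the closest shape and state the assumption" is to score the request against cue words per shape and return an explicit assumption note whenever the result is a fallback or a tie. The cue words, the `FALLBACK` choice, and treating a tie as a hybrid system are all assumptions for illustration.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class ProductShape(Enum):
    CODE_AGENT = "code agent"
    CHAT_ASSISTANT = "chat assistant"
    WORKFLOW_ORCHESTRATOR = "workflow orchestrator"
    INTERNAL_COPILOT = "internal copilot"
    EMBEDDED_FEATURE = "embedded AI product feature"
    HYBRID = "hybrid system"


# Hypothetical cue words per shape; HYBRID is inferred from ties, not matched directly.
CUES: Dict[ProductShape, Tuple[str, ...]] = {
    ProductShape.CODE_AGENT: ("repository", "pull request", "refactor", "terminal"),
    ProductShape.CHAT_ASSISTANT: ("chat", "conversation", "customer questions"),
    ProductShape.WORKFLOW_ORCHESTRATOR: ("workflow", "pipeline", "schedule"),
    ProductShape.INTERNAL_COPILOT: ("internal", "employees", "team tool"),
    ProductShape.EMBEDDED_FEATURE: ("in-app", "product feature", "button"),
}
FALLBACK = ProductShape.CHAT_ASSISTANT  # assumed default when nothing matches


def classify_shape(request: str) -> Tuple[ProductShape, Optional[str]]:
    """Return the closest shape plus an assumption note whenever the call was ambiguous."""
    text = request.lower()
    scores = {shape: sum(cue in text for cue in cues) for shape, cues in CUES.items()}
    best = max(scores.values())
    top = [shape for shape, score in scores.items() if score == best]
    if best == 0:
        return FALLBACK, f"Assumption: no clear cues, treating this as a {FALLBACK.value}."
    if len(top) > 1:
        names = ", ".join(shape.value for shape in top)
        return ProductShape.HYBRID, f"Assumption: mixed cues ({names}), treating this as a hybrid system."
    return top[0], None
```

The note is returned alongside the shape so the response can state the assumption up front instead of silently committing to one shape.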

Step 3: Read The Smallest Useful Reference Set

Read only the files the request actually needs:

- references/01-principles-and-solo-dev-defaults.md: Use first for almost every request. It defines the default decision posture.
- references/02-harness-shapes-and-architecture.md: Read when choosing system shape, boundaries, lifecycle, transports, or deployment structure.
- references/03-tools-execution-and-permissions.md: Read when the request involves tool registries, tool calling, approval gates, sandboxes, or trust tiers.
- references/04-state-sessions-and-durability.md: Read when the request involves sessions, resumability, retries, idempotency, approval waits, or long-running work.
- references/05-context-memory-and-evaluation.md: Read when the request involves context windows, retrieval, memory, provenance, evals, replay tests, or regression detection.
- references/06-agents-and-extensibility.md: Read when the request involves multi-agent design, plugins, hooks, skills, or extension surfaces.
- references/07-ux-observability-and-operations.md: Read when the request involves streaming UX, health checks, logs, analytics, budgets, or supportability.
- references/08-design-and-build-playbook.md: Read when the user needs a build-ready plan from idea to implementation.
- references/09-evaluation-and-improvement-playbook.md: Read when the user needs findings, missing primitives, upgrade priorities, or acceptance tests.
- references/10-example-requests-and-output-patterns.md: Read when you need prompt examples or response structure examples.
- references/11-codex-translation-notes.md: Read only when adapting the shared skill into a Codex-oriented variant or mapping between client environments.

Do not rely on reference-to-reference chains. This file is the index.
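The reference index can be read as a flat routing table: always start from the principles file, add the mode's default reads, then add only the files whose topics the request mentions. The cue substrings below are hypothetical paraphrases of the "Read when" notes above, and `select_references` is an illustrative helper, not part of the skill.

```python
from typing import Dict, Iterable, List, Tuple

PRINCIPLES = "references/01-principles-and-solo-dev-defaults.md"

# Hypothetical cue substrings, paraphrased from the "Read when" notes.
TOPIC_CUES: Dict[str, Tuple[str, ...]] = {
    "references/02-harness-shapes-and-architecture.md":
        ("boundaries", "lifecycle", "transport", "deployment"),
    "references/03-tools-execution-and-permissions.md":
        ("tool registry", "tool calling", "approval gate", "sandbox", "trust tier"),
    "references/04-state-sessions-and-durability.md":
        ("session", "resum", "retry", "retries", "idempoten", "long-running"),
    "references/05-context-memory-and-evaluation.md":
        ("context window", "retrieval", "memory", "provenance", "evals", "replay test", "regression"),
    "references/06-agents-and-extensibility.md":
        ("multi-agent", "plugin", "hook", "skill", "extension"),
    "references/07-ux-observability-and-operations.md":
        ("streaming", "health check", "logs", "analytics", "budget", "supportab"),
}


def select_references(request: str, mode_defaults: Iterable[str]) -> List[str]:
    """Smallest useful set: principles + mode defaults + topic matches, no chaining."""
    text = request.lower()
    picked = {PRINCIPLES, *mode_defaults}
    picked.update(path for path, cues in TOPIC_CUES.items()
                  if any(cue in text for cue in cues))
    # The numeric file prefixes make a plain sort match the index order.
    return sorted(picked)
```

Because every file is resolved from this one table, no reference file has to point at another, which keeps the read set small and predictable.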

Operating Rules

- Convert vague ambitions into concrete harness primitives.
- Push back on unnecessary complexity.
- Treat workflow state, permissions, context assembly, and evaluation as first-class architecture, not cleanup tasks.
- Separate universal harness primitives from their product-specific manifestations.
- For evaluation requests, present findings first and the improvement sequence second.
- For design requests, include how the design will be tested before calling it done.
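To make "treat permissions as first-class architecture" concrete, here is a minimal sketch of a tool registry with trust tiers and an approval gate. The `Tier` levels, the `ToolRegistry` class, and the rule that anything above `READ` needs explicit approval are assumptions chosen for illustration, not a prescribed design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Tier(Enum):
    READ = 1      # no side effects; safe to run automatically
    WRITE = 2     # changes user data
    EXTERNAL = 3  # side effects outside the system (email, payments, deploys)


@dataclass
class Tool:
    name: str
    tier: Tier
    run: Callable[[str], str]


class ToolRegistry:
    """Every tool call passes through one place where permission policy is enforced."""

    def __init__(self, auto_approve_up_to: Tier = Tier.READ) -> None:
        self._tools: Dict[str, Tool] = {}
        self._auto = auto_approve_up_to

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, name: str, arg: str, approved: bool = False) -> str:
        tool = self._tools.get(name)
        if tool is None:
            # Unregistered tools never run, whatever the model asks for.
            raise KeyError(f"unknown tool: {name}")
        if tool.tier.value > self._auto.value and not approved:
            raise PermissionError(f"{name} requires approval ({tool.tier.name})")
        return tool.run(arg)
```

The point of the sketch is placement: the permission check lives in the registry that every call goes through, rather than in prompt instructions the model may or may not follow.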

Output Contract

For `design`

Return:

- recommended harness shape
- core primitives and subsystem boundaries
- MVP boundary
- phased implementation plan
- verification and acceptance criteria

For `evaluation`

Return:

- findings ordered by severity or leverage
- missing or weak primitives
- user experience and operational gaps
- prioritized upgrade path
- tests or checks that confirm the fixes

For `design + evaluation`

Return:

- target architecture
- comparison against the current system or likely failure modes
- implementation phases
- acceptance criteria
- evaluation plan covering regressions, safety, and UX
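A lightweight way to enforce the output contract is to check a draft response for each required section before sending it. This sketch assumes, hypothetically, that the response uses each contract item as a literal heading; `REQUIRED_SECTIONS` mirrors the lists above and `missing_sections` is an illustrative helper.

```python
from typing import Dict, List

# Required sections per mode, copied from the contract above.
REQUIRED_SECTIONS: Dict[str, List[str]] = {
    "design": [
        "recommended harness shape",
        "core primitives and subsystem boundaries",
        "MVP boundary",
        "phased implementation plan",
        "verification and acceptance criteria",
    ],
    "evaluation": [
        "findings ordered by severity or leverage",
        "missing or weak primitives",
        "user experience and operational gaps",
        "prioritized upgrade path",
        "tests or checks that confirm the fixes",
    ],
    "design + evaluation": [
        "target architecture",
        "comparison against the current system or likely failure modes",
        "implementation phases",
        "acceptance criteria",
        "evaluation plan covering regressions, safety, and UX",
    ],
}


def missing_sections(mode: str, response: str) -> List[str]:
    """Contract sections whose title does not appear in the draft (case-insensitive)."""
    text = response.lower()
    return [title for title in REQUIRED_SECTIONS[mode] if title.lower() not in text]
```

A design draft that covers shape, MVP boundary, and phases but skips primitives and acceptance criteria would be flagged for exactly those two sections.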

Final Check Before Responding

- Did you keep the design lean enough for a solo developer unless the request clearly demanded more?
- Did you avoid recommending multi-agent coordination by default?
- Did you include evaluation, not just construction?
- Did you give the user an operational path forward instead of abstract theory?

related-skills.json

same repository

nbj-ob1-agent-memory-openclaw.md

from "NateBJones-Projects/OB1"

Use Nate Jones OB1 Agent Memory from OpenClaw with provenance, scope, review, and use-policy discipline.

2026-05-04 · 3.5k

world-model-diagnostic.md

from "NateBJones-Projects/OB1"

Twenty-minute conversational diagnostic for assessing a company's world-model readiness. Use when the user wants to map their company to the right world-model paradigm, identify where the highest-fidelity signal lives, audit the boundary layer between facts and interpretation, flag simulated-judgment exposure, and leave with a first/second/third build sequence. Works in plain chat and compounds when Open Brain search/capture tools are present.

2026-04-18 · 3.5k

aiception.md

from "NateBJones-Projects/OB1"

Continuous learning system that extracts reusable knowledge from work sessions. Triggers: (1) /aiception command, (2) 'save this as a skill' or 'extract a skill from this', (3) 'what did we learn?', (4) after non-obvious debugging or trial-and-error discovery. Creates new skills when valuable reusable knowledge is identified. Integrates with Open Brain to prevent duplicates.

2026-04-16 · 3.5k

work-operating-model.md

from "NateBJones-Projects/OB1"

Conversation-first workflow for turning tacit work patterns into a structured operating model. Use when the user wants to map how their work actually runs, generate USER.md / SOUL.md / HEARTBEAT.md artifacts, or build an agent-ready model of rhythms, recurring decisions, dependencies, institutional knowledge, and friction. Requires base Open Brain search/capture tools plus the paired Work Operating Model recipe MCP tools.

2026-04-14 · 3.5k

weekly-signal-diff.md

from "NateBJones-Projects/OB1"

Use when the user wants a weekly structural diff on AI, software, or another fast-moving market. Starts from 10 suggested categories and 30 suggested companies when no watchlist exists, then adapts the scan using Open Brain memory, current priorities, and prior digests. Best for prompts like "run my weekly signal diff", "what changed this week that matters to me", "track this market", or "turn this week's news into structural shifts". Optional live search upgrade: if OpenRouter access is available, prefer the Perplexity Sonar family for fresh web-grounded retrieval with citations.

2026-04-14 · 3.5k

package.json

"author": "NateBJones-Projects"

"repository": "NateBJones-Projects/OB1"

Useful for (SOC): Software Developers, Computer and Mathematical Occupations, 15-1252, L4