agent-output-audit

Name: Agent Output Audit
Author: pedronauck

// Audits AI-implemented work for honest completion. Runs independent-evaluator checks against task artifacts, transcripts, tests, CI evidence, requirement-to-test mapping, status front matter, and quality gates; flags skipped tests, weakened assertions, mock-only confidence, snapshot drift, happy-path-only coverage, flaky retries, and status/evidence mismatches. Use when validating completed Compozy tasks, AI-authored PRs, or codex-loop iterations. Do not use for real-user QA, persona/journey testing, exploratory charters, or product usability sessions; use qa-execution for those.


stars:384

forks:65

updated: May 12, 2026, 16:26




name	agent-output-audit
description	Audits AI-implemented work for honest completion. Runs independent-evaluator checks against task artifacts, transcripts, tests, CI evidence, requirement-to-test mapping, status front matter, and quality gates; flags skipped tests, weakened assertions, mock-only confidence, snapshot drift, happy-path-only coverage, flaky retries, and status/evidence mismatches. Use when validating completed Compozy tasks, AI-authored PRs, or codex-loop iterations. Do not use for real-user QA, persona/journey testing, exploratory charters, or product usability sessions; use qa-execution for those.
argument-hint	[audit-output-path]
metadata	{"author":"Pedro Nauck","github":"https://github.com/pedronauck","repository":"https://github.com/pedronauck/skills"}

Agent Output Audit

Independent verification of AI-implemented work. The skill that asks: "Did the implementing agent actually do what task_NN.md says it did?" — not "Would a real user succeed at this product?" (that's qa-execution).

Required Reading Router

Match your task to the row. Read the listed files in full before producing output. They are not appendices — they are load-bearing. Inline content in this SKILL.md is a pointer, not a substitute.

Task	MUST read
Discovering install/lint/test/build/start commands (Step 1)	`references/project-signals.md`
Deciding E2E support and classifying coverage (Step 1)	`references/e2e-coverage.md`
Building the audit scope checklist (Step 2)	`references/checklist.md`
Holding independent-evaluator stance on AI tasks (Step 3)	`references/independent-evaluator-protocol.md`
Scanning test diffs for AI hygiene red flags (Step 4)	`references/ai-implementation-audit.md`
Diagnosing a test that passed on retry without a code change	`references/flaky-triage.md`

Reference Index

references/project-signals.md — Heuristics for picking install/lint/test/build/start commands across ecosystems when the repo lacks an umbrella gate.
references/e2e-coverage.md — Taxonomy for existing-e2e / needs-e2e / manual-only / blocked and how to detect harness support.
references/checklist.md — Audit checklist by category: contract discovery, baseline, task audit, AI hygiene, flaky detection, quality gates.
references/ai-implementation-audit.md — Red Flag scanners (RF-1..RF-6), Requirement→Test mapping, verdict matrix for completed tasks.
references/independent-evaluator-protocol.md — What counts (and doesn't count) as evidence; transcript classification (genuine-failure / grader-bug / ambiguous-task / bypass-exploit).
references/flaky-triage.md — Taxonomy, diagnosis protocol, and quarantine workflow for retry-passes-without-code-change failures.

Required Inputs

audit-output-path (optional): Directory where audit artifacts (bugs, audit report, evidence) are stored. When provided, create the directory if it does not exist and use it for all audit outputs. When omitted, fall back to repository conventions or /tmp/agent-output-audit-<slug>.
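The fallback described above can be sketched as follows. This is a minimal illustration, not the skill's own code: `resolve_audit_dir` is a hypothetical helper, and the "repository conventions" branch is skipped because the source does not say what those conventions are.

```python
from pathlib import Path
from typing import Optional


def resolve_audit_dir(audit_output_path: Optional[str], slug: str,
                      create: bool = True) -> Path:
    """Pick the directory for audit artifacts (hypothetical helper).

    Assumption: repository conventions are not modelled here, so the
    only fallback is /tmp/agent-output-audit-<slug>.
    """
    if audit_output_path:
        base = Path(audit_output_path)
    else:
        base = Path("/tmp") / f"agent-output-audit-{slug}"
    if create:
        # "create the directory if it does not exist"
        base.mkdir(parents=True, exist_ok=True)
    return base
```

Passing `create=False` only computes the path, which is convenient for dry runs.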

Procedures

Step 1: Discover the Repository Verification Contract

Read root instructions, repository docs, and CI/build files before running commands.
Execute python3 scripts/discover-project-contract.py --root . to surface candidate install, verify, build, test, lint, start commands, and E2E signals.
STOP. Read references/project-signals.md in full before picking commands when discovery surfaces more than one plausible gate or the repo mixes ecosystems.
STOP. Read references/e2e-coverage.md in full before classifying any flow.
Prefer repository-defined umbrella commands such as make verify, just verify, or CI entrypoints over language-default commands.
Resolve the audit artifact directory. If the user provided an audit-output-path argument, use it. Otherwise use repository conventions, falling back to /tmp/agent-output-audit-<slug>. Create the audit/ subdirectory; store all bugs and reports under <audit-output-path>/audit/.
Detect Compozy mode. If .compozy/tasks/<slug>/ exists, record the slug and switch into Compozy-aware audit:
- Read state.yaml (read-only — never write to it; scripts/update-state.py owns mutation per the cy-codex-loop contract).
- Read _techspec.md (deliverable source of truth) and _tasks.md (task roster) when present.
- List every task_NN.md and capture its frontmatter status: value (allowed: pending, in_progress, completed). When task_NN.md frontmatter disagrees with state.yaml, treat frontmatter as the source of truth.
- Note the canonical memory slot .compozy/tasks/<slug>/memory/qa-execution.md; Step 5 writes audit notes there before any status flips.
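Capturing the frontmatter status of every task_NN.md might look like the sketch below. It assumes the front matter is a block delimited by `---` lines at the top of the file and that `status:` sits on its own line; the helper names are hypothetical, and a real implementation would probably use a YAML parser.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Allowed values come from the text above.
ALLOWED = {"pending", "in_progress", "completed"}

# Assumption: front matter is the first block between two '---' lines.
FRONTMATTER = re.compile(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", re.DOTALL)
STATUS_LINE = re.compile(r"^status:\s*[\"']?([A-Za-z_]+)[\"']?\s*$", re.MULTILINE)


def read_declared_status(text: str) -> Optional[str]:
    """Return the literal frontmatter status, or None if absent or invalid."""
    block = FRONTMATTER.match(text)
    if block is None:
        return None
    status = STATUS_LINE.search(block.group(1))
    if status is None:
        return None
    value = status.group(1)
    return value if value in ALLOWED else None


def collect_statuses(task_dir: Path) -> Dict[str, Optional[str]]:
    """Map each task_*.md file name to its declared status (read-only)."""
    return {
        path.name: read_declared_status(path.read_text(encoding="utf-8"))
        for path in sorted(task_dir.glob("task_*.md"))
    }
```

The sketch only reads files, in keeping with the rule that state.yaml and task files are not mutated during discovery.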

Step 2: Run the Baseline Verification Gate

Install dependencies with the repository-preferred command.
Run the canonical verification gate once before any audit work. Execute in fastest-first order: lint and type-check, then build, then unit tests, then integration tests.
If the E2E command is separate from the umbrella gate, decide whether to run it now or after runtime prerequisites are ready, then record that plan explicitly.
If the baseline fails, read the first failing output carefully and determine whether it is pre-existing or introduced by current work before moving on.
4a. Flaky-failure protocol. When a baseline command fails, before classifying as pre-existing or new, run the failing test in isolation 3-5 times on the same SHA. If it passes at least once without code changes, classify as flaky-suspect, record in audit-report.md under SUITE HEALTH SNAPSHOT (test name, attempts, retry outcome, suspected category), and do NOT promote to PASS via retry.
STOP. Read references/flaky-triage.md in full before assigning a suspected category or proposing a quarantine.
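The flaky-failure protocol can be sketched as a rerun loop plus a classifier. The function names and the `consistent-failure` label are assumptions made for illustration; only `flaky-suspect` and the 3-5 attempt range come from the text above.

```python
import subprocess
from typing import List, Sequence


def rerun_in_isolation(cmd: Sequence[str], attempts: int = 3) -> List[int]:
    """Run the failing test command several times and collect exit codes.

    The caller is responsible for staying on the same SHA between runs.
    """
    if not 3 <= attempts <= 5:
        raise ValueError("the protocol calls for 3-5 attempts")
    return [
        subprocess.run(list(cmd), capture_output=True).returncode
        for _ in range(attempts)
    ]


def classify_reruns(exit_codes: Sequence[int]) -> str:
    """Classify a baseline failure from its isolated reruns."""
    if any(code == 0 for code in exit_codes):
        # Passed at least once without code changes: never promote to PASS.
        return "flaky-suspect"
    # Hypothetical label: continue with the pre-existing vs. new analysis.
    return "consistent-failure"
```

A `flaky-suspect` result is recorded in the SUITE HEALTH SNAPSHOT rather than treated as a pass.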

Step 3: Audit Task Implementations (Compozy mode and any AI-implemented tasks)

Skip this step only when no task, phase, PRD, tech spec, or implementation-plan artifacts exist.

STOP. Read references/independent-evaluator-protocol.md in full before forming any task verdict. Tripwire summary: never accept the implementing agent's transcript, success message, or memory note as evidence. In Compozy mode, read the implementing agent's .compozy/tasks/<slug>/memory/<phase>.md artifacts and classify anomalies (genuine-failure / grader-bug / ambiguous-task / bypass-exploit) in the Errors / Corrections section of memory/qa-execution.md before judging the task.
Read each task_NN.md and its body. Summarize each task into a Task Implementation Matrix (column names mirror cy-codex-loop frontmatter):
- task_path (e.g., .compozy/tasks/<slug>/task_07.md)
- declared_status — literal frontmatter status: value
- title, type, complexity, dependencies — mirrored from frontmatter
- techspec_deliverable — linked section in _techspec.md when present
- Requirements, subtasks, checklist items, success criteria, dependent files
- implementation_evidence — files, modules, routes, commands, migrations, seeds, tests
- verification_evidence — commands executed, exit codes, output summaries
- qa_verdict — PASS | PARTIAL | FAIL | REOPEN | BLOCKED (distinct from declared_status)
- ai_audit_findings — red flag IDs that fired in Step 4 with verdict
- action — none | fixed | reopened-frontmatter | BUG-NNN.md filed
- linked_bugs — BUG IDs
Do not treat a task's declared_status, checked checkbox, memory note, or prior agent summary as proof. Verify every completed or claimed-complete task against actual files, public behavior, automated tests, and acceptance criteria.
Classify each task with qa_verdict:
- PASS: every material requirement and success criterion has implementation and fresh verification evidence.
- PARTIAL: implementation exists but one or more non-critical requirements, tests, or evidence are missing.
- FAIL: claimed behavior does not work or a critical requirement is absent.
- REOPEN: the source task_NN.md has status: completed in frontmatter but the QA verdict is PARTIAL or FAIL.
- BLOCKED: audit cannot continue because a concrete prerequisite is missing.
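The REOPEN rule is the only verdict that depends on declared_status, which a small sketch makes explicit. `final_verdict` is a hypothetical helper; the `assessed` argument stands for the verdict reached from evidence alone.

```python
VERDICTS = {"PASS", "PARTIAL", "FAIL", "REOPEN", "BLOCKED"}


def final_verdict(declared_status: str, assessed: str) -> str:
    """Combine the evidence-based verdict with the declared status."""
    if assessed not in VERDICTS:
        raise ValueError(f"unknown verdict: {assessed}")
    # A task claiming completion that is only PARTIAL or FAIL is reopened.
    if declared_status == "completed" and assessed in {"PARTIAL", "FAIL"}:
        return "REOPEN"
    return assessed
```

For example, `final_verdict("completed", "PARTIAL")` yields `REOPEN`, while the same evidence on an `in_progress` task stays `PARTIAL`.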

Step 4: AI Test-Hygiene Scan (RF-1..RF-6)

STOP. Read references/ai-implementation-audit.md in full before scanning the test diff of any task with declared_status: completed. That file owns the Red Flag scanners (RF-1..RF-6), the Requirement→Test mapping rules, and the verdict matrix.
Run the scans against the diff since the task baseline (git log --follow <test_file>, git diff <baseline_sha>..HEAD).
Emit verdict FAIL automatically when scanners detect:
- Weakened assertions on P0/P1 Success Criterion (RF-2).
- .skip / .only / xit / t.Skip inserted in the diff (RF-1).
- Mocks inserted in tests whose corresponding TC declared External Dependencies as Integration/E2E (RF-3).
- Snapshot drift on P0/P1 with no requirement-change justification (RF-4).
Record findings in the Task Implementation Matrix column ai_audit_findings and in the per-task block of audit-report.md.
Apply the Requirement → Test mapping table from references/ai-implementation-audit.md. For every Success Criterion in task_NN.md (frontmatter or body) and every linked bullet in _techspec.md, find the corresponding test by name, reference, or assertion content. Mark each criterion covers / weak / missing. A checked item or status: completed without a covers row is an audit failure.
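As an illustration of the RF-1 check listed in this step, the sketch below scans the added lines of a unified diff for skip and focus markers. The real scanner rules live in references/ai-implementation-audit.md and are not reproduced here; the regular expression and the name `scan_rf1` are assumptions.

```python
import re
from typing import List, Tuple

# Assumed patterns for RF-1: .skip( / .only( / xit( / t.Skip(), t.Skipf(), t.SkipNow()
RF1 = re.compile(r"\.(?:skip|only)\s*\(|\bxit\s*\(|\bt\.Skip(?:f|Now)?\s*\(")


def scan_rf1(diff_text: str) -> List[Tuple[int, str]]:
    """Return (diff line number, added line) pairs that trip RF-1.

    Only lines added by the diff are inspected; '+++' file headers,
    removed lines, and context lines are ignored.
    """
    hits = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        if line.startswith("+") and not line.startswith("+++"):
            if RF1.search(line):
                hits.append((number, line[1:].strip()))
    return hits
```

The input would typically be the output of `git diff <baseline_sha>..HEAD` restricted to test files. A non-empty result maps to an automatic FAIL under the rules above.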

Step 5: Reopen, File Bugs, Write Memory

Mark every task that is declared completed but audits as PARTIAL or FAIL as REOPEN in the matrix.
In Compozy mode, write audit notes to .compozy/tasks/<slug>/memory/qa-execution.md using the canonical sections required by cy-codex-loop: Objective Snapshot, Important Decisions, Learnings, Files / Surfaces, Errors / Corrections, Ready for Next Run. This file must be written before any task_NN.md frontmatter is flipped (memory-precedes-status invariant).
Edit the offending task_NN.md frontmatter status: back to pending (or in_progress if salvageable). Never write to state.yaml — cy-codex-loop's update-state.py owns mutation; frontmatter wins because the next iteration reconciles from it.
File BUG-<num>.md under <audit-output-path>/audit/issues/ using assets/issue-template.md. Include:
- The task path under Reopens task:.
- The failed Success Criterion under Summary:.
- The original strict assertion (when RF-2 fired) under Root cause:.
- The red flag ID and verdict under Automation Follow-up: notes.
- The transcript anomaly classification (when applicable) under Related:.
When the missing work is a bounded root-cause fix inside the audit scope, you may implement it, add regression coverage, and rerun the task proof. Otherwise reopen the task — do not silently pass it.
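The memory-precedes-status invariant can be enforced mechanically. The sketch below refuses to flip a task's status until the memory file exists and is non-empty. `reopen_task` is a hypothetical helper, and it assumes the first `status: completed` line in the file is the frontmatter one.

```python
import re
from pathlib import Path

# Assumption: the first 'status: completed' line belongs to the frontmatter.
COMPLETED = re.compile(r"^(status:[ \t]*)completed[ \t]*$", re.MULTILINE)


def reopen_task(task_file: Path, memory_file: Path,
                new_status: str = "pending") -> bool:
    """Flip status: completed back to pending or in_progress.

    Returns True if the file was changed. Never touches state.yaml.
    """
    if new_status not in {"pending", "in_progress"}:
        raise ValueError("a reopened task must be pending or in_progress")
    # Invariant: audit notes are written before any status flip.
    if not memory_file.is_file() or memory_file.stat().st_size == 0:
        raise RuntimeError("write memory/qa-execution.md before flipping status")
    text = task_file.read_text(encoding="utf-8")
    updated, count = COMPLETED.subn(rf"\g<1>{new_status}", text, count=1)
    if count == 0:
        return False
    task_file.write_text(updated, encoding="utf-8")
    return True
```

Raising instead of silently writing the memory file keeps the ordering decision with the auditor.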

Step 6: Quality Gates Verdict

Re-run the canonical verification gate from scratch after the last code change made during the audit.
Compile the Quality Gates section of audit-report.md. Each gate is PASS / FAIL / N/A:
- Flaky rate <2% in canonical suite.
- Zero FAIL from AI test-hygiene audit on P0/P1 tasks.
- Zero Critical / High issues open.
- Coverage delta ≥ baseline (no regression).
- Zero unresolved flaky-suspect on P0 flows.
A FAIL on any gate blocks an unconditional PASS verdict for the run.
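The gate logic can be sketched as below. The source does not define how the flaky rate is computed, so this assumes it is flaky tests divided by total tests in the canonical suite; both function names are hypothetical.

```python
from typing import Dict


def flaky_rate_gate(flaky_tests: int, total_tests: int) -> str:
    """Evaluate the 'flaky rate <2%' gate (assumed: flaky / total)."""
    if total_tests == 0:
        return "N/A"
    return "PASS" if flaky_tests / total_tests < 0.02 else "FAIL"


def unconditional_pass_allowed(gates: Dict[str, str]) -> bool:
    """A FAIL on any gate blocks an unconditional PASS; N/A does not."""
    for name, result in gates.items():
        if result not in {"PASS", "FAIL", "N/A"}:
            raise ValueError(f"gate {name!r} has invalid result {result!r}")
    return all(result != "FAIL" for result in gates.values())
```

Note that the threshold is strict: exactly 2% fails the gate under this reading.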

Step 7: Write the Audit Report

Summarize the audit using assets/audit-report-template.md and write the report to <audit-output-path>/audit/audit-report.md.
Mandatory sections:
- Claim / Command / Exit code / Verdict per command executed in Step 2 and Step 6.
- AUTOMATED COVERAGE — support detected, harness, canonical command, required flows with classification, specs added or updated.
- TASK IMPLEMENTATION AUDIT — Compozy slug, plan sources, matrix totals, per-task verdicts, reopened/fixed/blocked tasks, links to bugs.
- SUITE HEALTH SNAPSHOT — flaky rate, flaky events list, mutation score (when harness exists), coverage delta vs baseline, blocked count, manual-only count, AI audit findings count.
- QUALITY GATES — PASS/FAIL/N/A per gate.
- ISSUES FILED — total, by severity, with Reopens task: annotations.
When running in a Compozy slug, the final audit-report.md PASS feeds cy-codex-loop's verify.last_status=PASS precondition for Phase E — do not call update-state.py; cy-codex-loop owns that mutation.
Report blocked scenarios, missing credentials, or environment gaps with the exact command or prerequisite that stopped execution.

Error Handling

If command discovery returns multiple plausible gates, prefer the broadest repository-defined command and explain the tie-breaker.
If E2E support signals are weak or contradictory, prefer explicit config files and runnable commands before claiming the repository supports E2E.
If no canonical verify command exists, read references/project-signals.md, choose the broadest safe install, lint, test, and build commands for the detected ecosystem, and state that assumption explicitly.
If a required live dependency is unavailable, validate every local boundary that does not require the missing dependency and report the blocked live validation separately.
If a failure appears unrelated to the audited tasks, prove that with a clean reproduction before excluding it from the audit scope.
If the repository has an E2E harness but credentials, runtime services, or test data prevent execution, keep the affected flow classified as blocked and report the exact prerequisite that is missing.
If task_NN.md files are marked status: completed but contain unchecked subtasks, missing deliverables, or unverified criteria, do not call the audit a pass. Write memory/qa-execution.md first, then edit frontmatter status: back to pending or in_progress, and file BUG-<num>.md per Step 5. Never write to state.yaml.
If a test fails and passes on retry without a code change, do not promote to PASS. Register as flaky-suspect per references/flaky-triage.md, record the event in the Suite Health Snapshot, and treat any unresolved flaky-suspect on a P0 flow as a blocker for the final verdict.
If the AI test-hygiene scan (Step 4) detects weakened assertions, skipped tests, or mocks hiding integration in a task with declared_status: completed, do not call the audit a pass. Apply the verdict matrix in references/ai-implementation-audit.md, file BUG-<num>.md with Type Functional, and flip frontmatter status: per Step 5.

Companion: qa-execution

agent-output-audit validates that the implementing AI agent did what it claimed. qa-execution validates that a real human user can succeed at the product. They are complementary, not redundant:

Run agent-output-audit to certify that task_NN.md status: completed reflects real work.
Run qa-execution to certify that the product, taken as a whole, is acceptable to end users.

A Compozy slug typically wants both: audit the task implementations, then exercise the resulting product through user-flow QA. They share no output directory, no bug taxonomy, and no procedures — keep them separate.
