task-completition-evaluation

Stars: 1

Forks: 0

Updated: March 12, 2026, 05:06

Final completion gate for VibeTeam tasks. Use at the end of implementation to verify diff quality, real testing, GitHub/Slack multi-agent communication evidence, and PR health before declaring done.

Source

VibeTechnologies

VibeTechnologies/VibeTeam

Task Completion Evaluation

Use this skill only at the end of a task, after implementation is complete and before reporting completion.

Required Inputs

- Task reference: GitHub issue/PR URL.
- Code evidence: git diff/patch and changed file list.
- Test evidence: command outputs and report files.
- Collaboration evidence: Slack thread URL(s), GitHub issue/PR/discussion URL(s), and eval report paths.
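
The required inputs above could be collected in one place and checked for gaps before the checklist starts. The following is a minimal sketch, not part of the skill itself: the class name `CompletionInputs`, its field names, and the `missing()` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CompletionInputs:
    """Hypothetical container for the required inputs (names are assumptions)."""

    task_url: str = ""                                        # GitHub issue/PR URL
    diff_text: str = ""                                       # git diff/patch
    changed_files: list = field(default_factory=list)         # changed file list
    test_outputs: list = field(default_factory=list)          # command outputs, report files
    slack_thread_urls: list = field(default_factory=list)     # Slack thread URL(s)
    github_thread_urls: list = field(default_factory=list)    # issue/PR/discussion URL(s)
    eval_report_paths: list = field(default_factory=list)     # eval report paths

    def missing(self) -> list:
        """Return the names of inputs that are still empty, in field order."""
        return [name for name, value in vars(self).items() if not value]
```

Calling `missing()` on a partially filled instance lists what still has to be gathered; an empty list means every input is present.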

Completion Checklist (All Required)

- Review diff/patch.
  - Run git status --short and git diff --stat.
  - Inspect the full diff for correctness, scope control, and accidental edits.
- Reflect: cleanup.
  - Remove dead code, debug output, TODO placeholders, and temporary files.
  - Confirm no tests were weakened to force a pass.
- Reflect: optimality.
  - Verify the issue is solved in the simplest robust way.
  - Confirm that behavior changes align with the docs and design.
- Run real tests (not mocked shortcuts).
  - Export the environment first:
    - export $( < ~/.env.d/codex.env )
    - export $( < .env )
  - Run unit/integration tests relevant to the touched area.
  - If agent behavior, routing, tools, or eval logic changed, run at least one Slack eval.
  - Fail the checklist if tests were skipped without explicit justification.
- Run GitHub evaluation tests for agent collaboration.
  - Use scripts/eval_github_e2e.py scenarios relevant to the task.
  - Verify and report:
    - different GitHub App identities participated as different agents
    - correct, existing @githubapphandle mentions were used
    - agents communicated and a handoff occurred across thread(s)
    - if role bots cannot be assigned, report Assignment fallback mode evidence and bot replies after the trigger
- Run Slack evaluation tests for agent collaboration.
  - Use scripts/eval_slack_e2e.py scenarios relevant to the task.
  - Verify and report:
    - different Slack app identities participated as different agents
    - correct, existing @slackapphandle mentions were used
    - agents communicated and a handoff occurred
    - the Slack thread URL is included and manually reviewed against requirements
    - transcript evidence is copied from the Conversation History section of the generated eval report (real run only)
  - For github_issue_pr_handoff_slack, require the post-check to pass for:
    - Slack required roles responded
    - Slack distinct role app identities
- Create PR and verify checks.
  - Create a PR linked to the issue (Fixes #<issue>).
  - Confirm required GitHub checks pass before claiming completion.
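
The first checklist step (reviewing git status --short) can be partly scripted. The sketch below is an illustration only: `parse_status_short` and `collect_changed_files` are hypothetical helper names, and the parser assumes the short format's two status characters followed by a space and does not handle quoted paths.

```python
import subprocess


def parse_status_short(output: str) -> list:
    """Extract paths from `git status --short` output.

    Each line is two status characters, a space, then the path.
    Renames appear as 'old -> new'; the new path is kept.
    """
    paths = []
    for line in output.splitlines():
        if len(line) < 4:
            continue  # too short to hold status + path
        path = line[3:]
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def collect_changed_files() -> list:
    """Run git in the current directory and return the changed file list."""
    result = subprocess.run(
        ["git", "status", "--short"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Do not strip stdout: a leading space is part of the status column.
    return parse_status_short(result.stdout)
```

The resulting list can be compared against the files the task was expected to touch, which makes accidental edits easier to spot during the manual diff review.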

Minimum Command Set

# Inspect changes
git status --short
git diff --stat
git diff

# Test/eval environment
export $( < ~/.env.d/codex.env )
export $( < .env )

# Example unit tests
uv run python -m pytest tests/ -v

# Example Slack eval
uv run python scripts/eval_slack_e2e.py --scenario github_issue_pr_handoff_slack --channel C0ALG01DLJV --timeout 600

# Example GitHub eval
uv run python scripts/eval_github_e2e.py --scenario github_issue_pr_handoff_github --repo VibeTechnologies/vibeteam-eval-hello-world --pr 1 --actor-login OpenCodeEngineer --issue-role software_engineer --issue-assignee 'vibeteam-swe-bot-260301[bot]' --timeout 600
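
The export $( < .env ) form relies on shell word splitting, so it is likely to fail on comment lines or on values that contain spaces. If the env files ever contain either, a small Python loader is one alternative. This is a sketch under assumptions: `parse_env` and `load_env_file` are hypothetical helpers, and only simple KEY=VALUE lines with optional quotes are handled.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key] = value
    return env


def load_env_file(path: str) -> None:
    """Read an env file (with ~ expanded) into os.environ."""
    os.environ.update(parse_env(Path(path).expanduser().read_text()))
```

With these helpers, `load_env_file("~/.env.d/codex.env")` followed by `load_env_file(".env")` would mirror the two export lines in the command set for a Python-driven test run.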

Evidence Required In Final Report

- Docs referenced:
  - docs/testing.md
  - docs/design.md
  - docs/requirements.md
- Design decisions applied: decision, reason, impact.
- Test results: exact commands + pass/fail summary.
- Eval artifacts:
  - Slack thread URL(s) and transcript summary.
  - GitHub thread URL(s).
  - Report file paths in results/eval_reports/.
  - At least one quoted message per required role from the report transcript (no placeholders).
- Clear verdict: COMPLETED only when every checklist item is satisfied.
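
The verdict rule can be expressed as a small gate over the checklist results and the per-role quotes. The sketch below is an assumption-laden illustration: `final_verdict`, the "NOT COMPLETED" label, and the placeholder strings it rejects are hypothetical, since the skill only defines COMPLETED.

```python
# Strings treated as placeholders rather than real quotes (an assumption).
PLACEHOLDERS = {"todo", "tbd", "<placeholder>"}


def final_verdict(checklist: dict, quotes_by_role: dict, required_roles: list):
    """Return (verdict, problems).

    checklist maps a checklist item name to True/False.
    quotes_by_role maps a role name to one message quoted from the eval report.
    """
    problems = [
        f"checklist item not satisfied: {name}"
        for name, ok in checklist.items()
        if not ok
    ]
    for role in required_roles:
        quote = quotes_by_role.get(role, "").strip()
        if not quote or quote.lower() in PLACEHOLDERS:
            problems.append(f"no quoted message for role: {role}")
    verdict = "COMPLETED" if not problems else "NOT COMPLETED"
    return verdict, problems
```

Returning the list of problems alongside the verdict keeps the final report explicit about which checklist item or role quote blocked completion.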
