Run any Skill in Manus with one click

$pwd:

devflow-plan-critic

Name: Devflow Plan Critic
Author: KlausFreiberufler

// Use during planning state, before submitting flow_update({currentState: 'approval'}), to deeply review the plan against 7 quality dimensions. The skill enforces a critic-persona — the agent steps out of "author" mode and into "reviewer" mode. Outputs a structured verdict (approved | minor_issues | needs_changes) with severity-tagged findings. Iron Law: a verdict is only valid if all 7 dimensions were explicitly checked.

Run Skill in Manus

$ git log --oneline --stat

stars:1

forks:0

updated:May 6, 2026 at 20:39

SKILL.md

readonly

package.json

"author": "KlausFreiberufler"

"repository": "KlausFreiberufler/devflow-mcp"

View GitHub Repository

$ install --globalskills.sh

$ download --local

Run Skill in Manus

[HINT] Download the complete skill directory including SKILL.md and all related files

Run any Skill with one click

name

devflow-plan-critic

description

Use during planning state, before submitting flow_update({currentState: 'approval'}), to deeply review the plan against 7 quality dimensions. The skill enforces a critic-persona — the agent steps out of "author" mode and into "reviewer" mode. Outputs a structured verdict (approved | minor_issues | needs_changes) with severity-tagged findings. Iron Law: a verdict is only valid if all 7 dimensions were explicitly checked.

Skill: devflow-plan-critic

Purpose: Deep self-critique of a plan before transitioning planning → approval. The agent puts on the reviewer-hat and challenges its own work against 7 quality dimensions.

When to use

Invoke this skill before calling flow_update({currentState: 'approval'}). The pre-tool-use hook prints a reminder, but the responsibility is on the agent: a plan submitted without critique is a plan submitted blind.

The 7 Dimensions

For each dimension, the critic answers a focused question and reports findings.

1. AC-Coverage

Does every Acceptance Criterion have at least one task that addresses it? Do tasks cover all ACs (no orphan tasks, no orphan ACs)?

Common failures:

AC mentions a behavior but no task implements it ("user can edit X" → no UI task)
Task exists for a sub-behavior not explicitly captured in any AC
Acceptance Criterion is too vague to test ("should work well")

2. Task-Granularity

Is each task atomic and fits in less than a day's effort? Can each task be reviewed independently?

Common failures:

"Implement feature X" — too coarse, should split into 3-5 tasks
"Write tests" — too vague, should split into unit + integration + e2e
One task touches >5 files in unrelated areas — likely needs splitting

3. Edge Cases

What happens at the boundaries? Empty input, concurrency, permissions, pre-existing state?

Common failures:

No mention of empty/null/zero inputs
Race conditions ignored (multi-user, multi-tab)
Permission edge unstated ("user can do X" → who specifically? what role?)
Migration/upgrade path for existing data missing

4. Wiki-Coverage

Are the relevant ADRs, patterns, and runbooks referenced in the plan? Does the plan drift from existing decisions?

Common failures:

Plan introduces a pattern that contradicts an accepted ADR
Plan reinvents something already documented as a pattern
Plan affects an area that has runbooks, but doesn't reference them
Plan extends behavior covered by an existing ADR but doesn't propose an ADR-update

5. Past-Flow-Anchor

Are there similar past flows? Is the plan anchored on what worked there?

Common failures:

Similar flow DF-X was completed last month — plan ignores its lessons
Pattern emerged in 3+ past flows — plan doesn't reuse it (violates devflow-pattern-reuse)
Past flow's actual_hours/tokens 5× the estimate — plan repeats the same scope at the same estimate

6. Architecture-Risks

Schema migrations? Cross-project impact? Backward compatibility? Security?

Common failures:

Schema change without migration strategy
Cross-project ripple (e.g. this changes devflow-mcp shape, but no plan for plugin/UI updates)
Breaking change without deprecation period
New endpoint/route without auth/permission consideration

7. Test-Strategy

Are TDD paths planned? Which tests would prove the AC?

Common failures:

"Write tests" without naming the testing levels (unit / integration / e2e)
Critical path has no test mentioned
Edge case is in the plan but no test for it
New API route without security-cross-access test

Output Format

The skill produces a structured verdict as a JSON-shaped Markdown comment:

{
  "skill": "devflow-plan-critic",
  "iteration": 1,
  "verdict": "needs_changes",
  "findings": [
    {
      "severity": "high",
      "dimension": "ac-coverage",
      "issue": "AC-2 mentions 'user can edit story points' but no UI task exists",
      "suggestion": "Add task: 'Add story-points dropdown to FlowDetailPage'"
    },
    {
      "severity": "medium",
      "dimension": "edge-cases",
      "issue": "Migration for existing flows is not described",
      "suggestion": "Add section: 'Existing flows get story_points = NULL (back-fill out of scope)'"
    }
  ],
  "ready_to_proceed": false,
  "tokens_used_estimate": 12000
}

Verdict Semantics

Verdict	When	Next
`approved`	0 high + 0 medium findings (or only `low`)	Plan is ready, proceed to approval
`approved-trivial`	Tasks ≤ 2 AND no schema-change AND no migration AND no security/breaking tag	Skip-shortcut for tiny flows
`minor_issues`	0 high + ≥1 medium findings	Plan is good but some polish recommended; user may accept as-is
`needs_changes`	≥1 high finding	Plan must be revised before approval. Fix high-findings, re-critique.

Severity Definitions

Severity	Meaning
`high`	Blocks approval. Plan is materially incomplete or incorrect on this dimension.
`medium`	Should be addressed but doesn't block. Reviewer may accept with risk noted.
`low`	Cosmetic, suggestion-only. No action required.

Process

Read the current plan (flow.implementationPlan + tasks + acceptance criteria)
For each of the 7 dimensions, ask the focused question and look for failure-modes
Record findings with severity + dimension + issue + suggestion
Compute verdict from findings (rule above)
Emit structured comment to the flow-discussion (or as agent-message)
If verdict ≠ approved, revise the plan and re-invoke this skill
If verdict = approved, proceed with flow_update({currentState: 'approval'})

Loop Semantics (Phase 2)

The critic is iterative:

Iteration 1: full pass on all 7 dimensions
If verdict ≠ approved → agent fixes → iteration 2
Iteration 2: re-check (may shortcut for unchanged dimensions)
If verdict ≠ approved → iteration 3 (final)
Hard cap: max 3 iterations
Early exit: 2 consecutive iterations with 0 high-findings → exit
No-improvement detection: same finding-hash appears twice unchanged → escalate, don't iterate

If max-iter reached and findings remain: emit verdict: needs_changes, set ready_to_proceed: false, surface to user. User decides: manual fix or override (e.g. /skip-review slash command).

Skip-Rules (Phase 5) — Trivial Flow Detection

For trivial flows, the critic MAY emit approved-trivial early:

tasks_count ≤ 2 AND
No schema migration in plan AND
No new API endpoint introduced AND
Tags not in force_critic_tags (project-config: default ['security','breaking'])

Output:

{
  "verdict": "approved-trivial",
  "skip_reason": "1 task, no schema change, no new endpoint",
  "dimensions_checked": ["ac-coverage", "task-granularity"],
  "ready_to_proceed": true
}

Posting Findings (Phase 4) — Best-effort Discussion Integration

The critic's output should be visible to the user. Until flow_comments_create exists in MCP (deferred), use:

flow_update({
  flowId,
  planFeedback: <findings-summary-markdown>,
  // or: agentMessage: <short-status>
})

This makes findings visible in the FlowDetailPage right pane (Talk tab via plan_feedback).

Auto-Approve Gate (Phase 5)

After loop completes:

`project_configs.allow_agent_self_approval`	Behavior
`true` (ON)	Verdict=approved → emit discipline-tokens via `devflow_token_emit` → proceed to `approval` (and through to `ready` via existing self-approval hook)
`false` (OFF)	Verdict=approved → write planFeedback summary → halt, await user manual click

Discipline Token

Emit devflow-plan-critic token only when verdict = approved (or approved-trivial) AND all 7 dimensions were explicitly checked (or skip-rules applied). Token includes evidence:

{
  "dimensions_checked": ["ac-coverage", "task-granularity", "edge-cases", "wiki-coverage", "past-flow-anchor", "arch-risks", "test-strategy"],
  "iterations": 2,
  "final_verdict": "approved",
  "high_findings_resolved": 3,
  "medium_findings_open": 1
}

Anti-Patterns

Rubber-stamping — emitting approved without actually checking each dimension
Inventing AC — the critic must not add new requirements; only assess what's there
Loop forever — same finding 2× without progress → escalate, don't iterate further
Stockholm-syndrome — being too lenient on your own plan because you wrote it. Switch persona explicitly: "if a stranger handed me this plan, what would I object to?"

devflow-receiving-review — handles human reviewer feedback
devflow-knowledge-completer — pairs with dimension 4 (Wiki-Coverage)
devflow-pattern-reuse — pairs with dimension 5 (Past-Flow-Anchor)
DF-339 — flow that introduced this skill
Phase 2: loop semantics with max-3 iterations (deferred)
Phase 3: companion skill devflow-code-critic for in_progress → review

name

devflow-plan-critic

description

Skill: devflow-plan-critic

Purpose: Deep self-critique of a plan before transitioning planning → approval. The agent puts on the reviewer-hat and challenges its own work against 7 quality dimensions.

When to use

The 7 Dimensions

For each dimension, the critic answers a focused question and reports findings.

1. AC-Coverage

Does every Acceptance Criterion have at least one task that addresses it? Do tasks cover all ACs (no orphan tasks, no orphan ACs)?

Common failures:

AC mentions a behavior but no task implements it ("user can edit X" → no UI task)
Task exists for a sub-behavior not explicitly captured in any AC
Acceptance Criterion is too vague to test ("should work well")

2. Task-Granularity

Is each task atomic and fits in less than a day's effort? Can each task be reviewed independently?

Common failures:

"Implement feature X" — too coarse, should split into 3-5 tasks
"Write tests" — too vague, should split into unit + integration + e2e
One task touches >5 files in unrelated areas — likely needs splitting

3. Edge Cases

What happens at the boundaries? Empty input, concurrency, permissions, pre-existing state?

Common failures:

No mention of empty/null/zero inputs
Race conditions ignored (multi-user, multi-tab)
Permission edge unstated ("user can do X" → who specifically? what role?)
Migration/upgrade path for existing data missing

4. Wiki-Coverage

Are the relevant ADRs, patterns, and runbooks referenced in the plan? Does the plan drift from existing decisions?

Common failures:

Plan introduces a pattern that contradicts an accepted ADR
Plan reinvents something already documented as a pattern
Plan affects an area that has runbooks, but doesn't reference them
Plan extends behavior covered by an existing ADR but doesn't propose an ADR-update

5. Past-Flow-Anchor

Are there similar past flows? Is the plan anchored on what worked there?

Common failures:

Similar flow DF-X was completed last month — plan ignores its lessons
Pattern emerged in 3+ past flows — plan doesn't reuse it (violates devflow-pattern-reuse)
Past flow's actual_hours/tokens 5× the estimate — plan repeats the same scope at the same estimate

6. Architecture-Risks

Schema migrations? Cross-project impact? Backward compatibility? Security?

Common failures:

Schema change without migration strategy
Cross-project ripple (e.g. this changes devflow-mcp shape, but no plan for plugin/UI updates)
Breaking change without deprecation period
New endpoint/route without auth/permission consideration

7. Test-Strategy

Are TDD paths planned? Which tests would prove the AC?

Common failures:

"Write tests" without naming the testing levels (unit / integration / e2e)
Critical path has no test mentioned
Edge case is in the plan but no test for it
New API route without security-cross-access test

Output Format

The skill produces a structured verdict as a JSON-shaped Markdown comment:

{
  "skill": "devflow-plan-critic",
  "iteration": 1,
  "verdict": "needs_changes",
  "findings": [
    {
      "severity": "high",
      "dimension": "ac-coverage",
      "issue": "AC-2 mentions 'user can edit story points' but no UI task exists",
      "suggestion": "Add task: 'Add story-points dropdown to FlowDetailPage'"
    },
    {
      "severity": "medium",
      "dimension": "edge-cases",
      "issue": "Migration for existing flows is not described",
      "suggestion": "Add section: 'Existing flows get story_points = NULL (back-fill out of scope)'"
    }
  ],
  "ready_to_proceed": false,
  "tokens_used_estimate": 12000
}

Verdict Semantics

Verdict	When	Next
`approved`	0 high + 0 medium findings (or only `low`)	Plan is ready, proceed to approval
`approved-trivial`	Tasks ≤ 2 AND no schema-change AND no migration AND no security/breaking tag	Skip-shortcut for tiny flows
`minor_issues`	0 high + ≥1 medium findings	Plan is good but some polish recommended; user may accept as-is
`needs_changes`	≥1 high finding	Plan must be revised before approval. Fix high-findings, re-critique.

Severity Definitions

Severity	Meaning
`high`	Blocks approval. Plan is materially incomplete or incorrect on this dimension.
`medium`	Should be addressed but doesn't block. Reviewer may accept with risk noted.
`low`	Cosmetic, suggestion-only. No action required.

Process

Read the current plan (flow.implementationPlan + tasks + acceptance criteria)
For each of the 7 dimensions, ask the focused question and look for failure-modes
Record findings with severity + dimension + issue + suggestion
Compute verdict from findings (rule above)
Emit structured comment to the flow-discussion (or as agent-message)
If verdict ≠ approved, revise the plan and re-invoke this skill
If verdict = approved, proceed with flow_update({currentState: 'approval'})

Loop Semantics (Phase 2)

The critic is iterative:

Iteration 1: full pass on all 7 dimensions
If verdict ≠ approved → agent fixes → iteration 2
Iteration 2: re-check (may shortcut for unchanged dimensions)
If verdict ≠ approved → iteration 3 (final)
Hard cap: max 3 iterations
Early exit: 2 consecutive iterations with 0 high-findings → exit
No-improvement detection: same finding-hash appears twice unchanged → escalate, don't iterate

If max-iter reached and findings remain: emit verdict: needs_changes, set ready_to_proceed: false, surface to user. User decides: manual fix or override (e.g. /skip-review slash command).

Skip-Rules (Phase 5) — Trivial Flow Detection

For trivial flows, the critic MAY emit approved-trivial early:

tasks_count ≤ 2 AND
No schema migration in plan AND
No new API endpoint introduced AND
Tags not in force_critic_tags (project-config: default ['security','breaking'])

Output:

{
  "verdict": "approved-trivial",
  "skip_reason": "1 task, no schema change, no new endpoint",
  "dimensions_checked": ["ac-coverage", "task-granularity"],
  "ready_to_proceed": true
}

Posting Findings (Phase 4) — Best-effort Discussion Integration

The critic's output should be visible to the user. Until flow_comments_create exists in MCP (deferred), use:

flow_update({
  flowId,
  planFeedback: <findings-summary-markdown>,
  // or: agentMessage: <short-status>
})

This makes findings visible in the FlowDetailPage right pane (Talk tab via plan_feedback).

Auto-Approve Gate (Phase 5)

After loop completes:

`project_configs.allow_agent_self_approval`	Behavior
`true` (ON)	Verdict=approved → emit discipline-tokens via `devflow_token_emit` → proceed to `approval` (and through to `ready` via existing self-approval hook)
`false` (OFF)	Verdict=approved → write planFeedback summary → halt, await user manual click

Discipline Token

Emit devflow-plan-critic token only when verdict = approved (or approved-trivial) AND all 7 dimensions were explicitly checked (or skip-rules applied). Token includes evidence:

{
  "dimensions_checked": ["ac-coverage", "task-granularity", "edge-cases", "wiki-coverage", "past-flow-anchor", "arch-risks", "test-strategy"],
  "iterations": 2,
  "final_verdict": "approved",
  "high_findings_resolved": 3,
  "medium_findings_open": 1
}

Anti-Patterns

Rubber-stamping — emitting approved without actually checking each dimension
Inventing AC — the critic must not add new requirements; only assess what's there
Loop forever — same finding 2× without progress → escalate, don't iterate further
Stockholm-syndrome — being too lenient on your own plan because you wrote it. Switch persona explicitly: "if a stranger handed me this plan, what would I object to?"

devflow-receiving-review — handles human reviewer feedback
devflow-knowledge-completer — pairs with dimension 4 (Wiki-Coverage)
devflow-pattern-reuse — pairs with dimension 5 (Past-Flow-Anchor)
DF-339 — flow that introduced this skill
Phase 2: loop semantics with max-3 iterations (deferred)
Phase 3: companion skill devflow-code-critic for in_progress → review

devflow-plan-critic

Skill: devflow-plan-critic

When to use

The 7 Dimensions

1. AC-Coverage

2. Task-Granularity

3. Edge Cases

4. Wiki-Coverage

5. Past-Flow-Anchor

6. Architecture-Risks

7. Test-Strategy

Output Format

Verdict Semantics

Severity Definitions

Process

Loop Semantics (Phase 2)

Skip-Rules (Phase 5) — Trivial Flow Detection

Posting Findings (Phase 4) — Best-effort Discussion Integration

Auto-Approve Gate (Phase 5)

Discipline Token

Anti-Patterns

Related

Skill: devflow-plan-critic

When to use

The 7 Dimensions

1. AC-Coverage

2. Task-Granularity

3. Edge Cases

4. Wiki-Coverage

5. Past-Flow-Anchor

6. Architecture-Risks

7. Test-Strategy

Output Format

Verdict Semantics

Severity Definitions

Process

Loop Semantics (Phase 2)

Skip-Rules (Phase 5) — Trivial Flow Detection

Posting Findings (Phase 4) — Best-effort Discussion Integration

Auto-Approve Gate (Phase 5)

Discipline Token

Anti-Patterns

Related