| name | hypo-workflow |
| version | 12.3.0 |
| description | Run a serialized prompt execution pipeline from a local `.pipeline/` workspace. Use this skill whenever the user says "开始执行", "继续 pipeline", "执行下一步", "pipeline status", "跳过当前步骤", "skip step", "中止", "abort", or invokes `/hw:start`, `/hw:resume`, `/hw:status`, `/hw:skip`, `/hw:stop`, `/hw:report`, `/hw:chat`, `/hw:plan`, `/hw:plan:extend`, `/hw:plan:review`, `/hw:cycle`, `/hw:accept`, `/hw:reject`, `/hw:explore`, `/hw:sync`, `/hw:docs`, `/hw:patch`, `/hw:pr`, `/hw:pr create`, `/hw:explain`, `/hw:compact`, `/hw:knowledge`, `/hw:guide`, `/hw:showcase`, `/hw:rules`, `/hw:init`, `/hw:check`, `/hw:audit`, `/hw:release`, `/hw:debug`, `/hw:help`, `/hw:reset`, or `/hw:log`. |
Hypo-Workflow v12.3.0
Claude Code 用户:请使用 /hypo-workflow:<command> 调用具体指令。输入 /hypo-workflow:help 查看全部 39 个用户指令。
Codex 用户:本文件是完整的 Skill 入口,继续使用 /hw:* 指令。
Commands
| Command | Description |
|---|
/hw:start | Initialize and start the pipeline from the first prompt |
/hw:resume | Resume from the last interrupted state |
/hw:status | Show current pipeline progress; use --full to bypass compact context |
/hw:skip | Skip the current prompt and advance |
/hw:stop | Gracefully stop and save state |
/hw:report | Show compact report summaries, latest scores, or --view <M> full report |
/hw:chat | Enter lightweight append conversation mode |
/hw:plan | Enter Plan Mode through plan/PLAN-SKILL.md |
/hw:plan:discover | Run the Discover phase of Plan Mode |
/hw:plan:decompose | Run the Decompose phase of Plan Mode |
/hw:plan:generate | Run the Generate phase of Plan Mode |
/hw:plan:confirm | Run the Confirm phase of Plan Mode |
/hw:plan:extend | Append milestones to an active Cycle |
/hw:plan:review | Run Plan Review for the current or all milestones |
/hw:cycle | Create, list, view, close, and archive delivery Cycles |
/hw:accept | Accept pending Cycle work and complete the manual acceptance gate |
/hw:reject | Reject pending Cycle work with structured feedback and reopen the Cycle |
/hw:explore | Start an isolated exploration worktree and record exploration metadata |
/hw:sync | Synchronize project adapters and lightweight derived context without executing pipeline milestones |
/hw:docs | Generate, check, repair, and sync documentation |
/hw:patch | Create, list, close, and fix persistent lightweight Patches |
/hw:patch fix | Execute the lightweight six-step Patch repair lane |
/hw:pr | Inspect, review, fix, merge, or close existing GitHub PRs / GitLab MRs through local archives |
/hw:pr create | Guide GitHub PR / GitLab MR creation from existing worktree changes or a plan-first PR-sized task |
/hw:explain | Answer code, config, command, or recent-change questions with cited local evidence |
/hw:compact | Generate .compact context views for large runtime files |
/hw:knowledge | Inspect Knowledge Ledger records, indexes, compact summaries, and secret references |
/hw:guide | Start an interactive guide that recommends the next command path |
/hw:showcase | Generate project intro docs, technical docs, slides, and an optional poster |
/hw:rules | Manage rule severities, custom natural-language rules, lifecycle hooks, and rule packs |
/hw:init | Initialize or rescan .pipeline/ with architecture-aware project discovery |
/hw:check | Run pipeline health checks for config, state, prompts, Notion, and architecture |
/hw:audit | Run preventive code audits and emit graded findings with report output |
/hw:release | Run the automated release flow with regression, versioning, changelog, and git steps |
/hw:debug | Run symptom-driven debugging with hypotheses, validation, and optional auto-fix |
/hw:help | Show command help, grouped quick reference, or per-command usage |
/hw:reset | Reset pipeline runtime state with safe, full, or hard modes |
/hw:log | Read the unified lifecycle log; use --full to bypass compact log context |
/hw:setup | Create or update ~/.hypo-workflow/config.yaml for environment, execution, subagent, plan, and dashboard defaults |
Internal runtime skill: /hw:watchdog is cron-only and should not be presented as a normal user command.
When the user types any /hw:* command, execute the corresponding action.
Unrecognized /hw:* commands should be reported as unknown.
Load references/commands-spec.md when you need parsing rules, parameter semantics, or state-mutation details for slash commands.
Compatibility alias: /hw:review now prints ⚠️ \/hw:review` 已迁移到 `/hw:plan:review`。请使用新命令。`
Output Language Rules
📌 输出语言规则:
读取 config.yaml → output.language
- zh-CN / zh:所有用户可见的输出使用中文(PROGRESS、报告、状态提示、错误消息、交互提问)
- en:使用英文
- auto:跟随用户对话语言
内部日志(log.yaml、state.yaml)始终英文。
Template loading maps zh-CN / zh to templates/zh/, maps en / en-US to templates/en/, and falls back to root templates/ when the localized template is missing.
Plan Tool Discipline
The plan-tool-required rule applies to complex tasks and planning work:
- OpenCode: use native
todowrite for visible plan state and question / Ask for required user decisions.
- Codex: use the available plan/update tool when present; otherwise keep an explicit checklist in the conversation.
- Claude Code: keep an explicit plan/checkpoint list in the conversation or configured planning surface.
- P1/P2/P3/P4 checkpoints must update the visible plan before moving to the next phase.
Prompt Pipeline
Use this skill to execute one prompt at a time from a project-local .pipeline/ directory.
V2.5 is a structural upgrade:
- keep the same pipeline behavior as V1
- move detailed specs into
references/
- move reusable shell helpers into
scripts/
- move stable templates into
assets/
- expose Claude plugin packaging through
.claude-plugin/plugin.json
The runtime guarantees in this version focus on:
pipeline.source: local | notion
pipeline.output: local | notion
execution.mode: self | subagent
- recovery through
.pipeline/state.yaml
If the configuration asks for a capability the current version does not support, stop and say so explicitly.
Progressive Disclosure
Load the skill in three layers:
- metadata from this file frontmatter
- this
SKILL.md body for core runtime behavior
- bundled resources on demand:
references/ for detailed policy
assets/ for stable templates and examples
scripts/ for deterministic helper tasks
Prefer not to inline long policy text into the main conversation when a bundled file already defines it.
Plan Sub-Skill
Plan Mode is implemented as a dedicated sub-skill:
plan/PLAN-SKILL.md is the planning L2 entry point
plan/assets/ and plan/templates/ are planning L3 resources
When the command namespace is:
/hw:plan
/hw:plan:*
/hw:plan:review
/hw:plan:extend
load plan/PLAN-SKILL.md before executing the command-specific behavior.
First Actions
- Read
~/.hypo-workflow/config.yaml if present.
- Read
.pipeline/config.yaml.
- Validate the project config against
config.schema.yaml.
- When shell access is available, prefer
scripts/validate-config.sh for a quick structural pre-check before deeper reasoning.
- Resolve effective config as project > global > defaults. Never hardcode prompts, reports, state, or legacy step-log paths if config overrides them.
- If
execution is still missing after resolution, assume:
mode=self
subagent_tool=auto
steps.preset=tdd
- If
platform is still missing after resolution, assume auto.
- Normalize step overrides:
- accept top-level
step_overrides
- also accept legacy
execution.step_overrides
- if both exist, top-level wins
- Read
.pipeline/state.yaml if it exists. If not, initialize from assets/state-init.yaml and then fill in the prompt-specific fields.
- Read
.pipeline/log.yaml when lifecycle history, milestone status, fixes, audits, release records, or debug context matters.
- Read
.pipeline/rules.yaml and .pipeline/rules/custom/ when rule severity, lifecycle gates, or always-on behavior constraints matter. Missing rules config is compatible and behaves as extends: recommended.
Runtime Resources
Use these bundled files when relevant:
Supported Commands
Handle these commands directly:
/hw:start, 开始执行, start pipeline
Start the pipeline. Resume unfinished state if present unless --clean is given. With --from <prompt>, initialize the current prompt directly to the matched prompt file or prompt stem.
/hw:resume, 继续, continue, 下一步, 执行下一步
Resume from current.prompt_file and current.step. Treat a user-facing interrupted session as persisted unfinished work, usually pipeline.status=running|stopped.
/hw:status, pipeline status, 状态
Read config plus state and print a concise status summary without mutating work. Use compact state/progress when available unless --full is present. When shell access is available, prefer scripts/state-summary.sh.
/hw:skip
Skip the current prompt, persist a prompt-level skip reason, append a prompt skip log event, and advance to the next prompt without incrementing pipeline.prompts_completed.
跳过当前步骤, skip step
Mark the current step as skipped, apply cascade logic when needed, persist state, append log events, and move to the next runnable step.
/hw:stop
Gracefully stop without aborting the pipeline. Persist state, optionally write an intermediate report, and set pipeline.status=stopped. With --no-report, skip the intermediate report.
/hw:report
Load compact report summaries when available. With --view <M>, load the specified Milestone report in full. Otherwise summarize the latest scores, warnings, and decision.
/hw:chat
Load skills/chat/SKILL.md. Enter lightweight append conversation mode, reload state.yaml + cycle.yaml + PROGRESS.md + recent report, and write chat entries instead of Milestone reports.
/hw:help
Show grouped command help. Use --quick for a compact cheat sheet or /hw:help <cmd> for detailed usage, arguments, and examples sourced from this file.
/hw:reset
Reset runtime state only, or use --full / --hard for broader cleanup. Always list the affected files before deletion. --hard requires an explicit YES confirmation.
/hw:log
Read .pipeline/log.compact.yaml when available, otherwise .pipeline/log.yaml; show the latest 10 entries by default, and support --all, --type <type>, --since <milestone>, and --full filters. If the file is missing, say 暂无日志,执行 Pipeline 后自动生成.
/hw:setup
Configure the plugin itself: create or update ~/.hypo-workflow/config.yaml, detect environment, choose plan mode, and choose execution/subagent mode.
/hw:check
Run health checks for config, workspace completeness, state consistency, prompts, Notion connectivity, and architecture. Without .pipeline/, respond with 请先运行 /hw:init.
/hw:init
Detect whether the repo is empty, already has source code, or already has a pipeline, then create or refresh .pipeline/ plus the architecture baseline. Support --rescan, --folder, --single, --import-history, and --import-history --interactive.
/hw:release
Run the seven-step release flow. Support --dry-run, --skip-tests, and explicit --patch / --minor / --major version overrides.
/hw:audit
Audit the whole project or a narrower scope, grade findings as Critical / Warning / Info, write .pipeline/audits/audit-NNN.md, and log the result.
/hw:debug
Investigate a concrete symptom, generate ranked root-cause hypotheses, validate them, and optionally apply --auto-fix only after verification passes.
/hw:plan, /hw:plan:discover, /hw:plan:decompose, /hw:plan:generate, /hw:plan:confirm, /hw:plan:extend, /hw:plan:review
Load plan/PLAN-SKILL.md and route execution to the corresponding Plan Mode phase.
/hw:cycle
Load skills/cycle/SKILL.md. Manage explicit Cycles, archives, deferred items, and project summaries. Old projects without .pipeline/cycle.yaml remain compatible as implicit C1.
/hw:sync
Load skills/sync/SKILL.md. Run light, standard, or deep project sync without executing pipeline milestones. SessionStart may only run light external-change detection.
/hw:patch
Load skills/patch/SKILL.md. Manage persistent lightweight patches under .pipeline/patches/. Support /hw:patch fix P001 [P...] for the lightweight six-step fix lane.
/hw:pr
Load skills/pr/SKILL.md. Handle existing GitHub PRs or GitLab MRs through a local .pipeline/pr/ Change Request archive. Support inspect, review, fix, merge, and close; remote writes require explicit confirmation.
/hw:explain
Load skills/explain/SKILL.md. Answer natural-language questions using cited local evidence. Stay read-only and report unknown or needs_context when evidence is missing.
/hw:compact
Load skills/compact/SKILL.md. Generate .compact context views for PROGRESS, state, log, reports, and closed patches without mutating source files.
/hw:knowledge
Load skills/knowledge/SKILL.md. Inspect .pipeline/knowledge/ records, generated category indexes, compact summaries, and redacted secret references. Default to compact and index context; load raw records only for view or narrow search.
/hw:guide
Load skills/guide/SKILL.md. Sense project state, ask what the user wants, recommend a short command flow, and execute the first command only after confirmation.
/hw:showcase
Load skills/showcase/SKILL.md. Generate project introduction documents, technical docs, slides, and an optional GPT Image poster under .pipeline/showcase/.
/hw:rules
Load skills/rules/SKILL.md. Manage rule severities, built-in presets, custom Markdown rules, lifecycle hook binding, and shareable rule packs.
/hw:review
Emit the legacy migration warning and redirect the user to /hw:plan:review. Keep this alias only for compatibility.
中止, abort
Mark the current prompt and pipeline as aborted, persist state, append a prompt-level log event, and stop.
If a command starts with /hw: and is not listed above, return:
Unknown command: /hw:xxx. Available: /hw:start, /hw:resume, /hw:status, /hw:skip, /hw:stop, /hw:report, /hw:chat, /hw:plan, /hw:plan:discover, /hw:plan:decompose, /hw:plan:generate, /hw:plan:confirm, /hw:plan:extend, /hw:plan:review, /hw:cycle, /hw:accept, /hw:reject, /hw:explore, /hw:sync, /hw:patch, /hw:pr, /hw:explain, /hw:compact, /hw:knowledge, /hw:guide, /hw:showcase, /hw:rules, /hw:init, /hw:check, /hw:audit, /hw:release, /hw:debug, /hw:help, /hw:reset, /hw:log, /hw:setup
Slash commands are exact and take precedence over fuzzy natural-language matching. Detailed parsing and option semantics live in references/commands-spec.md.
If the user command is ambiguous, prefer a safe resume and say which prompt and step you are about to run.
Config Model
Configuration has two layers:
- global config:
~/.hypo-workflow/config.yaml, created by /hypo-workflow:setup
- project config:
.pipeline/config.yaml, created by /hypo-workflow:init or /hypo-workflow:plan-generate
Resolve effective values in this order:
- project config
- global config
- built-in defaults
Key fallbacks:
execution.mode falls back to global execution.default_mode, then self
execution.subagent_tool falls back to global subagent.provider, then auto
plan.mode falls back to global plan.default_mode, then interactive
plan.interaction_depth falls back to global plan.interaction_depth, then medium
dashboard.enabled falls back to global dashboard.enabled, then false
dashboard.port falls back to global dashboard.port, then 7700
output.language falls back to global output.language, then en
output.timezone falls back to global output.timezone, then UTC
watchdog.enabled falls back to global watchdog.enabled, then false
history_import.split_method falls back to global history_import.split_method, then auto
compact.auto falls back to global compact.auto, then true
showcase.language falls back to global showcase.language, then auto
rules.extends falls back to recommended
Read references/config-spec.md when resolving config precedence or field mapping.
Expected top-level config groups:
pipeline
execution
evaluation
plan optional
output optional
watchdog optional
history_import optional
compact optional
knowledge optional
showcase optional
rules optional
platform optional
step_overrides optional
hooks optional
Key defaults:
pipeline.prompts_dir=.pipeline/prompts
pipeline.reports_dir=.pipeline/reports
pipeline.state_file=.pipeline/state.yaml
pipeline.log_file=.pipeline/log.md
- lifecycle log defaults to
.pipeline/log.yaml
plan.mode=interactive
plan.interaction_depth=medium
plan.interactive.min_rounds=3
plan.interactive.require_explicit_confirm=true
output.language=en
output.timezone=UTC
watchdog.enabled=false
watchdog.interval=300
watchdog.heartbeat_timeout=300
watchdog.max_retries=5
watchdog.max_consecutive_milestones=10
watchdog.notify=true
history_import.split_method=auto
history_import.time_gap_threshold=24h
history_import.max_milestones=20
history_import.keyword_patterns=['feat\\(M(\\d+)\\):','M(\\d+)-','milestone-(\\d+)']
compact.auto=true
compact.progress_recent=15
compact.state_history_full=1
compact.log_recent=20
compact.reports_summary_lines=3
knowledge.enabled=true
knowledge.loading.session_start=true
knowledge.loading.compact=true
knowledge.loading.records=false
knowledge.redaction.secret_keys=['api_key','token','secret','password','authorization','access_token','refresh_token','client_secret']
showcase.language=auto
showcase.poster.api_key_env=OPENAI_API_KEY
showcase.poster.size=1024x1536
showcase.poster.quality=high
showcase.poster.style=auto
rules.extends=recommended
dashboard.enabled=false
dashboard.port=7700
dashboard.auto_start=false
dashboard.shutdown_delay=30
execution.mode=self
execution.subagent_tool=auto
execution.steps.preset=tdd
platform=auto
The main skill only needs the normalized values. It should not care whether the user wrote overrides in the legacy or current location.
pipeline.log_file remains the legacy step-trace target for V0-V5 compatibility. V6 also uses .pipeline/log.yaml as the lifecycle ledger for milestones, fixes, audits, debug sessions, releases, and plan reviews.
Prompt Discovery
For source: local:
- Read the configured prompts directory.
- Collect
*.md files.
- Sort them by filename ascending.
- Treat each file as one pipeline prompt.
For source: notion:
- Read the
notion config block.
- Resolve the token from
NOTION_TOKEN or notion.token_file.
- Use
adapters/source/notion.md as the source contract.
- If helper execution is needed, prefer
python3 scripts/notion_api.py fetch-prompts ....
- Convert Notion prompts into the same internal prompt structure used for local files.
Prompt files should usually contain:
If headings differ slightly but meaning is clear, infer by meaning. If critical content is missing, block the prompt instead of guessing.
Architecture Files
/hw:init establishes the architecture baseline. Read it before /hw:plan, /hw:plan:review, /hw:audit, and /hw:debug; update it through /hw:init, /hw:init --rescan, and /hw:plan:review. Layout rules stay in references/init-spec.md.
Plan Modes
Planning now supports two modes through plan.mode:
interactive
- default mode
- Discover asks targeted questions in rounds
- Confirm waits for explicit user approval
interaction_depth controls the minimum P1 question rounds: low=2, medium=3, high=5
- P1 may enter P2 only after the minimum rounds and an explicit user signal such as「够了」「开始吧」「可以了」
- P2 must show the milestone split and wait for confirmation before P3
- hooks should allow turn end during planning checkpoints
auto
- Claude completes P1-P4 without pausing for user feedback unless blocked by missing critical information
- Confirm becomes a summary checkpoint instead of a hard gate
- hooks should block premature turn end so planning continues automatically
/hw:plan --context audit,patches,deferred,debug injects existing evidence into P1 Discover. Context sharpens the interview; it does not skip Discover.
Dashboard
The dashboard is an optional WebUI for .pipeline/ state, config, progress, and reports.
- treat it as a background service that must not block normal agent execution
- keep its configuration under the
dashboard config block and plugin-level setup defaults
- resolve the preferred port as project
dashboard.port > global dashboard.port > 7700
Cycles And Patches
V8 adds two persistent lifecycle surfaces:
- Cycles: explicit delivery containers under
.pipeline/cycle.yaml, archived to .pipeline/archives/
- Patches: lightweight side-track items under
.pipeline/patches/
Cycle rules:
- old projects without
.pipeline/cycle.yaml keep their previous behavior and are treated as implicit C1 only for display
/hw:init must not create .pipeline/cycle.yaml
/hw:cycle new creates the first explicit Cycle, resets Cycle-local state/prompts/reports, and preserves architecture, config, lifecycle log, archives, and patches
/hw:cycle close archives Cycle-local artifacts, writes deferred items, and updates project-root PROJECT-SUMMARY.md
Patch rules:
- Patch numbering is global,
P001, P002, and so on
- Patches are never archived with a Cycle
/hw:plan --context patches can inject open Patches into P1 Discover
Rules
V8.4 adds Rules as an independent behavior dimension.
Rule sources:
- distributed built-ins:
rules/builtin/*.yaml
- distributed presets:
rules/presets/recommended.yaml, strict.yaml, minimal.yaml
- project config:
.pipeline/rules.yaml
- project custom rules:
.pipeline/rules/custom/*.md
- imported packs:
.pipeline/rules/packs/<pack-name>/
Severity model:
off: disabled
warn: emit warning and continue
error: hard gate; stop execution until fixed, disabled, or downgraded
Lifecycle hook points:
on-session-start
pre-milestone
post-milestone
pre-step
post-step
pre-commit
on-fail
on-evaluate
always
Loading priority:
- built-in default severity
extends preset
.pipeline/rules/custom/
.pipeline/rules.yaml rules: overrides
- command-line
--rule name=severity overrides when supported
Missing .pipeline/rules.yaml is compatible with old projects and behaves like extends: recommended.
SessionStart loads active always rules through scripts/rules-summary.sh and injects them into context. Other hook points are enforced by the command-specific Skill behavior and should use references/rules-spec.md when exact rule semantics matter.
⚠️ Patch Fix 执行约束
❌ 绝对禁止:
- 启动 brainstorming 或 Plan Discover
- 走完整 TDD 流水线(write_tests → run_red → ...)
- 写入 state.yaml(Patch 不是 Milestone)
- 生成 report.md
- 单个 Patch 改动超过 5 个文件时不提醒用户
- 顺手重构不相关代码
✅ 必须做到:
- 读取 Patch 描述后直接定位和修复
- 跑现有测试验证不破坏其他功能
- 单次 commit,message 格式:fix(P): <描述>
- 自动关闭 Patch 并更新文件
- 超出范围时停下来建议升级为 Milestone
Step Presets
Resolve the active step sequence from config:
tdd
write_tests -> review_tests -> run_tests_red -> implement -> run_tests_green -> review_code
implement-only
implement -> run_tests -> review_code
custom
Use execution.steps.sequence exactly as configured.
Apply normalized step overrides after preset expansion:
- skip steps whose override sets
enabled: false
- honor
strict
- honor
executor or reviewer
- honor
subagent_tool or subagent
📎 详细步骤规范见 references/tdd-spec.md
Hook 集成(可选)
Resolve platform in this order:
- config
platform
- global
agent.platform
- runtime auto-detection
Platform guidance:
claude
Prefer Claude-specific delegation and hook metadata.
codex
Prefer Codex-compatible delegation and treat hook support as minimal.
Codex should strongly prefer concrete Subagent delegation for substantial work when available.
auto
Infer from the environment.
If platform=auto, detect the environment using repository markers:
.claude/ directory or CLAUDE.md -> Claude Code
.codex/ directory or AGENTS.md -> Codex
If config.yaml sets hooks.enabled=true, treat Hook integration as active when the matching hook files are installed.
Claude Code(完整 Hook 支持)
If the platform resolves to Claude Code and hooks are installed:
- Stop Hook 已激活
- the hook runs before the agent stops
- it checks
state.yaml, log.md, current step state, and report generation
- it may return
decision:block
- the returned
reason becomes the next concrete instruction for the agent
- this acts as a passive completion safety net
- SessionStart Hook 已激活
- the hook injects pipeline state through
additionalContext
- startup, resume, and compact all get fresh pipeline status
- compact reinjection reduces the risk of losing run state after context compression
- InstructionsLoaded Hook(可选)
- purely observational
- useful for logging when
SKILL.md or related instructions reload
When Claude hooks are active, the main skill can simplify some self-check messaging, but it must still preserve the full state machine on its own.
Codex(降级模式)
If the platform resolves to Codex:
- there is no Stop Hook
- there is no SessionStart context injection
- recovery still depends on the agent reading
state.yaml directly
notify is optional and only provides turn-complete observability
AGENTS.md should carry the discipline that hooks cannot enforce
This means Codex keeps the V1 behavior: the skill itself is responsible for stop safety, recovery, and report discipline.
Codex Subagents are Codex/GPT runtime workers. Hypo-Workflow must not require DeepSeek, Mimo, Claude, or other external model routing for Codex delegation. For substantial or non-trivial work, Codex should strongly prefer concrete Subagent delegation when available:
- use a test/review Subagent for test design, failure evidence, final diff review, or assumption challenge
- use an implementation Subagent for scoped edits
- use docs-specific assistance for README, guide, or adapter documentation work
- keep the main agent responsible for integration, state/log/report updates, and final judgment
- if no Subagent is used for substantial work, record a concise reason in the report, Patch file, or lifecycle log
Trivial one-file or pure inspection tasks may stay local without escalation.
Hook 日志
Hook events should be written through scripts/log-append.sh when possible.
Preferred format:
## {timestamp} - hook:{hook_name}
- result: pass | block | warning
- message: ...
Hook sensing rules:
- If a
hooks/ directory exists in the project root, note that hook data may be available.
- If config contains
hooks.enabled=true, prefer the installed hooks but do not rely on them for correctness in non-Claude environments.
- Hook facts may enrich notes, logging, or subagent context.
- Hook facts must never replace the core state machine.
Use platform-specific details only after reading the matching reference:
📎 Claude 细节见 references/platform-claude.md
📎 Codex 细节见 references/platform-codex.md
State Core
Persist state to the configured state file after every meaningful transition.
Core shape:
pipeline:
name: Hypo-TODO
status: idle | running | blocked | aborted | stopped | completed
prompts_total: 0
prompts_completed: 0
started: null
finished: null
last_heartbeat: null
current:
phase: idle | plan_discover | plan_decompose | plan_generate | plan_confirm | executing | lifecycle_init | lifecycle_check | lifecycle_audit | lifecycle_release | lifecycle_debug | lifecycle_cycle | lifecycle_patch | completed
prompt_index: 0
prompt_file: 00-scaffold.md
prompt_name: scaffold
step: write_tests
step_index: 0
milestones:
- name: 00-scaffold
status: done | in_progress | deferred | failed | skipped
deferred_reason: null
prompt_state:
started_at: null
updated_at: null
finished_at: null
result: running | pass | blocked | aborted | stopped | skipped
diff_score: null
code_quality: null
steps:
- name: write_tests
status: pending | running | done | skipped | blocked
executor: self | subagent
subagent_tool: codex | claude | auto | null
subagent_result: null
reason: null
started_at: null
finished_at: null
duration_seconds: null
notes: ""
history:
completed_prompts: []
Core write rules:
current.step must always point at the next runnable or currently running step.
current.step_index must match the position inside prompt_state.steps.
- skipped steps must record both
status=skipped and a machine-readable reason.
- delegated steps must record the actual
executor, actual subagent_tool, and parsed subagent_result when available.
last_heartbeat must be updated on every persisted execution transition during /hw:start and /hw:resume.
评估完成后写入 state.yaml 的 evaluation 块:
evaluation:
diff_score: 1-5
code_quality: 1-5
test_coverage: 1-5 | null
complexity: 1-5
architecture_drift: 1-5
overall: 1-5
adaptive_threshold: 2-5
warnings:
- "..."
该块应存在于当前 prompt 的运行态,并在 prompt 完成后复制到 history[].evaluation。
📎 完整字段、时机和版本演化见 references/state-contract.md
Auto Resume Watchdog
The watchdog is disabled by default. When watchdog.enabled=true, /hw:start registers scripts/watchdog.sh through cron and /hw:resume honors the same lock and heartbeat contract.
Runtime contract:
- update
last_heartbeat with an ISO-8601 timestamp every time execution state is persisted
- create
.pipeline/.lock before active execution
- remove
.pipeline/.lock when the run stops, blocks, aborts, or completes
- watchdog skips when lock exists
- watchdog triggers
/hw:resume only when current.phase=executing and heartbeat age exceeds watchdog.heartbeat_timeout
- pipeline completion,
/hw:stop, and abort unregister the watchdog cron entry
Detailed detection and backoff rules live in skills/watchdog/SKILL.md and scripts/watchdog.sh.
Unified Logging
V6 keeps two log layers: .pipeline/log.yaml for lifecycle history and pipeline.log_file for the backward-compatible step trace, usually .pipeline/log.md. Read references/log-spec.md before mutating log.yaml, write entries for milestones and fix/audit/debug/plan-review/release reports, and create .pipeline/fixes/, .pipeline/audits/, or .pipeline/debug/ only when those artifacts exist.
Logging Core
Append Markdown to the configured legacy step log.
Record only step start, step finish, prompt start, prompt finish, prompt blocked, prompt skipped, and prompt stopped.
Do not record pipeline-wide lifecycle events such as "pipeline initialized".
Preferred shape:
## 2026-04-22T16:01:00+08:00 - 00-scaffold - write_tests - finish
- status: done
- executor: self
- notes: wrote 8 tests across 2 files
When shell access is available, prefer scripts/log-append.sh for simple standardized writes.
Lifecycle events such as milestone completion, release, audit, debug, and plan review should be summarized in .pipeline/log.yaml instead of cluttering the step trace.
Progress Summary
Maintain .pipeline/PROGRESS.md as the human-readable execution summary.
- update it after every milestone start, step completion, milestone completion, and deferred decision
- keep it consistent with
state.yaml and log.yaml
- use it to summarize current status, milestone table, timeline table, patch table, and deferred items for humans
- keep it as a board-style summary, not a loose append-only event log
- write all prose in
output.language
- format times in
output.timezone as same-day HH:MM or cross-day DD日 HH:MM for zh-CN
Detailed format rules live in references/progress-spec.md.
Context Compact
V8.2 adds derived compact views for large runtime files. Generate them with /hw:compact or automatically when compact.auto=true.
Compact files:
.pipeline/PROGRESS.compact.md
.pipeline/state.compact.yaml
.pipeline/log.compact.yaml
.pipeline/reports.compact.md
.pipeline/patches.compact.md
Rules:
- compact files are read-only context views and must never replace source files for mutation
- SessionStart loads compact files first, then falls back to full source files when compact views are absent
- current prompt and current report are always loaded in full
- open Patch files are loaded in full; closed Patch details are represented through
patches.compact.md
/hw:status --full, /hw:log --full, and /hw:report --view <M> bypass compact views for the requested data
Main State Machine
Use this loop for /hw:start, /hw:resume, start pipeline, continue, 下一步, and auto-continue decisions:
- Read config and normalize runtime values.
- Discover prompt files.
- Initialize state if missing.
- If pipeline is already
completed, report completion and stop unless the user explicitly asks to restart or uses /hw:start --clean.
- If pipeline is
aborted or stopped, resume only on explicit continue/start/resume.
- Load the current prompt.
- Find the next step whose status is not
done or skipped.
- If this is a fresh prompt entry, append one prompt-level
prompt_start log event.
- Create
.pipeline/.lock, mark the selected step as running, record started_at, set current.phase=executing, update last_heartbeat, record the resolved executor, and append a step-start log event.
- Execute the step according to the preset, overrides, skip cascade state, and delegation rules.
- The main agent coordinates the step, uses serial Subagent tasks for concrete work when appropriate, and validates the result before continuing. In Codex, substantial work should prefer Codex Subagents when available.
- Record notes, timing, actual executor, result,
last_heartbeat, and PROGRESS.md updates.
- If the step blocks, evaluate whether to
retry, mark the milestone deferred, or stop.
- Persist state and append a step-finish or block event.
- When all enabled steps finish, generate the prompt report, compute evaluation, write final prompt fields, append a prompt-finish log event, and persist state.
- If the prompt passed and architecture tracking is active, run Plan Review before advancing.
- After a milestone-level result is final, append one lifecycle entry to
.pipeline/log.yaml and refresh .pipeline/PROGRESS.md.
- After Plan Review, add the prompt to history and advance state to the next prompt immediately.
- If
auto_continue=false, stop after the state advance and wait for the user to say 继续.
- If there is no next prompt, set
current.phase=completed, mark the pipeline completed, persist state, remove .pipeline/.lock, unregister the watchdog cron entry, and stop.
Skip Cascade
General skip rules:
- keep the current prompt recoverable
- mark every skipped step explicitly
- record a reason
- append skip events to the log
Special cascade from write_tests in tdd:
- mark
write_tests as skipped with reason=user_skipped
- mark
review_tests as skipped with reason=dependency_skipped
- mark
run_tests_red as skipped with reason=dependency_skipped
- continue from
implement
- keep
run_tests_green runnable
- downgrade
run_tests_green to inline validation
- set
run_tests_green.notes to fallback=inline_validation, reason=tests_skipped
- log the downgrade before
run_tests_green starts
Inline validation means:
- check imports
- check syntax
- record
inline_validation in state and log
implement-only and custom flows may also use inline validation when no tests exist for the current prompt.
Subagent Entry Point
Delegation is allowed only when:
execution.mode=subagent
- the normalized step override resolves
executor=subagent or reviewer=subagent
When those conditions are met, Codex should strongly prefer concrete Subagent delegation for substantial work. Testing/review and implementation are separate responsibilities: an implementation Subagent must not be the sole validator of its own changes, and a test/review Subagent should inspect tests, failure evidence, final diffs, or assumptions when practical. Documentation and README work should prefer docs-specific assistance when available. If substantial work stays local, record a non-delegation rationale.
Delegation flow:
- choose the correct subagent template
- assemble prompt context from the active prompt, changed files, and relevant tests
- resolve the actual tool from step override, project execution default, global subagent provider, and platform
- try the delegated execution
- parse JSON output
- merge the structured result back into state
Tool selection:
auto
choose the best supported backend for the current platform; in Codex, prefer codex exec and do not choose DeepSeek, Mimo, Claude, or other external model routing as a Codex Subagent backend
claude
prefer Claude subagent definitions or claude -p; this is a Claude/cross-tool path, not Codex Subagent external model selection
codex
prefer codex exec
Fallback rules:
- if the tool is unavailable, execution fails, or JSON cannot be parsed, rerun the same step locally
- set
subagent_fallback=true in the log note
- set a concise fallback
reason
- mark the actual executor as
self
- never block the pipeline because delegation failed by itself
The main skill should only own the routing and fallback. Template content and detailed note formats belong in the reference layer.
📎 Subagent 细节见 references/subagent-spec.md
Execute Architecture
Execution is now modeled as main-agent orchestration plus serial Subagent work:
- The main agent reads the active prompt and decomposes the next task.
- The main agent delegates concrete write/test/run/report steps serially when useful.
- In Codex, substantial work should attempt Codex Subagents when available, while trivial one-file work may stay local.
- The main agent validates Subagent output, updates state, logs, and progress summaries.
- The main agent decides whether to continue, retry, defer, or stop.
- Hooks and watchdogs should keep execution moving until all milestones finish or the main agent explicitly chooses the
stop outcome.
评估决策(V4 多维度)
review_code 完成后,对本轮 Prompt 执行多维度评分。
📎 各维度评分标准、权重公式、架构漂移检测细则见 references/evaluation-spec.md
评分维度:
diff_score
code_quality
test_coverage(仅 TDD)
complexity
architecture_drift
overall
阻塞决策:
STOP(任一触发):
diff_score > threshold
architecture_drift >= 4
overall > threshold + 1
WARN(记录不阻塞):
complexity >= 4
test_coverage <= 2
threshold = adaptive_threshold 或 max_diff_score
自适应阈值在 evaluation.adaptive_threshold=true 时启用:
- 连续 3 个
diff_score <= 2 -> 收紧
- 出现
STOP -> 放宽
- 其他情况 -> 保持
📎 自适应阈值详细规则见 references/evaluation-spec.md
向后兼容要求:
- 当
adaptive_threshold=false 时,保持 V3 的 diff_score > max_diff_score 主判定行为
- 多维评分仍可写入 state 和报告,但不应破坏旧配置的默认流转语义
报告规则:
Plan Review
When the pipeline was generated through Plan Mode and .pipeline/architecture.md exists, run Plan Review after prompt evaluation and before prompt advance, compare the completed milestone against the current baseline, record ADDED, CHANGED, REASON, and IMPACT, and inspect whether downstream prompts should be revised. Detailed behavior belongs in references/plan-review-spec.md.
Failure Handling
When a milestone or step fails, Claude must explicitly choose one path:
retry
- use a revised strategy and optionally ask a subagent to analyze the failure first
deferred
- mark the milestone as deferred when downstream work can continue safely
- store
deferred_reason and surface it in PROGRESS.md
stop
- stop and wait for user intervention
- leave a clear blocking reason in state and logs
Restart And Abort
If the user explicitly asks to restart:
- keep old reports and logs unless deletion is explicitly requested
- reinitialize state from
assets/state-init.yaml
- set the first prompt and its first runnable step
- make it clear that the run is a restart, not a resume
If the user asks to stop gracefully or invokes /hw:stop:
- persist the current prompt and pipeline state
- set
pipeline.status=stopped
- if
--no-report is not present, write an intermediate report for the current prompt
- append one prompt-level stop event
- stop without discarding context or marking the prompt aborted
If the user asks to abort:
- mark prompt and pipeline as aborted
- persist state
- append one prompt-level log event
- stop without discarding context
Failure Handling
Stop and explain the reason when:
- config is invalid
- prompt files are missing
- preset expansion fails
- custom sequence is missing
- a detailed reference file needed for the current branch is missing
- evaluation cannot be computed from available evidence
- the prompt is blocked by
diff_score
Prefer explicit blocking over silent guessing.
Platform Packaging
This skill is packaged for Claude plugin installation through:
That manifest should only point to this SKILL.md. Hooks, commands, and agent definitions can grow in later versions without changing the core state machine here.
Output Adapters
For output: local:
- persist reports to the configured reports directory
For output: notion:
- read the
notion config block
- resolve the token from
NOTION_TOKEN or notion.token_file
- use
adapters/output/notion.md as the write contract
- prefer
python3 scripts/notion_api.py upsert-report ... when helper execution is needed
- if the Notion write fails, keep local report generation intact and report the adapter error explicitly
Deprecated Layout
The old templates/ directory is retained for compatibility but is now considered deprecated.
- reports now live in
assets/report-template.md
- TDD policy now lives in
references/tdd-spec.md
- evaluation policy now lives in
references/evaluation-spec.md
- subagent prompt templates remain in
templates/subagent/
Read templates/DEPRECATED.md before adding new material to the old template tree.
Boundaries
V4 extends evaluation behavior but still does not add new remote execution capabilities beyond the existing runtime model.
Do not claim support for:
- remote prompt execution beyond the existing supported adapters
- non-local reports in this packaged layout unless the runtime explicitly supports them
- concurrent fan-out delegation for one step
- replacing the state machine with hook-only orchestration
- deleting the deprecated template tree automatically