name	gormes-tdd-slice
description	Implement one Gormes progress.json row using tracer-bullet test-driven development. Use when building or fixing a Gormes feature with tests, when a row has test_commands, when the user asks for TDD/red-green-refactor, or when implementing Goncho/Hermes parity behavior one vertical slice at a time.

Gormes TDD Slice

Repository Branch Rule

For Gormes work, stay on the existing development branch. Do not create or use feature branches, short-lived branches, or git worktrees. If the checkout is not on development, stop before editing and switch safely or report the blocker.

Mission

Ship one narrow Gormes behavior with a red-green-refactor loop. Tests must verify public behavior, not implementation details.

For Hermes parity bugs, the RED test must encode the upstream-visible behavior or artifact, including negative visual/output details. "It works" is not enough when the bug is duplicate output, stale labels, boxed chrome, hidden tool-call noise, or platform-visible formatting.

Hermes Agent is the Python upstream (parent) implementation for Gormes. Prefer the in-repo checkout at ./hermes-agent; fall back to ../hermes-agent only when the in-repo checkout is absent. Resolve the chosen path as $HERMES_SRC before writing parity tests.
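The lookup order above can be sketched in Go. This is a minimal illustration, assuming the repository root is passed in; resolveHermesSrc and its error text are hypothetical names, not existing Gormes helpers.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// resolveHermesSrc returns the first existing Hermes checkout relative to
// root, preferring the in-repo ./hermes-agent over the sibling
// ../hermes-agent. The name and error text are illustrative only.
func resolveHermesSrc(root string) (string, error) {
	candidates := []string{
		filepath.Join(root, "hermes-agent"),
		filepath.Join(root, "..", "hermes-agent"),
	}
	for _, candidate := range candidates {
		info, err := os.Stat(candidate)
		if err == nil && info.IsDir() {
			return candidate, nil
		}
	}
	return "", errors.New("no hermes-agent checkout found")
}

func main() {
	src, err := resolveHermesSrc(".")
	if err != nil {
		// Report the blocker instead of guessing a path.
		fmt.Println("blocked:", err)
		return
	}
	fmt.Println("HERMES_SRC=" + src)
}
```

Returning an error rather than a default path keeps a missing checkout visible as a blocker before any parity test is written.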

Workflow

1. Select One Behavior

Use the selected progress.json row or choose one builder-ready row. State:

public interface under test;
feature-map target or upstream concept;
behavior to prove;
row-local test_commands;
allowed write scope.

If the row is too broad, split/refine it before coding.

For UI/TUI/channel parity, identify the active upstream implementation before writing the test. The current full-screen TUI UX comes from $HERMES_SRC/ui-tui (Ink). The older cli.py prompt_toolkit code is only a legacy/detail donor, to be consulted when the current Ink TUI does not cover the behavior.

2. RED

Write one failing test for one observable behavior. Prefer:

command output or exit behavior for CLI;
request/response behavior for API/gateway;
public Go package behavior for provider/tools/memory/session;
compatibility request/response for Goncho/Honcho;
hermetic fixtures over live network/provider calls.

The failing test should prove the feature-map behavior through a public contract. Avoid private helper tests unless the progress row explicitly makes that helper the exported contract for the slice.

Run the focused test and confirm it fails for the right reason.

For visual/output regressions, prefer explicit artifact tests:

absence of stale product labels such as Hermes where Gormes owns the label;
absence of duplicate assistant messages or history+draft double-rendering;
absence of debug/status noise in idle chrome;
absence of old boxes/rules/prompt rows when current Hermes no longer renders them;
presence and order of user-visible rows, not private helper state.
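The artifact checks listed above can be expressed as plain string assertions over the rendered output. The sketch below is hypothetical: artifactViolations, its parameters, and the sample frame are made up for illustration and are not existing Gormes test helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// artifactViolations inspects rendered, user-visible output and reports
// artifacts that should not be there. All names here are illustrative.
func artifactViolations(rendered, staleLabel, assistantReply string, wantRows []string) []string {
	var violations []string
	// Absence check: the stale product label must not appear at all.
	if strings.Contains(rendered, staleLabel) {
		violations = append(violations, "stale label "+staleLabel)
	}
	// Duplicate check: the assistant reply must be rendered exactly once.
	if n := strings.Count(rendered, assistantReply); n != 1 {
		violations = append(violations, fmt.Sprintf("assistant reply rendered %d times", n))
	}
	// Order check: each row must appear after the previous match.
	offset := 0
	for _, row := range wantRows {
		i := strings.Index(rendered[offset:], row)
		if i < 0 {
			violations = append(violations, "missing or out-of-order row "+row)
			continue
		}
		offset += i + len(row)
	}
	return violations
}

func main() {
	frame := "Gormes\n> hello\nHi there!\n"
	fmt.Println(artifactViolations(frame, "Hermes", "Hi there!", []string{"> hello", "Hi there!"}))
}
```

The function only looks at the rendered string, so the test stays on the user-visible side of the boundary and never reads private helper state.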

For Telegram or other channel bugs, turn the user's transcript into a fixture. Assert the outbound operation sequence: status sends, edits/deletes, tool-call visibility, final-message count, and final text. A correct final answer is not green if an extra hourglass, leaked tool progress, or duplicate final reply was also sent.
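One way to assert the outbound sequence is a recording fake that captures every operation in order. This is a sketch under assumptions: the recorder type, its methods, and the operation kinds are hypothetical and do not mirror the real Gormes channel interface.

```go
package main

import (
	"fmt"
	"reflect"
)

// op is one outbound channel operation captured by the fake client.
type op struct {
	Kind string // "send", "edit", or "delete"
	Text string
}

// recorder is a hypothetical in-memory stand-in for a channel client.
type recorder struct{ ops []op }

func (r *recorder) Send(text string) { r.ops = append(r.ops, op{Kind: "send", Text: text}) }
func (r *recorder) Edit(text string) { r.ops = append(r.ops, op{Kind: "edit", Text: text}) }
func (r *recorder) Delete()          { r.ops = append(r.ops, op{Kind: "delete"}) }

// sameSequence compares the whole operation list, so an extra status send
// or a duplicate final reply fails even when the final text is correct.
func sameSequence(got, want []op) bool { return reflect.DeepEqual(got, want) }

func main() {
	want := []op{
		{Kind: "send", Text: "working"},
		{Kind: "delete"},
		{Kind: "send", Text: "final answer"},
	}

	clean := &recorder{}
	clean.Send("working")
	clean.Delete()
	clean.Send("final answer")

	noisy := &recorder{}
	noisy.Send("working")
	noisy.Delete()
	noisy.Send("final answer")
	noisy.Send("final answer") // duplicate final reply

	fmt.Println(sameSequence(clean.ops, want), sameSequence(noisy.ops, want))
}
```

Comparing the full list instead of only the last message is what makes a leaked status or duplicate reply a red test.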

For gateway tool-progress bugs, first decide which Hermes surface defines the artifact. Emoji/snake_case lines such as 📚 skill_view: "plan" are gateway progress from $HERMES_SRC/gateway/run.py plus agent.display.build_tool_preview; current Ink TUI shelves render Title Case tool calls such as Read File("x"). A regression test must encode the right surface, including new vs all, preview truncation, (×N) collapse, todo merge=true wording, and unknown-tool degradation.

For tool-loop regressions, test the boundary that leaked to the user. Kernel budget bugs belong in internal/kernel and should prove Hermes' 90-turn default plus toolless summary fallback. Channel bugs belong in gateway/channel tests and should prove raw budget errors, tool-call XML, or provider repair diagnostics do not become final user messages.
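A kernel-level budget test can drive the loop with a scripted model. The sketch below assumes that "toolless summary fallback" means one final model call with tools disabled once the turn budget is spent. That reading, and the names model, runLoop, and runaway, are assumptions for illustration and not the internal/kernel API.

```go
package main

import "fmt"

// defaultMaxTurns mirrors the 90-turn Hermes default named in the text.
const defaultMaxTurns = 90

// model is a hypothetical stand-in for one provider call. It returns the
// reply text and whether the model asked for another tool call.
type model func(turn int, toolsAllowed bool) (reply string, wantsTool bool)

// runLoop calls the model with tools enabled until it answers or the
// budget is spent, then asks once more with tools disabled so the user
// gets a summary instead of a raw budget error. The second result is the
// number of tool-enabled turns consumed.
func runLoop(m model, maxTurns int) (final string, turns int) {
	for turn := 1; turn <= maxTurns; turn++ {
		reply, wantsTool := m(turn, true)
		if !wantsTool {
			return reply, turn
		}
	}
	reply, _ := m(maxTurns+1, false)
	return reply, maxTurns
}

// runaway keeps requesting tools for as long as tools are allowed.
func runaway(turn int, toolsAllowed bool) (string, bool) {
	if toolsAllowed {
		return "", true
	}
	return "summary of work so far", false
}

func main() {
	final, turns := runLoop(runaway, defaultMaxTurns)
	fmt.Println(final, turns)
}
```

A channel-level test would sit one layer up and assert only that the final user message contains the summary and no budget error text.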

For dashboard/API security parity, encode both sides of the boundary in the RED test: public read-only allowlist routes (for example /api/status) must remain reachable, while sensitive dashboard /api/ routes must reject missing session tokens and accept the active Hermes header such as X-Hermes-Session-Token plus any legacy Bearer compatibility the upstream source still supports. When a new guard changes existing contract tests, update those fixtures to pass the same token instead of weakening the guard or treating the old unauthenticated expectation as product behavior.
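Both sides of the boundary fit in one hermetic test built on net/http/httptest. This is a sketch, not the Gormes guard: the guard function, the /api/config route, the token value, and the 401 status are assumptions, and legacy Bearer handling is omitted. Only /api/status and the X-Hermes-Session-Token header come from the text above.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// guard is a hypothetical session-token middleware. Public paths pass
// through; every other path needs a matching X-Hermes-Session-Token.
func guard(token string, public map[string]bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("X-Hermes-Session-Token")
		match := got != "" && subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
		if public[r.URL.Path] || match {
			next.ServeHTTP(w, r)
			return
		}
		http.Error(w, "unauthorized", http.StatusUnauthorized)
	})
}

// newGuarded builds the handler under test with one public route.
func newGuarded() http.Handler {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return guard("secret", map[string]bool{"/api/status": true}, ok)
}

// status performs one in-memory request and returns the response code.
func status(h http.Handler, path, token string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if token != "" {
		req.Header.Set("X-Hermes-Session-Token", token)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := newGuarded()
	fmt.Println(status(h, "/api/status", ""))       // public allowlist route
	fmt.Println(status(h, "/api/config", ""))       // sensitive route, no token
	fmt.Println(status(h, "/api/config", "secret")) // sensitive route, valid token
}
```

Existing contract tests that hit a guarded route would pass the token through a helper like status rather than loosening the guard.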

For runtime-lock or installer-adjacent tests, isolate state with GORMES_HOME temp dirs unless the behavior under test is the real persisted home. Never make a test pass by depending on the developer's live workspace-gormes, workspace-mineru, ~/.gormes, or ~/.hermes paths.
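Isolation can be as small as pointing GORMES_HOME at a throwaway directory for the duration of the check. In a real Go test, t.TempDir and t.Setenv cover this with automatic cleanup. The standalone sketch below does the same by hand, and withIsolatedHome, currentHome, and the runtime.lock file name are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// currentHome returns the GORMES_HOME value visible to the process.
func currentHome() string { return os.Getenv("GORMES_HOME") }

// withIsolatedHome points GORMES_HOME at a fresh temp dir, runs fn, then
// restores the previous environment and removes the directory.
func withIsolatedHome(fn func(home string) error) error {
	home, err := os.MkdirTemp("", "gormes-home-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(home)

	prev, had := os.LookupEnv("GORMES_HOME")
	if err := os.Setenv("GORMES_HOME", home); err != nil {
		return err
	}
	defer func() {
		// Put the environment back exactly as it was found.
		if had {
			os.Setenv("GORMES_HOME", prev)
		} else {
			os.Unsetenv("GORMES_HOME")
		}
	}()
	return fn(home)
}

func main() {
	err := withIsolatedHome(func(home string) error {
		// "runtime.lock" is a made-up file name for this example.
		return os.WriteFile(filepath.Join(home, "runtime.lock"), []byte("pid"), 0o600)
	})
	fmt.Println("isolated run error:", err)
}
```

Nothing in the sketch reads ~/.gormes or ~/.hermes, so the result cannot depend on the developer's live state.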

3. GREEN

Write the smallest implementation that passes this test. Stay inside write scope. Do not add speculative future behavior.

4. Repeat Vertically

Add the next behavior only after the prior test is green. Do not write a batch of imagined tests first.

For broad builder rows that enumerate several equivalent adapters or providers, a cron-sized TDD pass may ship one source-backed adapter/provider as the vertical slice when the row contract is otherwise too large for one safe run. In that case:

keep the progress row planned unless every acceptance bullet is now satisfied;
update the row note with the exact partial evidence and remaining adapters;
add a search/aggregation-level regression when the slice introduces a new typed degraded condition, not only a provider-local test;
do not mark umbrella or multi-provider rows complete from one provider's green tests.
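The status rule for partial slices reduces to a check over the row's acceptance bullets. The sketch below is illustrative only: criterion and rowStatus are hypothetical names, and the real progress.json schema may differ.

```go
package main

import "fmt"

// criterion is one acceptance bullet and whether this run produced
// evidence for it. The type is hypothetical, not the progress.json schema.
type criterion struct {
	Name      string
	Satisfied bool
}

// rowStatus keeps the row "planned" until every acceptance bullet is
// satisfied, and returns the remaining bullets for the row note.
func rowStatus(criteria []criterion) (status string, remaining []string) {
	for _, c := range criteria {
		if !c.Satisfied {
			remaining = append(remaining, c.Name)
		}
	}
	if len(criteria) > 0 && len(remaining) == 0 {
		return "complete", nil
	}
	return "planned", remaining
}

func main() {
	status, remaining := rowStatus([]criterion{
		{Name: "adapter A", Satisfied: true},
		{Name: "adapter B", Satisfied: false},
	})
	fmt.Println(status, remaining)
}
```

The remaining list is what belongs in the row note alongside the exact partial evidence.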

5. Refactor While Green

Only refactor after tests pass. Prefer deep modules: small interface, substantial hidden implementation, clear locality.

6. Verify

Run row test_commands, focused package tests, go run ./cmd/progress validate, and the gates in references/gates.md.

If the working tree contains unrelated user or parallel-agent changes, do not revert them. Run the focused tests for the selected behavior first, then broaden only as far as the current tree permits. If unrelated failures block a full gate, report them separately with file paths and failure commands.

Final Report

Report red-green cycles, feature-map target, behavior shipped, tests run, and any progress row updates needed.
