تشغيل أي مهارة في Manus بنقرة واحدة

$pwd:

writing-tests

Name: Writing Tests
Author: zaxbysauce

// Guidelines for writing, organizing, and maintaining tests in the opencode-swarm repository. Covers framework rules (bun:test), mock isolation, CI pipeline structure, file placement, and anti-patterns that break cross-platform CI. Load this skill before writing or modifying any test file.

تشغيل في Manus

$ git log --oneline --stat

stars:٣٤٠

forks:٣٥

updated:٢٩ مايو ٢٠٢٦ في ١٦:٥٨

SKILL.md

readonly

related-skills.json

نفس المستودع

subprocess-safety.md

from "zaxbysauce/opencode-swarm"

Guidelines for safe subprocess calls in opencode-swarm. Load before adding, modifying, or reviewing any file that calls spawn, spawnSync, bunSpawn, or child_process. Covers the six required properties, Windows portability, _internals DI seam pattern, and verification grep.

2026-05-29340

mock-to-internals-migration.md

from "zaxbysauce/opencode-swarm"

Apply when converting test files from mock.module('node:child_process') to _internals DI seam pattern. Guides the complete migration: adding spawnSync/_internals to source files, converting test files, adding proper beforeEach/afterEach save/restore lifecycle, mockReset() cleanup, and temp directory cleanup. Prevents mock.module leaks across Bun's shared test-runner process.

2026-05-29340

plan.md

from "zaxbysauce/opencode-swarm"

Full execution protocol for MODE: PLAN -- plan creation, external plan ingestion, QA gate persistence, task granularity, and traceability checks.

2026-05-29340

brainstorm.md

from "zaxbysauce/opencode-swarm"

Full execution protocol for MODE: BRAINSTORM -- structured discovery dialogue, approach selection, spec drafting, QA gate selection, and transition handling.

2026-05-29340

clarify.md

from "zaxbysauce/opencode-swarm"

Full execution protocol for MODE: CLARIFY -- structured clarification funnel with critic review before surfacing user decisions.

2026-05-29340

clarify-spec.md

from "zaxbysauce/opencode-swarm"

Full execution protocol for MODE: CLARIFY-SPEC -- resolving spec clarification markers and maintaining spec/planning alignment.

2026-05-29340

package.json

"author": "zaxbysauce"

"repository": "zaxbysauce/opencode-swarm"

فتح مستودع GitHub عرض مستودعات المنشئ

$ install --global

$ download --local

تشغيل في Manus

$ useful --forSOC

محللو ضمان جودة البرمجيات والمختبرونمهن الحاسوب والرياضيات15-1253L4

تشغيل أي مهارة بنقرة واحدة

name	writing-tests
description	Guidelines for writing, organizing, and maintaining tests in the opencode-swarm repository. Covers framework rules (bun:test), mock isolation, CI pipeline structure, file placement, and anti-patterns that break cross-platform CI. Load this skill before writing or modifying any test file.

Writing Tests for opencode-swarm

⚠️ Do NOT use the OpenCode test_runner tool to validate the full repo. It is for targeted agent validation with explicit files: [...] or small targeted scopes. scope: 'all' requires allow_full_suite: true and is intended for opt-in CI mirrors only. Broad scopes can stall or kill OpenCode before the MAX_SAFE_TEST_FILES = 50 (src/tools/test-runner.ts:26) guard fires. For repo validation, use the shell commands in this file — per-file isolation loops match CI behavior. allow_full_suite should be used only when intentional and justified in the PR description. See AGENTS.md invariant 6 for the full contract.

⛔ STOP — Read Before Running Any Tests

test_runner scope safety — one rule, no exceptions:

Scope	Files param	Safe?
`'convention'`	single source file	✅ Safe
`'convention'`	multiple source files	❌ Rejected — guard fires (`scope_exceeded`) before fan-out; use shell loop
`'convention'`	direct test file paths	✅ Safe — exempt from source-file limit
`'graph'`	single file	✅ Safe
`'graph'`	multiple files	❌ Rejected (`scope_exceeded`) — guard fires before import-graph traversal
`'impact'`	multiple files	❌ Rejected (`scope_exceeded`) — same reason
`'all'`	any	❌ Never in agent context

If you need to run tests across multiple source files: use a per-file shell loop, not test_runner.

Truncated output recovery: When bun test output exceeds the bash tool buffer it is saved to a file whose ID (tool_abc123...) cannot be retrieved via retrieve_summary (which only accepts S1, S2 format). Workaround — pipe to a temp file instead:

# PowerShell (Windows)
bun --smol test tests/unit/agents --timeout 60000 | Out-File "$env:TEMP\test_out.txt"; Get-Content "$env:TEMP\test_out.txt" | Select-Object -Last 30

# bash (Linux/macOS)
bun --smol test tests/unit/agents --timeout 60000 2>&1 | tee /tmp/test_out.txt | tail -30

Framework: bun:test Only

All test files MUST import from bun:test:

import { describe, test, expect, beforeEach, afterEach } from 'bun:test';

Bun provides a vitest compatibility layer (vi.mock, vi.fn, vi.spyOn) that works on Linux and macOS. However, vi.mock() has critical isolation bugs in Bun when multiple test directories run in the same process. Prefer bun:test native APIs:

vitest API	bun:test equivalent	Notes
`vi.fn()`	`mock(() => ...)`	Import `mock` from `bun:test`
`vi.spyOn(obj, method)`	`spyOn(obj, method)`	Import `spyOn` from `bun:test`
`vi.mock('module', factory)`	`mock.module('module', factory)`	Import `mock` from `bun:test`
`vi.restoreAllMocks()`	`mock.restore()`	Call in `afterEach`

Mock Isolation Rules

CRITICAL: Module-level mocks leak across test files within the same Bun process.

Bun's --smol mode shares the module cache between test files in the same worker process. A mock.module() call in file A replaces the module globally — file B gets the mock instead of the real module. This caused ~959 failures before per-file isolation was added (#330).

Additional critical limitation (Bun v1.3.11): mock.restore() does NOT reliably restore mock.module mocks. Cross-module mocks can persist across test boundaries even after afterEach(mock.restore()) is called. Three layers of defense are required.

Rules

Spread the real module when mocking. Only override the specific export you need:

import * as realChildProcess from 'node:child_process';
const mockExecFileSync = mock(() => '');
mock.module('node:child_process', () => ({
  ...realChildProcess,          // preserve all other exports
  execFileSync: mockExecFileSync, // override only what you test
}));

This prevents tests from accidentally nullifying exports that other code depends on. This is mandatory for Node built-ins (node:fs, node:fs/promises, node:child_process, etc.) because other code imports the full module — returning a partial mock without spreading real exports breaks unrelated imports.

Use lazy binding in source code. Import the namespace, call methods at invocation time:

// GOOD — mockable via mock.module
import * as child_process from 'node:child_process';
function run() { return child_process.execFileSync('git', ['status']); }

// BAD — binds at module load, mock.module can't intercept
import { execFileSync } from 'node:child_process';

Always add afterEach(mock.restore()) for cross-module mocks. Even though it is unreliable in Bun v1.3.11, it provides best-effort cleanup and reduces the window of cross-file contamination. Without it, the mock persists until the process exits:

import { afterEach, mock } from 'bun:test';

afterEach(() => {
  mock.restore();
});

Exception — Windows EBUSY: Test files that spawn async child processes (e.g. pre-check-batch tests) must NOT call mock.restore() on Windows. Child process handles can hold directory locks, and mock.restore() triggers cleanup that causes EBUSY errors. These files must use describe.skipIf(process.platform === 'win32') or test.skipIf(process.platform === 'win32') for affected tests.

Intentionally skipped on Windows (async child process handles cause EBUSY):

tests/unit/tools/pre-check-batch-sast-preexisting.test.ts
tests/unit/tools/pre-check-batch.adversarial.test.ts
tests/unit/tools/pre-check-batch-cwd.test.ts
tests/unit/tools/pre-check-batch-cwd.adversarial.test.ts
tests/unit/tools/pre-check-batch-contextdir-adversarial.test.ts
tests/unit/tools/pre-check-batch-secretscan-evidence.test.ts
tests/unit/tools/pre-check-batch.test.ts

Never create circular mock imports. This pattern deadlocks Bun:

// BROKEN — imports from the module it's about to mock
import { realFn } from '../../src/module.js';
vi.mock('../../src/module.js', () => ({
  realFn: (...args) => realFn(...args),  // circular!
  otherFn: vi.fn(),
}));

Instead, inline the function logic or extract the real functions into a separate utility module.

Prefer constructor/parameter injection over module mocking. The swarm's hook factories (createScopeGuardHook, createDelegationLedgerHook, etc.) accept injected dependencies — test them by passing mock callbacks, not by replacing modules.
Mock validateDirectory when testing with Windows temp paths. The path-security.ts validator rejects Windows absolute paths (C:\...). If your test uses os.tmpdir() and passes that path to a function that calls validateDirectory, mock it:

mock.module('../../../src/utils/path-security', () => ({
  validateDirectory: () => {},
  validateSwarmPath: (p: string) => p,
}));

Diagnosing Test Isolation Failures

When test files pass individually but fail when run together, follow this protocol:

Isolate: Run the failing file alone: bun test <file>.test.ts --timeout 30000
Pair: Run it WITH its suspected polluting neighbor: bun test <fileA>.test.ts <fileB>.test.ts
Classify:
- Both pass alone → fail together → mock pollution from neighbor
- Fails alone → test logic bug (not isolation issue)
- Passes alone + passes together but fails in full suite → third-file pollution (use binary search across directory)
For mock pollution, check the neighbor for these patterns:
- vi.mock() or mock.module() inside beforeEach() (not at top level)
- delete require.cache[...] combined with re-import pattern
- These indicate hoist-time closure capture — see below
Specific symptom — closure capture failure: vi.mock() captures closures at hoist time (before beforeEach runs). Reassigning mockFn.mockImplementation(newFn) in the test body does NOT update the hoisted closure — the mock still calls the original function.
- Symptom: expect(mockFn).toHaveBeenCalledTimes(N) fails with an unexpected count
- Symptom: expect(mockFn).not.toHaveBeenCalled() fails because the real function was called
Fix path: Migrate the affected test file to _internals DI seam pattern per the mock-to-internals-migration skill. This eliminates both the vi.mock() call and the closure capture surface area.

Two-Tier Mock Convention

The codebase uses a two-tier strategy for mock isolation, plus a zero-mock testing pattern:

Tier 0: _test_exports Pure Function Testing (Zero Mocks)

When a module contains internal utility functions (formatters, normalizers, transformers) that don't need external dependencies, export them via a _test_exports object for direct unit testing. This avoids mock.module entirely and produces tests that are deterministic, fast, and immune to Bun's cross-file mock leakage:

// In source file (src/tools/formatter.ts)
function formatEntry(entry: SomeType): string {
  // internal implementation — may use optional chaining, defaults, etc.
  return entry.score?.toFixed(2) ?? 'N/A';
}

// Public API (tool handler, command handler, etc.)
export function handleQuery(ctx: Context) {
  const entries = readData(ctx);
  return entries.map(formatEntry);
}

// Export seam for testing — only used by test files
export const _test_exports = { formatEntry };

// In test file (tests/unit/tools/formatter.test.ts)
import { _test_exports } from '../../../src/tools/formatter';

const { formatEntry } = _test_exports;

describe('formatEntry', () => {
  test('handles missing score', () => {
    expect(formatEntry({ score: undefined })).toBe('N/A');
  });
  test('formats numeric score', () => {
    expect(formatEntry({ score: 0.85 })).toBe('0.85');
  });
});

When to use Tier 0 vs Tier 1:

Tier 0 (_test_exports): The function is a pure utility (formatter, normalizer, transformer) that doesn't call external modules. No mocking needed — test it directly.
Tier 1 (_internals): You need to mock a function within the same module to test the caller in isolation. The function has side effects or calls external APIs.
Tier 2 (mock.module): You need to mock a dependency from another module (Node built-ins, other application modules).

Benefits of Tier 0:

Zero mock pollution — no mock.module calls, no mock.restore() needed
Works in batch test runs without per-file isolation
Type-safe (the exported object carries the real TypeScript types)
No filesystem dependencies (no tmpDir, no chdir, no existsSync)
Deterministic on all platforms and CI environments

Tier 1: _internals DI Seams (Within-Module)

For mocking functions within the same module, source files export an _internals object that wraps key functions. Tests can replace individual functions without using mock.module:

// In source file (src/services/my-service.ts)
export const _internals = {
  helperFn: () => { /* real implementation */ }
};

export function mainFn() {
  return _internals.helperFn();
}

// In test file
import { _internals, mainFn } from '../../../src/services/my-service';

test('mainFn uses mocked helper', () => {
  const original = _internals.helperFn;
  _internals.helperFn = mock(() => 'mocked');
  // ... test ...
  _internals.helperFn = original; // restore
});

Benefits:

No process-global mock pollution
Type-safe
Fast (no module re-parsing)
Works in batch test runs without isolation

Tier 2: mock.module (Cross-Module)

When mocking dependencies from other modules (especially Node built-ins), use mock.module with proper cleanup:

import * as realFs from 'node:fs/promises';

mock.module('node:fs/promises', () => ({
  ...realFs,  // MUST spread real exports
  readFile: mock(() => Promise.resolve('mocked')),
}));

afterEach(() => mock.restore());

Critical rules for cross-module mocks:

Always spread real exports for Node built-ins — other code depends on exports you don't mock
Always add afterEach(mock.restore()) — provides best-effort cleanup
Run in per-file isolation — CI runs each file in its own process (for f in *.test.ts; do bun --smol test "$f"; done)

Choosing Between Tiers

Scenario	Pattern	Example
Mocking a function in the same module you're testing	`_internals` seam	`src/state.ts` `_internals.loadSnapshot`
Mocking a Node built-in (fs, child_process, etc.)	`mock.module` + spread real	`mock.module('node:fs/promises', () => ({ ...realFs, readFile: mockFn }))`
Mocking another application module	`mock.module` + cleanup	`mock.module('../../../src/utils/logger', ...)` + `afterEach(mock.restore())`
File-scoped mock (applies to all tests in file)	`mock.module` at top level + `mockReset()` in `beforeEach`	Preflight tests with `mockLoadPlan.mockReset()`

mock.module() Export Completeness

When using mock.module() (or vi.mock()) with Bun's test runner, the mock factory MUST provide stubs for ALL named exports of the target module — not just the ones your test calls. Bun validates the export set at dynamic-import time and throws SyntaxError: Export named 'X' not found if any export is missing.

Why this matters

Transitive imports may reference exports your test never calls directly. For example, if your test mocks config/schema.js and only uses stripKnownSwarmPrefix, but a transitive dependency imports PluginConfigSchema from the same module, the mock MUST include PluginConfigSchema as a stub — even though your test never calls it.

When the source module gains new exports (e.g., a PR adds 50 new Zod schemas to config/schema.ts), ALL existing mock.module() calls targeting that module must be updated — even if the new exports are irrelevant to your test.

How to verify completeness

Before finalizing a test that uses mock.module():

List all runtime exports of the target module (type-only exports are erased at compile time and need no stub):
```
grep -E "^export (const|function|async function|class) " src/path/to/module.ts
```
Note: Do NOT include type or interface exports — Bun erases these at compile time and they need no runtime stub.
Ensure every export name has an entry in your mock.module() factory.

Stubs can be minimal:

Functions: () => null or async () => {}

Zod schemas: use a comprehensive stub that supports common methods:

const zodStub = {
  parse: (v: unknown) => v,
  safeParse: (v: unknown) => ({ success: true as const, data: v }),
  parseAsync: async (v: unknown) => v,
};

Constants: appropriate zero values ('', 0, null, [], {})

Verification pattern

// ✅ CORRECT — all exports provided, test uses only the first one
mock.module('../../../src/config/schema.js', () => ({
  // The one export your test actually uses
  stripKnownSwarmPrefix: mockStripFn,
  // Stubs for transitive import resolution (never called in test)
  PluginConfigSchema: zodStub,
  ScoringConfigSchema: zodStub,
  isKnownCanonicalRole: () => false,
  // ... all other runtime exports as stubs
}));

// ❌ WRONG — missing exports cause SyntaxError at module-load time
mock.module('../../../src/config/schema.js', () => ({
  stripKnownSwarmPrefix: mockStripFn,
  // Missing: PluginConfigSchema, ScoringConfigSchema, etc.
  // → "SyntaxError: Export named 'PluginConfigSchema' not found"
}));

What IS and IS NOT test theater

Adding stubs for ESM resolution is NOT test theater — it's a Bun runtime requirement. The distinction:

Pattern	Test theater?	Why
Adding `PluginConfigSchema: zodStub` so the module loads	No	Required for ESM resolution; stub is never called
Stubbing `validateDirectory` to return `true` then asserting "validation works"	Yes	The stub bypasses the logic you should be testing
Using `zodStub` in assertions: `expect(zodStub.parse(input)).toBe(input)`	Yes	Testing the stub, not the real code
Adding stubs for ALL 50 Zod schemas in config/schema.ts	No	All are required for transitive import resolution

The stubs exist solely to satisfy the module loader. Test assertions must verify behavior through the real-mocked functions (the ones your test actually calls), not through the stubs.

Files Intentionally Using File-Scoped Mocks

Some test files use top-level mock.module that must persist across all tests in the file. These files use mockReset()/mockClear() in beforeEach instead of mock.restore() in afterEach:

src/__tests__/preflight-phase.test.ts — mocks plan/manager and preflight-service

Cross-Platform Test Patterns

Tests run on all three CI platforms (ubuntu, macos, windows). Path and filesystem behavior differs between them. Follow these patterns to prevent platform-specific failures:

Mock keys with filesystem paths

Never hardcode Unix-format paths as mock keys. On Windows, path.resolve('/dir', 'file') produces drive-letter-prefixed paths like D:\dir\file, not /dir/file. A mock that checks for /dir/file will silently never match, causing the test to behave differently on Windows.

Use path.resolve() to construct mock keys the same way the source code does:

// ❌ WRONG — fails on Windows (mock expects '/safe/dir/linked.ts',
//    but path.resolve('/safe/dir', 'linked.ts') = 'D:\safe\dir\linked.ts')
mockRealpathSync.mockImplementation((inputPath: string) => {
  if (inputPath === '/safe/dir') return '/safe/dir';
  if (inputPath === '/safe/dir/linked.ts') return '/outside/linked.ts';
  return inputPath;
});

// ✅ CORRECT — path.resolve produces matching keys on all platforms
const mockDir = path.resolve('/safe/dir');
const linkedResolved = path.resolve(mockDir, 'linked.ts');
const outsideResolved = path.resolve('/outside/linked.ts');

// mockRealpathSync is a mock() function (bun:test) — see mocking patterns above
mockRealpathSync.mockImplementation((inputPath: string) => {
  if (inputPath === mockDir) return mockDir;
  if (inputPath === linkedResolved) return outsideResolved;
  return inputPath;
});

Symlink behavior differences

On Windows, fs.symlinkSync for directories creates junctions by default, which resolve differently than POSIX symlinks. Junction creation may require administrator elevation on older Node.js versions.
fs.realpathSync on a broken symlink throws ENOENT on POSIX but may throw EINVAL on Windows, depending on symlink type.
Use test.skipIf(process.platform === 'win32') for tests that directly manipulate filesystem symlinks, unless the test's purpose is explicitly to verify cross-platform symlink behavior.

Temporary directory patterns

Use os.tmpdir() + path.join() for temp paths. Never hardcode /tmp or C:\.
Wrap mkdtempSync in realpathSync if the result is chdir'd on macOS (temp dirs are often symlinked to /private/var/...).
Clean up temp dirs in afterEach or afterAll using the rm utility with { force: true, recursive: true }.

Line ending normalization

Git on Windows converts LF to CRLF by default. Tests that compare file contents byte-by-byte against expected strings must normalize line endings:

const actual = readFileSync(path, 'utf-8').replace(/\r\n/g, '\n');

CI Pipeline Structure

The CI runs on three platforms (ubuntu, macos, windows). Tests are split into sequential steps within each platform's job.

Step 1: hooks (Linux/macOS only, skipped on Windows) — batch per-group
Step 2: cli — batch
Step 3: commands + config — batch
Step 4: tools — per-file isolation loop
Step 5: services + build + quality + sast + sbom + scripts — per-file isolation loop
Step 6: state + agents + knowledge + evidence + plan + misc — per-file isolation loop

Steps 4-6 use per-file isolation: each .test.ts file runs in its own bun --smol process to prevent mock.module() cache poisoning (#330). Steps 1-3 run files in batch (one process per step) because they have fewer mock conflicts.

When writing a test, know which step your file will run in. In batch steps, do not assume isolation from other files in the same step.

Job timeout: 15 minutes. A single hanging test will kill the entire platform's test run.

File Placement

Convention

Test type	Location	When to use
Unit tests for `src/hooks/*.ts`	`tests/unit/hooks/`	Testing hook factories and hook behavior
Unit tests for `src/tools/*.ts`	`tests/unit/tools/`	Testing tool execute functions
Unit tests for `src/commands/*.ts`	`tests/unit/commands/`	Testing CLI command handlers
Unit tests for `src/config/*.ts`	`tests/unit/config/`	Testing schema validation, config loading
Unit tests for `src/agents/*.ts`	`tests/unit/agents/`	Testing agent prompt generation, factory logic
Colocated tests	`src/*/.test.ts`	Integration-style tests tightly coupled to the source module
Integration tests	`tests/integration/`	Cross-module workflows, plugin initialization
Security tests	`tests/security/`	Adversarial input handling, injection resistance
Smoke tests	`tests/smoke/`	Built package validation

Naming

Base test: <module>.test.ts
Adversarial variant: <module>.adversarial.test.ts

Only create an adversarial variant if it tests distinct attack vectors not covered by the base test. Do not duplicate base test assertions with different inputs — that's redundancy, not security coverage.

Regression tests (review-surfaced bugs)

When fixing a bug surfaced by code review, swarm review, or post-merge audit, always add a regression test with the following shape so the test's purpose survives future cleanup:

describe('<feature> — regression: <one-line description> (F#)', () => {
  it('<exact behavior the bug violated>', () => {
    // Previous code did <bad thing>: e.g. the regex `/^\.\/+/` only stripped
    // a single leading `./`, so `././util.ts` survived as `./util.ts`.
    expect(normalizeGraphPath('././util.ts')).toBe('util.ts');
  });
});

Rules:

The describe label includes the original finding ID (e.g. F8, F9, F1.1) so future readers can map back to the review.
The leading comment in the body explains the prior buggy behavior in concrete terms — what the code did before, not what it does now.
One regression test per finding. Do not pile unrelated assertions into a single regression block.

Examples in-tree: tests/unit/graph/graph-query.test.ts, tests/unit/graph/import-extractor.test.ts, tests/unit/graph/graph-store.test.ts.

Cross-Entry Invariants (config maps)

When you modify any entry of a "map of agents/tools/roles" in src/config/constants.ts (AGENT_TOOL_MAP, DEFAULT_MODELS, QA_AGENTS, PIPELINE_AGENTS, etc.), there are tests that assert parity across sibling entries, not just shape of one entry.

Known parity assertions:

Test	Invariant
`tests/unit/config/critic-registration.test.ts:67`	`AGENT_TOOL_MAP.critic_sounding_board.length === AGENT_TOOL_MAP.critic.length`
`tests/unit/config/agent-tool-map.test.ts:26`	`AGENT_TOOL_MAP.architect.length` is strictly greater than every other agent's
`tests/unit/config/agent-tool-map.test.ts:34`	every subagent's tool list `<= 20` entries
`tests/unit/config/constants.test.ts:48`	`ALL_SUBAGENT_NAMES.length === 13`
`tests/unit/config/constants.test.ts:137`	`Object.keys(DEFAULT_MODELS).length === 14`

Workflow when adding a tool to a single agent:

Add the entry.
Run bun --smol test tests/unit/config --timeout 60000 before pushing.
If a parity test fails, decide: mirror the change to sibling agents, or update the invariant test if the design intent has actually changed.
To inspect runtime shape quickly: bun -e "import { AGENT_TOOL_MAP } from './src/config/constants.ts'; for (const [k,v] of Object.entries(AGENT_TOOL_MAP)) console.log(k, v.length);"

Debugging CI failures

When CI reports a unit (ubuntu|macos|windows) failure:

Identify the actual failing test from the job log first. Do not assume it's a pre-existing failure based on a local repro of a different test. Open the failing job's URL and find the <file>:<line> in the Bun output. WebFetch can scrape this if the gh CLI isn't available.
Reproduce that exact file locally: bun --smol test tests/unit/<dir>/<file>.test.ts --timeout 30000.
Then check if the same failure reproduces on main. If yes, document as pre-existing in the PR description and continue with your branch's work; do not silently inherit the failure.
For dist-check failures: dist-check is a hard gate — never bypass or ignore it.
- If your PR touches src/: run bun run build, examine git diff -- dist/, and commit the rebuilt dist/ in the same PR.
- If your PR does not touch src/ but dist-check fails: origin/main has a dist drift. Do not commit rebuilt dist to your PR. The fix belongs on main, not in your PR.

Test Quality Standards

DO

Test real behavior: call the actual function with real inputs, assert on real outputs.
Test error paths: what happens with null, undefined, empty string, oversized input?
Use temp directories (fs.mkdtemp) for file I/O tests. Clean up in afterEach.
Assert on specific values, not just truthiness: expect(result.status).toBe('pending') not expect(result).toBeTruthy().

DO NOT

Do not test type definitions. expect(event.type === 'foo').toBe(true) tests TypeScript, not your code.
Do not test framework behavior. "Zod schema parses valid input" tests Zod, not your schema.
Do not test test utilities. If it only exists to support other tests, it doesn't need its own test.
Do not mock everything. If every dependency is mocked, you're testing the mock setup. Prefer real dependencies for pure functions and only mock I/O boundaries (filesystem, network, timers).

Anchored Content Assertions

When asserting that skill files, protocol docs, or structured markdown contain expected text, anchor your assertions to the relevant section rather than using bare toContain() on the full file content:

// WEAK — passes even if the word appears in prose outside the intended section
expect(content).toContain('DROP');

// STRONG — fails if the structured section is removed or relocated
const stage3Start = content.indexOf('#### Stage 3: Consult Critic Sounding Board');
const stage4Start = content.indexOf('#### Stage 4: Surface User Decision Packet');
const stage3Section = content.slice(stage3Start, stage4Start);
expect(stage3Section).toContain('DROP');
expect(stage3Section).toContain('ASK_USER');

Why this matters: A bare toContain('DROP') passes as long as the word appears anywhere in the file. If the structured outcomes section is deleted but a prose reference remains (e.g., "The critic may DROP irrelevant items"), the test still passes — silently hiding the removal. Section-anchored assertions fail when the content is actually removed from its intended location.

Use this pattern for:

Critic outcome mappings in skill files (DROP, ASK_USER, RESOLVE, REPHRASE)
Classification category lists (self_resolved, user_decision, etc.)
Any structured section where word presence is necessary but position-dependent
Do not hardcode version numbers. Version bumps are automated — a test asserting version === '6.31.3' breaks on every release.
Do not use sleep or setTimeout for synchronization. Use explicit signals, resolved promises, or Bun.sleep() with tight bounds.
Do not spawn cat /dev/zero, yes, or other infinite-output commands. Use sleep 30 for "blocking command" tests.

Cross-Platform Requirements

See also: Cross-Platform Test Patterns above for detailed guidance on mock keys, symlink behavior, temp directories, and line endings.

All tests must pass on Linux, macOS, and Windows unless explicitly gated with:

const isWindows = process.platform === 'win32';
if (isWindows) test.skip('reason', () => {});

Path handling

Use path.join() or path.resolve(), never string concatenation with /.
Temp directories: use os.tmpdir(), not hardcoded /tmp.
File comparisons: normalize paths before comparing (path.resolve(a) === path.resolve(b)).

Process spawning

Use .cmd extension on Windows for npm/bun binaries: process.platform === 'win32' ? 'bun.cmd' : 'bun'.
Use array-form spawn/spawnSync, never shell string commands.

Running Tests

bash (Linux / macOS)

# Single file
bun test tests/unit/hooks/scope-guard.test.ts

# Batch directory (safe for dirs without mock conflicts)
bun --smol test tests/unit/hooks --timeout 30000

# Per-file loop (required for tools/services/agents — prevents mock poisoning)
for f in tests/unit/tools/*.test.ts; do bun --smol test "$f" --timeout 30000; done

# CI-equivalent run for batch steps
bun --smol test tests/unit/cli --timeout 120000
bun --smol test tests/unit/commands tests/unit/config --timeout 120000

PowerShell (Windows)

# Single file
bun test tests/unit/hooks/scope-guard.test.ts

# Batch directory (safe for dirs without mock conflicts)
bun --smol test tests/unit/hooks --timeout 30000

# Per-file loop (required for tools/services/agents — prevents mock poisoning)
Get-ChildItem tests/unit/tools/*.test.ts | ForEach-Object { bun --smol test $_.FullName --timeout 30000 }

# CI-equivalent run for batch steps
bun --smol test tests/unit/cli --timeout 120000
bun --smol test tests/unit/commands tests/unit/config --timeout 120000

# Capture output to file (avoids truncation when output is large)
bun --smol test tests/unit/agents --timeout 60000 | Out-File "$env:TEMP\test_out.txt"; Get-Content "$env:TEMP\test_out.txt" | Select-Object -Last 50

Note: for f in ...; do bash syntax is invalid in PowerShell. Use Get-ChildItem | ForEach-Object instead. Select-String -Last N is also invalid — use Select-Object -Last N.

Warning: Running bun --smol test tests/unit/tools as a single batch will cause mock poisoning failures. Always use the per-file loop for directories in CI steps 4-6 (tools, services, agents, etc.).

The --smol flag reduces Bun's memory footprint. Use it when running large directories (50+ files).

The --timeout 120000 flag sets per-test timeout to 120 seconds. Individual tests should complete in under 5 seconds. If a test needs more than 10 seconds, it's doing too much — split it or mock the slow dependency.

Before Submitting

Run the tests for your changed files: bun test path/to/your.test.ts
Run the full CI group your tests belong to (see pipeline structure above)
Verify no process.cwd() usage — use the directory parameter from createSwarmTool or hook constructor
Verify no hardcoded paths (/tmp/..., C:\...) — use os.tmpdir() + path.join()
Verify mocks are restored in afterEach if using spyOn or mock.module

Known Pre-existing Test Failures

The following test failures are pre-existing and unrelated to mock isolation:

Test file	Failures	Cause	Status
`tests/unit/hooks/full-auto-intercept.test.ts`	21/37	`logger.log` returns early without `OPENCODE_SWARM_DEBUG=1`	Pre-existing
`tests/unit/hooks/full-auto-intercept.dispatch.test.ts`	2/46	Same logger issue	Pre-existing
`tests/unit/commands/help-compound-commands.test.ts`	Multiple	Command routing issues	Pre-existing
`tests/unit/commands/index.test.ts`	Multiple	Command routing issues	Pre-existing
`tests/unit/commands/issue-command.test.ts`	Multiple	Command routing issues	Pre-existing
`src/__tests__/preflight-phase.test.ts`	3/3	`loadPlan` called twice per invocation (lines 930 + 545)	Bug exposed by cleanup
`tests/unit/agents/architect-sounding-board-protocol.adversarial.test.ts`	1	Token budget threshold `35000` exceeded by prompt growth; soft regression indicator that prompt size needs attention	Pre-existing

Known Cross-module mock.module Locations

The following directories contain test files that use cross-module mock.module (permitted under two-tier convention):

tests/unit/commands/ — mocks tools, hooks, services, state
tests/unit/hooks/ — mocks knowledge-store, knowledge-validator, knowledge-reader, telemetry, utils
tests/unit/tools/ — mocks Node built-ins (fs, child_process), sast-baseline, build/discovery
tests/unit/services/ — mocks path-security
tests/unit/config/ — mocks node:fs/promises
tests/unit/background/ — mocks utils, event-bus, evidence-summary-service
tests/unit/council/ — mocks node:fs
tests/unit/plan/ — mocks spec-hash
tests/unit/mutation/ — mocks node:child_process
tests/unit/git/ — mocks node:child_process
tests/integration/ — mocks co-change-analyzer, knowledge-store
src/__tests__/ — mocks plan/manager, preflight-service, telemetry
src/hooks/ — mocks logger, event-bus
src/tools/__tests__/ — mocks test-impact/analyzer, build/discovery, path-security
src/mutation/__tests__/ — mocks state
src/agents/ — mocks node:fs/promises
src/background/ — mocks vulnerability trigger

Dead-code _internals Seams

The following source modules export _internals but have no test consumers (as of this writing). They are harmless but may be removed in future cleanup:

src/tools/secretscan.ts
src/tools/knowledge-recall.ts
src/tools/lint.ts
src/tools/sast-scan.ts
src/tools/sast-baseline.ts
src/mutation/gate.ts
src/mutation/equivalence.ts
src/mutation/engine.ts
src/db/qa-gate-profile.ts
src/config/schema.ts
src/config/index.ts
src/commands/registry.ts
src/background/manager.ts
src/background/event-bus.ts
src/agents/critic.ts

name	writing-tests
description	Guidelines for writing, organizing, and maintaining tests in the opencode-swarm repository. Covers framework rules (bun:test), mock isolation, CI pipeline structure, file placement, and anti-patterns that break cross-platform CI. Load this skill before writing or modifying any test file.

Writing Tests for opencode-swarm

⚠️ Do NOT use the OpenCode test_runner tool to validate the full repo. It is for targeted agent validation with explicit files: [...] or small targeted scopes. scope: 'all' requires allow_full_suite: true and is intended for opt-in CI mirrors only. Broad scopes can stall or kill OpenCode before the MAX_SAFE_TEST_FILES = 50 (src/tools/test-runner.ts:26) guard fires. For repo validation, use the shell commands in this file — per-file isolation loops match CI behavior. allow_full_suite should be used only when intentional and justified in the PR description. See AGENTS.md invariant 6 for the full contract.

⛔ STOP — Read Before Running Any Tests

test_runner scope safety — one rule, no exceptions:

Scope	Files param	Safe?
`'convention'`	single source file	✅ Safe
`'convention'`	multiple source files	❌ Rejected — guard fires (`scope_exceeded`) before fan-out; use shell loop
`'convention'`	direct test file paths	✅ Safe — exempt from source-file limit
`'graph'`	single file	✅ Safe
`'graph'`	multiple files	❌ Rejected (`scope_exceeded`) — guard fires before import-graph traversal
`'impact'`	multiple files	❌ Rejected (`scope_exceeded`) — same reason
`'all'`	any	❌ Never in agent context

If you need to run tests across multiple source files: use a per-file shell loop, not test_runner.

# PowerShell (Windows)
bun --smol test tests/unit/agents --timeout 60000 | Out-File "$env:TEMP\test_out.txt"; Get-Content "$env:TEMP\test_out.txt" | Select-Object -Last 30

# bash (Linux/macOS)
bun --smol test tests/unit/agents --timeout 60000 2>&1 | tee /tmp/test_out.txt | tail -30

Framework: bun:test Only

All test files MUST import from bun:test:

import { describe, test, expect, beforeEach, afterEach } from 'bun:test';

vitest API	bun:test equivalent	Notes
`vi.fn()`	`mock(() => ...)`	Import `mock` from `bun:test`
`vi.spyOn(obj, method)`	`spyOn(obj, method)`	Import `spyOn` from `bun:test`
`vi.mock('module', factory)`	`mock.module('module', factory)`	Import `mock` from `bun:test`
`vi.restoreAllMocks()`	`mock.restore()`	Call in `afterEach`

Mock Isolation Rules

CRITICAL: Module-level mocks leak across test files within the same Bun process.

Rules

Spread the real module when mocking. Only override the specific export you need:

import * as realChildProcess from 'node:child_process';
const mockExecFileSync = mock(() => '');
mock.module('node:child_process', () => ({
  ...realChildProcess,          // preserve all other exports
  execFileSync: mockExecFileSync, // override only what you test
}));

Use lazy binding in source code. Import the namespace, call methods at invocation time:

// GOOD — mockable via mock.module
import * as child_process from 'node:child_process';
function run() { return child_process.execFileSync('git', ['status']); }

// BAD — binds at module load, mock.module can't intercept
import { execFileSync } from 'node:child_process';

Always add afterEach(mock.restore()) for cross-module mocks. Even though it is unreliable in Bun v1.3.11, it provides best-effort cleanup and reduces the window of cross-file contamination. Without it, the mock persists until the process exits:

import { afterEach, mock } from 'bun:test';

afterEach(() => {
  mock.restore();
});

Intentionally skipped on Windows (async child process handles cause EBUSY):

tests/unit/tools/pre-check-batch-sast-preexisting.test.ts
tests/unit/tools/pre-check-batch.adversarial.test.ts
tests/unit/tools/pre-check-batch-cwd.test.ts
tests/unit/tools/pre-check-batch-cwd.adversarial.test.ts
tests/unit/tools/pre-check-batch-contextdir-adversarial.test.ts
tests/unit/tools/pre-check-batch-secretscan-evidence.test.ts
tests/unit/tools/pre-check-batch.test.ts

Never create circular mock imports. This pattern deadlocks Bun:

// BROKEN — imports from the module it's about to mock
import { realFn } from '../../src/module.js';
vi.mock('../../src/module.js', () => ({
  realFn: (...args) => realFn(...args),  // circular!
  otherFn: vi.fn(),
}));

Instead, inline the function logic or extract the real functions into a separate utility module.

Prefer constructor/parameter injection over module mocking. The swarm's hook factories (createScopeGuardHook, createDelegationLedgerHook, etc.) accept injected dependencies — test them by passing mock callbacks, not by replacing modules.
Mock validateDirectory when testing with Windows temp paths. The path-security.ts validator rejects Windows absolute paths (C:\...). If your test uses os.tmpdir() and passes that path to a function that calls validateDirectory, mock it:

mock.module('../../../src/utils/path-security', () => ({
  validateDirectory: () => {},
  validateSwarmPath: (p: string) => p,
}));

Diagnosing Test Isolation Failures

When test files pass individually but fail when run together, follow this protocol:

Isolate: Run the failing file alone: bun test <file>.test.ts --timeout 30000
Pair: Run it WITH its suspected polluting neighbor: bun test <fileA>.test.ts <fileB>.test.ts
Classify:
- Both pass alone → fail together → mock pollution from neighbor
- Fails alone → test logic bug (not isolation issue)
- Passes alone + passes together but fails in full suite → third-file pollution (use binary search across directory)
For mock pollution, check the neighbor for these patterns:
- vi.mock() or mock.module() inside beforeEach() (not at top level)
- delete require.cache[...] combined with re-import pattern
- These indicate hoist-time closure capture — see below
Specific symptom — closure capture failure: vi.mock() captures closures at hoist time (before beforeEach runs). Reassigning mockFn.mockImplementation(newFn) in the test body does NOT update the hoisted closure — the mock still calls the original function.
- Symptom: expect(mockFn).toHaveBeenCalledTimes(N) fails with an unexpected count
- Symptom: expect(mockFn).not.toHaveBeenCalled() fails because the real function was called
Fix path: Migrate the affected test file to _internals DI seam pattern per the mock-to-internals-migration skill. This eliminates both the vi.mock() call and the closure capture surface area.

Two-Tier Mock Convention

The codebase uses a two-tier strategy for mock isolation, plus a zero-mock testing pattern:

Tier 0: _test_exports Pure Function Testing (Zero Mocks)

// In source file (src/tools/formatter.ts)
function formatEntry(entry: SomeType): string {
  // internal implementation — may use optional chaining, defaults, etc.
  return entry.score?.toFixed(2) ?? 'N/A';
}

// Public API (tool handler, command handler, etc.)
export function handleQuery(ctx: Context) {
  const entries = readData(ctx);
  return entries.map(formatEntry);
}

// Export seam for testing — only used by test files
export const _test_exports = { formatEntry };

// In test file (tests/unit/tools/formatter.test.ts)
import { _test_exports } from '../../../src/tools/formatter';

const { formatEntry } = _test_exports;

describe('formatEntry', () => {
  test('handles missing score', () => {
    expect(formatEntry({ score: undefined })).toBe('N/A');
  });
  test('formats numeric score', () => {
    expect(formatEntry({ score: 0.85 })).toBe('0.85');
  });
});

When to use Tier 0 vs Tier 1:

Tier 0 (_test_exports): The function is a pure utility (formatter, normalizer, transformer) that doesn't call external modules. No mocking needed — test it directly.
Tier 1 (_internals): You need to mock a function within the same module to test the caller in isolation. The function has side effects or calls external APIs.
Tier 2 (mock.module): You need to mock a dependency from another module (Node built-ins, other application modules).

Benefits of Tier 0:

Zero mock pollution — no mock.module calls, no mock.restore() needed
Works in batch test runs without per-file isolation
Type-safe (the exported object carries the real TypeScript types)
No filesystem dependencies (no tmpDir, no chdir, no existsSync)
Deterministic on all platforms and CI environments

Tier 1: _internals DI Seams (Within-Module)

For mocking functions within the same module, source files export an _internals object that wraps key functions. Tests can replace individual functions without using mock.module:

// In source file (src/services/my-service.ts)
export const _internals = {
  helperFn: () => { /* real implementation */ }
};

export function mainFn() {
  return _internals.helperFn();
}

// In test file
import { _internals, mainFn } from '../../../src/services/my-service';

test('mainFn uses mocked helper', () => {
  const original = _internals.helperFn;
  _internals.helperFn = mock(() => 'mocked');
  // ... test ...
  _internals.helperFn = original; // restore
});

Benefits:

No process-global mock pollution
Type-safe
Fast (no module re-parsing)
Works in batch test runs without isolation

Tier 2: mock.module (Cross-Module)

When mocking dependencies from other modules (especially Node built-ins), use mock.module with proper cleanup:

import * as realFs from 'node:fs/promises';

mock.module('node:fs/promises', () => ({
  ...realFs,  // MUST spread real exports
  readFile: mock(() => Promise.resolve('mocked')),
}));

afterEach(() => mock.restore());

Critical rules for cross-module mocks:

Always spread real exports for Node built-ins — other code depends on exports you don't mock
Always add afterEach(mock.restore()) — provides best-effort cleanup
Run in per-file isolation — CI runs each file in its own process (for f in *.test.ts; do bun --smol test "$f"; done)

Choosing Between Tiers

Scenario	Pattern	Example
Mocking a function in the same module you're testing	`_internals` seam	`src/state.ts` `_internals.loadSnapshot`
Mocking a Node built-in (fs, child_process, etc.)	`mock.module` + spread real	`mock.module('node:fs/promises', () => ({ ...realFs, readFile: mockFn }))`
Mocking another application module	`mock.module` + cleanup	`mock.module('../../../src/utils/logger', ...)` + `afterEach(mock.restore())`
File-scoped mock (applies to all tests in file)	`mock.module` at top level + `mockReset()` in `beforeEach`	Preflight tests with `mockLoadPlan.mockReset()`

mock.module() Export Completeness

Why this matters

How to verify completeness

Before finalizing a test that uses mock.module():

List all runtime exports of the target module (type-only exports are erased at compile time and need no stub):
```
grep -E "^export (const|function|async function|class) " src/path/to/module.ts
```
Note: Do NOT include type or interface exports — Bun erases these at compile time and they need no runtime stub.
Ensure every export name has an entry in your mock.module() factory.

Stubs can be minimal:

Functions: () => null or async () => {}

Zod schemas: use a comprehensive stub that supports common methods:

const zodStub = {
  parse: (v: unknown) => v,
  safeParse: (v: unknown) => ({ success: true as const, data: v }),
  parseAsync: async (v: unknown) => v,
};

Constants: appropriate zero values ('', 0, null, [], {})

Verification pattern

// ✅ CORRECT — all exports provided, test uses only the first one
mock.module('../../../src/config/schema.js', () => ({
  // The one export your test actually uses
  stripKnownSwarmPrefix: mockStripFn,
  // Stubs for transitive import resolution (never called in test)
  PluginConfigSchema: zodStub,
  ScoringConfigSchema: zodStub,
  isKnownCanonicalRole: () => false,
  // ... all other runtime exports as stubs
}));

// ❌ WRONG — missing exports cause SyntaxError at module-load time
mock.module('../../../src/config/schema.js', () => ({
  stripKnownSwarmPrefix: mockStripFn,
  // Missing: PluginConfigSchema, ScoringConfigSchema, etc.
  // → "SyntaxError: Export named 'PluginConfigSchema' not found"
}));

What IS and IS NOT test theater

Adding stubs for ESM resolution is NOT test theater — it's a Bun runtime requirement. The distinction:

Pattern	Test theater?	Why
Adding `PluginConfigSchema: zodStub` so the module loads	No	Required for ESM resolution; stub is never called
Stubbing `validateDirectory` to return `true` then asserting "validation works"	Yes	The stub bypasses the logic you should be testing
Using `zodStub` in assertions: `expect(zodStub.parse(input)).toBe(input)`	Yes	Testing the stub, not the real code
Adding stubs for ALL 50 Zod schemas in config/schema.ts	No	All are required for transitive import resolution

The stubs exist solely to satisfy the module loader. Test assertions must verify behavior through the real-mocked functions (the ones your test actually calls), not through the stubs.

Files Intentionally Using File-Scoped Mocks

Some test files use top-level mock.module that must persist across all tests in the file. These files use mockReset()/mockClear() in beforeEach instead of mock.restore() in afterEach:

src/__tests__/preflight-phase.test.ts — mocks plan/manager and preflight-service

Cross-Platform Test Patterns

Tests run on all three CI platforms (ubuntu, macos, windows). Path and filesystem behavior differs between them. Follow these patterns to prevent platform-specific failures:

Mock keys with filesystem paths

Use path.resolve() to construct mock keys the same way the source code does:

// ❌ WRONG — fails on Windows (mock expects '/safe/dir/linked.ts',
//    but path.resolve('/safe/dir', 'linked.ts') = 'D:\safe\dir\linked.ts')
mockRealpathSync.mockImplementation((inputPath: string) => {
  if (inputPath === '/safe/dir') return '/safe/dir';
  if (inputPath === '/safe/dir/linked.ts') return '/outside/linked.ts';
  return inputPath;
});

// ✅ CORRECT — path.resolve produces matching keys on all platforms
const mockDir = path.resolve('/safe/dir');
const linkedResolved = path.resolve(mockDir, 'linked.ts');
const outsideResolved = path.resolve('/outside/linked.ts');

// mockRealpathSync is a mock() function (bun:test) — see mocking patterns above
mockRealpathSync.mockImplementation((inputPath: string) => {
  if (inputPath === mockDir) return mockDir;
  if (inputPath === linkedResolved) return outsideResolved;
  return inputPath;
});

Symlink behavior differences

On Windows, fs.symlinkSync for directories creates junctions by default, which resolve differently than POSIX symlinks. Junction creation may require administrator elevation on older Node.js versions.
fs.realpathSync on a broken symlink throws ENOENT on POSIX but may throw EINVAL on Windows, depending on symlink type.
Use test.skipIf(process.platform === 'win32') for tests that directly manipulate filesystem symlinks, unless the test's purpose is explicitly to verify cross-platform symlink behavior.

Temporary directory patterns

Use os.tmpdir() + path.join() for temp paths. Never hardcode /tmp or C:\.
Wrap mkdtempSync in realpathSync if the result is chdir'd on macOS (temp dirs are often symlinked to /private/var/...).
Clean up temp dirs in afterEach or afterAll using the rm utility with { force: true, recursive: true }.

Line ending normalization

Git on Windows converts LF to CRLF by default. Tests that compare file contents byte-by-byte against expected strings must normalize line endings:

const actual = readFileSync(path, 'utf-8').replace(/\r\n/g, '\n');

CI Pipeline Structure

The CI runs on three platforms (ubuntu, macos, windows). Tests are split into sequential steps within each platform's job.

Step 1: hooks (Linux/macOS only, skipped on Windows) — batch per-group
Step 2: cli — batch
Step 3: commands + config — batch
Step 4: tools — per-file isolation loop
Step 5: services + build + quality + sast + sbom + scripts — per-file isolation loop
Step 6: state + agents + knowledge + evidence + plan + misc — per-file isolation loop

When writing a test, know which step your file will run in. In batch steps, do not assume isolation from other files in the same step.

Job timeout: 15 minutes. A single hanging test will kill the entire platform's test run.

File Placement

Convention

Test type	Location	When to use
Unit tests for `src/hooks/*.ts`	`tests/unit/hooks/`	Testing hook factories and hook behavior
Unit tests for `src/tools/*.ts`	`tests/unit/tools/`	Testing tool execute functions
Unit tests for `src/commands/*.ts`	`tests/unit/commands/`	Testing CLI command handlers
Unit tests for `src/config/*.ts`	`tests/unit/config/`	Testing schema validation, config loading
Unit tests for `src/agents/*.ts`	`tests/unit/agents/`	Testing agent prompt generation, factory logic
Colocated tests	`src/*/.test.ts`	Integration-style tests tightly coupled to the source module
Integration tests	`tests/integration/`	Cross-module workflows, plugin initialization
Security tests	`tests/security/`	Adversarial input handling, injection resistance
Smoke tests	`tests/smoke/`	Built package validation

Naming

Base test: <module>.test.ts
Adversarial variant: <module>.adversarial.test.ts

Regression tests (review-surfaced bugs)

When fixing a bug surfaced by code review, swarm review, or post-merge audit, always add a regression test with the following shape so the test's purpose survives future cleanup:

describe('<feature> — regression: <one-line description> (F#)', () => {
  it('<exact behavior the bug violated>', () => {
    // Previous code did <bad thing>: e.g. the regex `/^\.\/+/` only stripped
    // a single leading `./`, so `././util.ts` survived as `./util.ts`.
    expect(normalizeGraphPath('././util.ts')).toBe('util.ts');
  });
});

Rules:

The describe label includes the original finding ID (e.g. F8, F9, F1.1) so future readers can map back to the review.
The leading comment in the body explains the prior buggy behavior in concrete terms — what the code did before, not what it does now.
One regression test per finding. Do not pile unrelated assertions into a single regression block.

Examples in-tree: tests/unit/graph/graph-query.test.ts, tests/unit/graph/import-extractor.test.ts, tests/unit/graph/graph-store.test.ts.

Cross-Entry Invariants (config maps)

Known parity assertions:

Test	Invariant
`tests/unit/config/critic-registration.test.ts:67`	`AGENT_TOOL_MAP.critic_sounding_board.length === AGENT_TOOL_MAP.critic.length`
`tests/unit/config/agent-tool-map.test.ts:26`	`AGENT_TOOL_MAP.architect.length` is strictly greater than every other agent's
`tests/unit/config/agent-tool-map.test.ts:34`	every subagent's tool list `<= 20` entries
`tests/unit/config/constants.test.ts:48`	`ALL_SUBAGENT_NAMES.length === 13`
`tests/unit/config/constants.test.ts:137`	`Object.keys(DEFAULT_MODELS).length === 14`

Workflow when adding a tool to a single agent:

Add the entry.
Run bun --smol test tests/unit/config --timeout 60000 before pushing.
If a parity test fails, decide: mirror the change to sibling agents, or update the invariant test if the design intent has actually changed.
To inspect runtime shape quickly: bun -e "import { AGENT_TOOL_MAP } from './src/config/constants.ts'; for (const [k,v] of Object.entries(AGENT_TOOL_MAP)) console.log(k, v.length);"

Debugging CI failures

When CI reports a unit (ubuntu|macos|windows) failure:

Identify the actual failing test from the job log first. Do not assume it's a pre-existing failure based on a local repro of a different test. Open the failing job's URL and find the <file>:<line> in the Bun output. WebFetch can scrape this if the gh CLI isn't available.
Reproduce that exact file locally: bun --smol test tests/unit/<dir>/<file>.test.ts --timeout 30000.
Then check if the same failure reproduces on main. If yes, document as pre-existing in the PR description and continue with your branch's work; do not silently inherit the failure.
For dist-check failures: dist-check is a hard gate — never bypass or ignore it.
- If your PR touches src/: run bun run build, examine git diff -- dist/, and commit the rebuilt dist/ in the same PR.
- If your PR does not touch src/ but dist-check fails: origin/main has a dist drift. Do not commit rebuilt dist to your PR. The fix belongs on main, not in your PR.

Test Quality Standards

DO

Test real behavior: call the actual function with real inputs, assert on real outputs.
Test error paths: what happens with null, undefined, empty string, oversized input?
Use temp directories (fs.mkdtemp) for file I/O tests. Clean up in afterEach.
Assert on specific values, not just truthiness: expect(result.status).toBe('pending') not expect(result).toBeTruthy().

DO NOT

Do not test type definitions. expect(event.type === 'foo').toBe(true) tests TypeScript, not your code.
Do not test framework behavior. "Zod schema parses valid input" tests Zod, not your schema.
Do not test test utilities. If it only exists to support other tests, it doesn't need its own test.
Do not mock everything. If every dependency is mocked, you're testing the mock setup. Prefer real dependencies for pure functions and only mock I/O boundaries (filesystem, network, timers).

Anchored Content Assertions

// WEAK — passes even if the word appears in prose outside the intended section
expect(content).toContain('DROP');

// STRONG — fails if the structured section is removed or relocated
const stage3Start = content.indexOf('#### Stage 3: Consult Critic Sounding Board');
const stage4Start = content.indexOf('#### Stage 4: Surface User Decision Packet');
const stage3Section = content.slice(stage3Start, stage4Start);
expect(stage3Section).toContain('DROP');
expect(stage3Section).toContain('ASK_USER');

Use this pattern for:

Critic outcome mappings in skill files (DROP, ASK_USER, RESOLVE, REPHRASE)
Classification category lists (self_resolved, user_decision, etc.)
Any structured section where word presence is necessary but position-dependent
Do not hardcode version numbers. Version bumps are automated — a test asserting version === '6.31.3' breaks on every release.
Do not use sleep or setTimeout for synchronization. Use explicit signals, resolved promises, or Bun.sleep() with tight bounds.
Do not spawn cat /dev/zero, yes, or other infinite-output commands. Use sleep 30 for "blocking command" tests.

Cross-Platform Requirements

See also: Cross-Platform Test Patterns above for detailed guidance on mock keys, symlink behavior, temp directories, and line endings.

All tests must pass on Linux, macOS, and Windows unless explicitly gated with:

const isWindows = process.platform === 'win32';
if (isWindows) test.skip('reason', () => {});

Path handling

Use path.join() or path.resolve(), never string concatenation with /.
Temp directories: use os.tmpdir(), not hardcoded /tmp.
File comparisons: normalize paths before comparing (path.resolve(a) === path.resolve(b)).

Process spawning

Use .cmd extension on Windows for npm/bun binaries: process.platform === 'win32' ? 'bun.cmd' : 'bun'.
Use array-form spawn/spawnSync, never shell string commands.

Running Tests

bash (Linux / macOS)

# Single file
bun test tests/unit/hooks/scope-guard.test.ts

# Batch directory (safe for dirs without mock conflicts)
bun --smol test tests/unit/hooks --timeout 30000

# Per-file loop (required for tools/services/agents — prevents mock poisoning)
for f in tests/unit/tools/*.test.ts; do bun --smol test "$f" --timeout 30000; done

# CI-equivalent run for batch steps
bun --smol test tests/unit/cli --timeout 120000
bun --smol test tests/unit/commands tests/unit/config --timeout 120000

PowerShell (Windows)

# Single file
bun test tests/unit/hooks/scope-guard.test.ts

# Batch directory (safe for dirs without mock conflicts)
bun --smol test tests/unit/hooks --timeout 30000

# Per-file loop (required for tools/services/agents — prevents mock poisoning)
Get-ChildItem tests/unit/tools/*.test.ts | ForEach-Object { bun --smol test $_.FullName --timeout 30000 }

# CI-equivalent run for batch steps
bun --smol test tests/unit/cli --timeout 120000
bun --smol test tests/unit/commands tests/unit/config --timeout 120000

# Capture output to file (avoids truncation when output is large)
bun --smol test tests/unit/agents --timeout 60000 | Out-File "$env:TEMP\test_out.txt"; Get-Content "$env:TEMP\test_out.txt" | Select-Object -Last 50

Note: for f in ...; do bash syntax is invalid in PowerShell. Use Get-ChildItem | ForEach-Object instead. Select-String -Last N is also invalid — use Select-Object -Last N.

The --smol flag reduces Bun's memory footprint. Use it when running large directories (50+ files).

Before Submitting

Run the tests for your changed files: bun test path/to/your.test.ts
Run the full CI group your tests belong to (see pipeline structure above)
Verify no process.cwd() usage — use the directory parameter from createSwarmTool or hook constructor
Verify no hardcoded paths (/tmp/..., C:\...) — use os.tmpdir() + path.join()
Verify mocks are restored in afterEach if using spyOn or mock.module

Known Pre-existing Test Failures

The following test failures are pre-existing and unrelated to mock isolation:

Test file	Failures	Cause	Status
`tests/unit/hooks/full-auto-intercept.test.ts`	21/37	`logger.log` returns early without `OPENCODE_SWARM_DEBUG=1`	Pre-existing
`tests/unit/hooks/full-auto-intercept.dispatch.test.ts`	2/46	Same logger issue	Pre-existing
`tests/unit/commands/help-compound-commands.test.ts`	Multiple	Command routing issues	Pre-existing
`tests/unit/commands/index.test.ts`	Multiple	Command routing issues	Pre-existing
`tests/unit/commands/issue-command.test.ts`	Multiple	Command routing issues	Pre-existing
`src/__tests__/preflight-phase.test.ts`	3/3	`loadPlan` called twice per invocation (lines 930 + 545)	Bug exposed by cleanup
`tests/unit/agents/architect-sounding-board-protocol.adversarial.test.ts`	1	Token budget threshold `35000` exceeded by prompt growth; soft regression indicator that prompt size needs attention	Pre-existing

Known Cross-module mock.module Locations

The following directories contain test files that use cross-module mock.module (permitted under two-tier convention):

tests/unit/commands/ — mocks tools, hooks, services, state
tests/unit/hooks/ — mocks knowledge-store, knowledge-validator, knowledge-reader, telemetry, utils
tests/unit/tools/ — mocks Node built-ins (fs, child_process), sast-baseline, build/discovery
tests/unit/services/ — mocks path-security
tests/unit/config/ — mocks node:fs/promises
tests/unit/background/ — mocks utils, event-bus, evidence-summary-service
tests/unit/council/ — mocks node:fs
tests/unit/plan/ — mocks spec-hash
tests/unit/mutation/ — mocks node:child_process
tests/unit/git/ — mocks node:child_process
tests/integration/ — mocks co-change-analyzer, knowledge-store
src/__tests__/ — mocks plan/manager, preflight-service, telemetry
src/hooks/ — mocks logger, event-bus
src/tools/__tests__/ — mocks test-impact/analyzer, build/discovery, path-security
src/mutation/__tests__/ — mocks state
src/agents/ — mocks node:fs/promises
src/background/ — mocks vulnerability trigger

Dead-code _internals Seams

The following source modules export _internals but have no test consumers (as of this writing). They are harmless but may be removed in future cleanup:

src/tools/secretscan.ts
src/tools/knowledge-recall.ts
src/tools/lint.ts
src/tools/sast-scan.ts
src/tools/sast-baseline.ts
src/mutation/gate.ts
src/mutation/equivalence.ts
src/mutation/engine.ts
src/db/qa-gate-profile.ts
src/config/schema.ts
src/config/index.ts
src/commands/registry.ts
src/background/manager.ts
src/background/event-bus.ts
src/agents/critic.ts

writing-tests

المزيد من هذا المستودع

المزيد من هذا المستودع

Writing Tests for opencode-swarm

⛔ STOP — Read Before Running Any Tests

Framework: bun:test Only

Mock Isolation Rules

Rules

Diagnosing Test Isolation Failures

Two-Tier Mock Convention

Tier 0: _test_exports Pure Function Testing (Zero Mocks)

Tier 1: _internals DI Seams (Within-Module)

Tier 2: mock.module (Cross-Module)

Choosing Between Tiers

mock.module() Export Completeness

Why this matters

How to verify completeness

Verification pattern

What IS and IS NOT test theater

Files Intentionally Using File-Scoped Mocks

Cross-Platform Test Patterns

Mock keys with filesystem paths

Symlink behavior differences

Temporary directory patterns

Line ending normalization

CI Pipeline Structure

File Placement

Convention

Naming

Regression tests (review-surfaced bugs)

Cross-Entry Invariants (config maps)

Debugging CI failures

Test Quality Standards

DO

DO NOT

Anchored Content Assertions

Cross-Platform Requirements

Path handling

Process spawning

Running Tests

bash (Linux / macOS)

PowerShell (Windows)

Before Submitting

Known Pre-existing Test Failures

Known Cross-module mock.module Locations

Dead-code _internals Seams

Writing Tests for opencode-swarm

⛔ STOP — Read Before Running Any Tests

Framework: bun:test Only

Mock Isolation Rules

Rules

Diagnosing Test Isolation Failures

Two-Tier Mock Convention

Tier 0: _test_exports Pure Function Testing (Zero Mocks)

Tier 1: _internals DI Seams (Within-Module)

Tier 2: mock.module (Cross-Module)

Choosing Between Tiers

mock.module() Export Completeness

Why this matters

How to verify completeness

Verification pattern

What IS and IS NOT test theater

Files Intentionally Using File-Scoped Mocks

Cross-Platform Test Patterns

Mock keys with filesystem paths

Symlink behavior differences

Temporary directory patterns

Line ending normalization

CI Pipeline Structure

File Placement

Convention

Naming

Regression tests (review-surfaced bugs)

Cross-Entry Invariants (config maps)

Debugging CI failures

Test Quality Standards

DO

DO NOT

Anchored Content Assertions

Cross-Platform Requirements