| name | arch-compliance |
| description | Full-cycle architecture compliance with truthfulness audit for Kartix. Goes beyond "tests pass" to verify what is genuinely implemented vs stubbed, fake, or partially wired. Use this skill whenever the user wants to check architecture compliance, find what's still missing, finish implementing remaining gaps, keep going until in-scope work is actually closed or blocked, add or fix tests, audit what's real vs fake, rerun verification, measure honest feature coverage, or ensure the repo honestly matches the architecture and plan. Trigger on "check architecture", "what's left", "finish implementing", "complete everything else", "verify everything", "run the full cycle", "what's real vs stubbed", "audit implementation", "close the gaps", "add tests and coverage", or similar.
|
Architecture Compliance Full-Cycle
Run the Kartix delivery loop with a truthfulness-first approach. The critical insight this
skill enforces: "all tests pass" does not mean "all features work." A feature can have
passing unit tests while the end-to-end UX is broken, stubbed, or disconnected.
Default end state:
architecture scanned -> truthfulness audit -> gaps identified -> code implemented ->
tests added -> re-audit performed -> remaining in-scope gaps closed or explicitly blocked ->
central verification green -> UX proof captured -> honest coverage reported
Stop earlier only when the user explicitly asks for report-only, or when a blocker cannot
be resolved locally. If the cycle stops early, record why and what remains.
Incremental Mode
If output/arch-compliance/coverage-summary.json exists, load it first. Focus on:
- New code changes since last run (check git diff or file timestamps)
- Previously identified gaps that are still open
- Features that were implemented since last run but lack tests
- Any test count changes (frontend or backend) that need investigation
For incremental runs where nothing changed, a fast-path is acceptable: run static checks
and tests, confirm counts match, report clean pass. Do not regenerate all artifacts from
scratch unless something changed or the user requests a full cycle.
Do not use the fast-path if the previous run still showed any current-phase feature as
PARTIAL, FACADE, MISSING, or STUB, or if remaining_delta contains in-scope work.
In that case, the default is to continue closing implementation gaps, not just re-report them.
Required Inputs
Read at the start of every full-cycle run:
docs/ARCHITECTURE.md
docs/VERIFICATION_CONTRACT.md
docs/VERIFICATION_MATRIX.md
docs/PLAN_TRACEABILITY.md
docs/USE_CASES.md
docs/NON_GOALS.md
docs/UPSTREAM_INTEGRATION_STRATEGY.md
docs/PRODUCT_GOAL_PLAN.md
docs/PRODUCT_GOAL_EXECUTION_PLAN.md
Load only when needed:
docs/CHECKPOINT_SAFETY.md
docs/SIDECAR_COMPAT.md
src-tauri/providers.toml
- existing files under
output/arch-compliance/
North-Star Guardrails
Hard constraints:
- Kartix-native Rust/Tauri terminal core is the base layer.
- Official CLIs (Claude Code, Codex, Gemini CLI) are provider sidecars.
- Claude Code Mux is optional expert routing, never the default architecture.
- OpenFang is an optional post-alpha orchestration sidecar, not the base product.
- WaveTerm, LibreChat, OpenClaw are reference-only. Opcode is behavior inspiration only.
- Upstream sync is research-only under
research/upstreams/.
- If a request conflicts with current architecture or non-goals, record the delta and
propose the clean architectural path instead of silently violating docs.
Deliverables Per Run
Update or create under output/arch-compliance/:
feature-registry.json
implementation-map.json
truthfulness-audit.json — the new core artifact
gap-report.md
test-manifest.json
test-results.json
coverage-summary.md
coverage-summary.json
Optional per-run (produce when doing a full cycle, skip for incremental fast-path):
run-context.md
peer-review.md
e2e-feature-list.md
ux-report.md
product-delta.md
The Full Cycle
1. Align Scope
- Read required inputs.
- Reconcile what the architecture says is in scope, what USE_CASES.md claims, and what
the product goal plan says Kartix should become.
- If the user asks for future capability that is intentionally post-alpha, record it as
a roadmap delta unless they explicitly want scope expansion.
2. Build the Feature Registry
Extract every current-phase feature from docs. For each, record:
id, phase, name, status_claimed, user_value
source_artifacts, verification_artifacts
proof_mode (unit, integration, central, native-ux, browser-ux, or combinations)
north_star_link — which product-goal capability this feature supports
Save to feature-registry.json.
3. Truthfulness Audit — The Core of This Skill
This phase distinguishes this skill from a naive "run tests, report green" loop. For every
in-scope feature, determine its actual implementation status by reading the source code,
not just checking if tests pass.
3a. Classify Each Feature's Reality
For each feature, assign one of these categories:
| Category | Meaning | Example |
|---|
REAL | Fully implemented, wired to runtime, works end-to-end | macOS PTY spawning |
PARTIAL | Core logic exists but key parts missing or disconnected | Crash recovery saves state but doesn't restore layout |
STUB | Code compiles but bodies are todo!(), unimplemented!(), or empty | Linux/Windows platform backends |
MISSING | Expected by architecture but no code exists | Settings panel, approval workflow UI |
FACADE | Tests pass but the feature doesn't work for a real user | A UI button that renders but has no onClick handler |
The distinction between PARTIAL and FACADE is important: PARTIAL means some real work
happens but the job isn't finished. FACADE means the feature appears to exist (tests may
even pass) but a user would find it broken or nonfunctional.
Promotion rules:
- Do not mark a feature
REAL unless the runtime path is actually wired and the code does
useful work for the current phase.
- Do not treat docs, generated artifacts, screenshots, mocks, or passing helper tests as
implementation.
- Do not let a feature become
REAL just because one layer works in isolation. If the user
path is still broken, it remains PARTIAL or FACADE.
3b. Automated Stub Detection
Run these checks programmatically. Do not skip them:
Rust backend — search for:
todo!(), unimplemented!(), panic!("not implemented"), unreachable!("stub")
Grep patterns:
grep -rn 'todo!\|unimplemented!\|panic!.*not.impl' src-tauri/src/ --include='*.rs'
Frontend — search for:
onClick={() => {}}, onClick={undefined}, placeholder divs without content,
components that render but have no event handlers, buttons with no action
Wiring gaps — check that:
- Every Tauri command in
commands.rs is registered in lib.rs
- Every registered command does real work (not just
Ok(()))
- Every frontend component that calls
invoke() handles the response
- Every store action that claims to persist actually writes to backend
3c. End-to-End Wiring Check
For each feature marked REAL or PARTIAL, trace the full path:
- User action in frontend (click, type, etc.)
- Frontend calls Tauri command or uses store
- Backend processes the command
- Backend returns result or emits event
- Frontend displays the result
If any link in this chain is broken, the feature is PARTIAL at best, FACADE at worst.
3d. Known Problem Areas to Always Check
These are areas where the codebase has historically had truthfulness issues. Always verify:
| Area | What to check | Red flag |
|---|
| Platform backends | src-tauri/src/platform/linux.rs, windows.rs | todo!() stubs |
| Settings UI | Does SettingsPanel component exist? | Sidebar button disconnected |
| Crash recovery | Does restore actually recreate sessions? | Save works, restore is partial |
| Crash recovery: scrollback | Does checkpoint.rs capture terminal buffer? | save_checkpoint() is still a stub |
| Crash recovery: history wiring | Is session/history.rs called from shell events? | SQL API exists but command text extraction not wired |
| Approval workflow | Does approval mode UI enforce anything? | Flag passed to CLI but no UX |
| Agent panel | Can user actually run an agent end-to-end? | Detection works, invocation partial |
| Transcript rendering | Does AgentPanel use imperative DOM, not React state? | Per-event setState, no batching |
| Shell integration | Are all 3 OSC events consumed in frontend? | Only cwd-change consumed, prompt/command-end ignored |
| Supervisor lifecycle | Do agent/workflow spawns go through SupervisorManager? | Direct Command::new() bypasses supervisor |
| DB usage | Is SQLite used at runtime beyond tests? | Tables created but app uses localStorage |
3e. Coverage-Honesty Check
For every feature, record not just whether tests exist, but what kind of proof exists.
Use these buckets:
none — no meaningful test or UX proof
unit-only — helper logic tested, user path still unproven
integration — modules talk to each other realistically
central — exercised by shared verification such as scripts/verify.sh
ux — verified through native or browser action-path proof this run
mixed — multiple layers above
Coverage honesty rules:
- A feature with only
unit-only proof is not user-proven.
- A
PARTIAL or FACADE feature can still have tests; count that explicitly instead of
letting test growth imply feature completion.
- If tests pass while the feature is still broken for a user, record that as coverage
inflation, not progress.
- Always report proof levels separately:
runtime-real
deterministic-proof
native-ux-this-run
live-provider-this-run
- Do not collapse those proof levels into one "all green" or one inflated end-to-end count.
Save the full audit to truthfulness-audit.json with this schema:
{
"audit_date": "ISO timestamp",
"features": [
{
"id": "feature-id",
"name": "Human Name",
"claimed_status": "verified-now | implemented-surface | planned",
"actual_status": "REAL | PARTIAL | STUB | MISSING | FACADE",
"evidence": "specific file:line references",
"stub_locations": ["file:line patterns found"],
"wiring_chain": "complete | broken-at-step-N | not-wired",
"coverage_level": "none | unit-only | integration | central | ux | mixed",
"coverage_honesty": "honest | inflated | unknown",
"tests_pass": true,
"feature_works_e2e": true,
"gap_description": "what's missing or broken",
"fix_complexity": "trivial | small | medium | large | architectural"
}
],
"stub_scan_results": {
"rust_todos": ["file:line"],
"rust_unimplemented": ["file:line"],
"frontend_empty_handlers": ["file:line"],
"disconnected_ui": ["component:description"]
},
"summary": {
"total_features": 0,
"real": 0,
"partial": 0,
"stub": 0,
"missing": 0,
"facade": 0,
"tests_pass_but_feature_broken": 0
}
}
4. Gap Analysis
Compare the feature registry against the truthfulness audit.
Severity classification:
critical — claimed as verified-now but actual status is STUB, MISSING, or FACADE
high — claimed as implemented-surface but actual status is PARTIAL or FACADE
medium — in-scope for current phase, actual status is STUB or MISSING
low — minor issues (edge-case tests, docs drift, cosmetic)
info — future-phase items, roadmap deltas
Prioritize fixes:
- Features where tests pass but feature doesn't work (FACADE) — most dangerous
- Features claimed as verified but actually broken (critical)
- Missing implementations for in-scope architecture
- Missing or insufficient tests for REAL features
- Docs that claim more than reality delivers
Save to gap-report.md.
5. Implement the Gaps
Default behavior: fix the gaps, not just report them. Start with highest severity.
5a. Remaining-Implementation Closure Loop
Keep a live set of current-phase features whose actual status is not REAL and whose
deferred state is not explicitly justified by NON_GOALS.md or the phase plan.
Repeat until that set is empty or a blocker is recorded:
- Pick the highest-severity in-scope feature.
- Implement the missing runtime wiring or remove the fake facade.
- Add or update tests that expose the previous gap and prove the new behavior.
- Re-run targeted verification for that feature.
- Re-run the truthfulness audit for that feature.
- Only mark it complete when the feature's actual status improves honestly.
Do not stop at "tests are green" while current-phase features still sit in PARTIAL,
FACADE, MISSING, or unjustified STUB.
When implementing:
- Follow existing project conventions rigorously
- Prefer narrow, composable changes over speculative framework work
- Wire the change into the app's runtime path, not just helper code
- After implementing, re-run the truthfulness check for that feature
- Update architecture docs if implementation changes the truth
- Do not overbuild post-alpha ideas
For STUB features (like Linux/Windows backends): if they're explicitly deferred per
NON_GOALS.md or the phase plan, leave them as stubs but ensure the stubs:
- Have clear
todo!("Phase F: implement Linux PTY backend") messages
- Are documented in the gap report as intentionally deferred
- Don't have tests that make them appear implemented
For FACADE features: either implement them properly or remove the facade. A button
that does nothing is worse than no button — it misleads users.
6. Create Tests That Prove Features Work
For every feature implemented or changed, add tests at three levels:
Level 1 — Unit tests: test individual functions, state transitions, error paths.
Level 2 — Integration tests: test module interactions, Tauri command handlers
with realistic inputs, frontend component rendering with store state.
Level 3 — Truthfulness tests: tests that verify the feature works end-to-end,
not just that the code compiles. Examples:
- A test for crash recovery should verify that sessions are actually recreated,
not just that crash state serializes correctly.
- A test for agent detection should verify that detected agents appear in the UI,
not just that the detection function returns data.
- A test for settings should verify that changes persist across app restart,
not just that the store setter works.
For features still classified as PARTIAL or FACADE, add exposure tests when feasible:
- a test that fails on the broken path before the fix
- a test that would prevent the feature from silently regressing back into a facade
Coverage bookkeeping per feature:
- note whether the feature is only unit-covered, centrally verified, or user-proven
- do not upgrade UX status from component tests alone
- if a feature remains
PARTIAL, its tests should reduce ambiguity, not falsely imply closure
Test naming follows existing patterns:
- Rust:
test_<feature>_<scenario>
- Frontend:
describe('<Component>') > it('should <behavior>')
New tests must land in the repo's shared test structure. Wire them into
scripts/verify.sh or the central harness when realistic.
Save created/updated tests in test-manifest.json.
7. Run Verification
Use a fast-to-slow ladder while editing:
cargo fmt --manifest-path src-tauri/Cargo.toml --all --check
npx tsc --noEmit
npm run test
npm run build
cargo test --manifest-path src-tauri/Cargo.toml
cargo clippy --manifest-path src-tauri/Cargo.toml --all-targets --all-features -- -D warnings
bash scripts/verify.sh
The required end state is a green bash scripts/verify.sh unless a blocker outside
the repo makes that impossible.
Record results in test-results.json.
After every meaningful implementation cluster:
- refresh the changed features in
truthfulness-audit.json
- check whether any current-phase
PARTIAL or FACADE feature remains
- only move to final reporting once the remaining set is either empty or blocked
8. UX Verification
Exercise features the way a user would. For each feature claim:
- Perform the action path
- Confirm with an authoritative assertion
- Capture evidence
- Mark as:
verified-this-run, pass-with-caveat, not-proven, or failed
Do not leave a feature at verified-now if you could not prove it this run.
Downgrade the claim or record the caveat.
9. Coverage Summary
Produce coverage-summary.md and coverage-summary.json.
The summary must include:
- Total in-scope features and their reality breakdown (REAL/PARTIAL/STUB/MISSING/FACADE)
- How many tests pass vs how many features actually work end-to-end
- What changed this run
- What still blocks the next clean pass
- Honest assessment of "tests pass but feature broken" count
- Honest assessment of "feature partially implemented but already has passing tests" count
- Honest assessment of "real but only unit-covered" count
The coverage-summary.json schema:
{
"generated_at_utc": "ISO timestamp",
"scope": "Phase A1 + A2",
"metrics": {
"total_in_scope_features": 0,
"real": 0,
"partial": 0,
"stub": 0,
"missing": 0,
"facade": 0
},
"test_counts": {
"frontend_passed": 0,
"rust_unit_passed": 0,
"rust_e2e_passed": 0,
"rust_transport_passed": 0,
"total_passed": 0,
"total_failed": 0
},
"shared_verification": {
"command": "bash scripts/verify.sh",
"status": "pass | fail",
"native_smoke_artifact": "path"
},
"truthfulness": {
"tests_pass_but_feature_broken": 0,
"partial_features_with_passing_tests": 0,
"real_but_only_unit_tested": 0,
"features_honestly_working_e2e": 0
},
"changed_this_run": [],
"remaining_delta": []
}
Execution Strategy
Incremental fast-path
When coverage-summary.json exists and no source files changed since last run:
- Run static checks (fmt, clippy, tsc)
- Run test suites
- Compare test counts to previous run
- Confirm previous run had no current-phase
PARTIAL, FACADE, MISSING, or unjustified STUB
- If all match and no such gaps remain: report clean pass with no changes needed
- If counts differ or open gaps remain: investigate, fix, update artifacts
Full-cycle triggers
Run the full cycle (all 9 phases) when:
- First run (no previous artifacts)
- User explicitly requests full cycle
- Source files changed since last run
- Previous run had gaps that should now be checked
- Test counts changed unexpectedly
Parallel execution
When subagents are available, parallelize:
- Phase 2 can scan Rust and Frontend simultaneously
- Phase 3 can audit backend and frontend in parallel
- Phase 6 test creation for Rust and Frontend in parallel
- Phase 7 can run frontend and backend test suites in parallel
Codex-Oriented Execution Rules
- Default to action. Fix gaps, don't just report them.
- Use subagents for review and independent confirmation when available.
- Keep artifacts current. Refresh output files instead of letting stale reports stand.
- Central verification is a first-class deliverable.
- UX proof is a first-class deliverable.
- Extend existing shared scripts rather than creating disconnected alternatives.
- When fixing formatting or type errors, apply the fix and re-verify. Don't just report it.
- If roadmap truth changes, update
docs/PRODUCT_GOAL_PLAN.md and docs/PRODUCT_GOAL_EXECUTION_PLAN.md
in the same run instead of leaving the docs behind the implementation truth.
Auxiliary Skills
Use when applicable:
macos-native-app-qa — packaged or native macOS app proof
playwright — browser automation
playwright-interactive — iterative browser or Electron debugging
Key Paths
docs/ # Architecture and verification docs
src-tauri/src/ # Rust backend source
src-tauri/tests/ # Rust integration tests
src/ # React frontend source
src/components/ # UI components
src/stores/ # Zustand stores
scripts/verify.sh # Full verification
output/arch-compliance/ # This skill's output directory