arch-compliance

Full-cycle architecture compliance with truthfulness audit for Kartix. Goes beyond "tests pass" to verify what is genuinely implemented vs stubbed, fake, or partially wired. Use this skill whenever the user wants to check architecture compliance, find what's still missing, finish implementing remaining gaps, keep going until in-scope work is actually closed or blocked, add or fix tests, audit what's real vs fake, rerun verification, measure honest feature coverage, or ensure the repo honestly matches the architecture and plan. Trigger on "check architecture", "what's left", "finish implementing", "complete everything else", "verify everything", "run the full cycle", "what's real vs stubbed", "audit implementation", "close the gaps", "add tests and coverage", or similar.

Updated: March 11, 2026 at 16:57

Source: FutureAtoms/Kartix

Architecture Compliance Full-Cycle

Run the Kartix delivery loop with a truthfulness-first approach. The critical insight this skill enforces: "all tests pass" does not mean "all features work." A feature can have passing unit tests while the end-to-end UX is broken, stubbed, or disconnected.

Default end state:

architecture scanned -> truthfulness audit -> gaps identified -> code implemented ->
tests added -> re-audit performed -> remaining in-scope gaps closed or explicitly blocked ->
central verification green -> UX proof captured -> honest coverage reported

Stop earlier only when the user explicitly asks for report-only, or when a blocker cannot be resolved locally. If the cycle stops early, record why and what remains.

Incremental Mode

If output/arch-compliance/coverage-summary.json exists, load it first. Focus on:

New code changes since last run (check git diff or file timestamps)
Previously identified gaps that are still open
Features that were implemented since last run but lack tests
Any test count changes (frontend or backend) that need investigation

For incremental runs where nothing changed, a fast-path is acceptable: run static checks and tests, confirm counts match, report clean pass. Do not regenerate all artifacts from scratch unless something changed or the user requests a full cycle.

Do not use the fast-path if the previous run still showed any current-phase feature as PARTIAL, FACADE, MISSING, or STUB, or if remaining_delta contains in-scope work. In that case, the default is to continue closing implementation gaps, not just re-report them.
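The fast-path rule above can be sketched as a small gate over the previous run's coverage-summary.json. This is a minimal sketch, not part of the skill itself: the function name `fast_path_allowed` is hypothetical, and it assumes that `remaining_delta` lists only in-scope work and that any nonzero `stub` count blocks the fast-path (it does not try to tell deferred stubs from unjustified ones).

```python
from typing import Any

# Metric keys from coverage-summary.json that indicate open current-phase gaps.
BLOCKING_METRICS = ("partial", "facade", "missing", "stub")


def fast_path_allowed(summary: dict[str, Any]) -> bool:
    """Return True only if the previous run left no open gaps.

    Assumption: every entry in remaining_delta is in-scope work, and any
    nonzero stub count blocks the fast-path (conservative reading).
    """
    metrics = summary.get("metrics", {})
    has_open_gaps = any(metrics.get(key, 0) > 0 for key in BLOCKING_METRICS)
    has_remaining_work = bool(summary.get("remaining_delta"))
    return not (has_open_gaps or has_remaining_work)
```

A caller would load output/arch-compliance/coverage-summary.json with `json.load` and continue the full closure loop whenever this returns False.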

Required Inputs

Read at the start of every full-cycle run:

docs/ARCHITECTURE.md
docs/VERIFICATION_CONTRACT.md
docs/VERIFICATION_MATRIX.md
docs/PLAN_TRACEABILITY.md
docs/USE_CASES.md
docs/NON_GOALS.md
docs/UPSTREAM_INTEGRATION_STRATEGY.md
docs/PRODUCT_GOAL_PLAN.md
docs/PRODUCT_GOAL_EXECUTION_PLAN.md

Load only when needed:

docs/CHECKPOINT_SAFETY.md
docs/SIDECAR_COMPAT.md
src-tauri/providers.toml
existing files under output/arch-compliance/

North-Star Guardrails

Hard constraints:

Kartix-native Rust/Tauri terminal core is the base layer.
Official CLIs (Claude Code, Codex, Gemini CLI) are provider sidecars.
Claude Code Mux is optional expert routing, never the default architecture.
OpenFang is an optional post-alpha orchestration sidecar, not the base product.
WaveTerm, LibreChat, OpenClaw are reference-only. Opcode is behavior inspiration only.
Upstream sync is research-only under research/upstreams/.
If a request conflicts with current architecture or non-goals, record the delta and propose the clean architectural path instead of silently violating docs.

Deliverables Per Run

Update or create under output/arch-compliance/:

feature-registry.json
implementation-map.json
truthfulness-audit.json — the new core artifact
gap-report.md
test-manifest.json
test-results.json
coverage-summary.md
coverage-summary.json

Optional per-run (produce when doing a full cycle, skip for incremental fast-path):

run-context.md
peer-review.md
e2e-feature-list.md
ux-report.md
product-delta.md

The Full Cycle

1. Align Scope

Read required inputs.
Reconcile what the architecture says is in scope, what USE_CASES.md claims, and what the product goal plan says Kartix should become.
If the user asks for future capability that is intentionally post-alpha, record it as a roadmap delta unless they explicitly want scope expansion.

2. Build the Feature Registry

Extract every current-phase feature from docs. For each, record:

id, phase, name, status_claimed, user_value
source_artifacts, verification_artifacts
proof_mode (unit, integration, central, native-ux, browser-ux, or combinations)
north_star_link — which product-goal capability this feature supports

Save to feature-registry.json.
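One possible shape for a registry entry, using the field names listed above. This is a sketch under assumptions: the class name `FeatureEntry`, the top-level `features` key, and the list types chosen for the artifact and proof-mode fields are illustrative and not confirmed by the skill.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class FeatureEntry:
    """One feature-registry record; field names follow the list above."""

    id: str
    phase: str
    name: str
    status_claimed: str
    user_value: str
    source_artifacts: list[str] = field(default_factory=list)
    verification_artifacts: list[str] = field(default_factory=list)
    # Assumption: combinations of proof modes are stored as a list of strings.
    proof_mode: list[str] = field(default_factory=list)
    north_star_link: str = ""


def registry_to_json(entries: list[FeatureEntry]) -> str:
    """Serialize entries under a top-level "features" key (assumed layout)."""
    return json.dumps({"features": [asdict(e) for e in entries]}, indent=2)
```

Writing the returned string to output/arch-compliance/feature-registry.json keeps the registry diffable between runs.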

3. Truthfulness Audit — The Core of This Skill

This phase distinguishes this skill from a naive "run tests, report green" loop. For every in-scope feature, determine its actual implementation status by reading the source code, not just checking if tests pass.

3a. Classify Each Feature's Reality

For each feature, assign one of these categories:

Category	Meaning	Example
`REAL`	Fully implemented, wired to runtime, works end-to-end	macOS PTY spawning
`PARTIAL`	Core logic exists but key parts missing or disconnected	Crash recovery saves state but doesn't restore layout
`STUB`	Code compiles but bodies are `todo!()`, `unimplemented!()`, or empty	Linux/Windows platform backends
`MISSING`	Expected by architecture but no code exists	Settings panel, approval workflow UI
`FACADE`	Tests pass but the feature doesn't work for a real user	A UI button that renders but has no onClick handler

The distinction between PARTIAL and FACADE is important: PARTIAL means some real work happens but the job isn't finished. FACADE means the feature appears to exist (tests may even pass) but a user would find it broken or nonfunctional.

Promotion rules:

Do not mark a feature REAL unless the runtime path is actually wired and the code does useful work for the current phase.
Do not treat docs, generated artifacts, screenshots, mocks, or passing helper tests as implementation.
Do not let a feature become REAL just because one layer works in isolation. If the user path is still broken, it remains PARTIAL or FACADE.

3b. Automated Stub Detection

Run these checks programmatically. Do not skip them:

Rust backend — search for:

todo!(), unimplemented!(), panic!("not implemented"), unreachable!("stub")

Grep patterns:

grep -rnE 'todo!|unimplemented!|panic!.*not.impl|unreachable!.*stub' src-tauri/src/ --include='*.rs'
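The same scan can be done from a script so the hits land directly in the `file:line` form that the audit records. A minimal sketch, assuming line-based matching is good enough: it will also flag macros that appear inside comments, and the two result keys beyond `rust_todos` and `rust_unimplemented` are additions for illustration.

```python
import re
from pathlib import Path

# Line-level patterns for the Rust stub markers listed above.
STUB_PATTERNS = {
    "rust_todos": re.compile(r"\btodo!\s*\("),
    "rust_unimplemented": re.compile(r"\bunimplemented!\s*\("),
    # The next two keys are assumptions, not part of the audit schema.
    "rust_panic_not_implemented": re.compile(r"\bpanic!\s*\(.*not.impl"),
    "rust_unreachable_stub": re.compile(r"\bunreachable!\s*\(.*stub"),
}


def scan_rust_stubs(root: Path) -> dict[str, list[str]]:
    """Return "file:line" hits per pattern for every .rs file under root."""
    results: dict[str, list[str]] = {key: [] for key in STUB_PATTERNS}
    for path in sorted(root.rglob("*.rs")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            for key, pattern in STUB_PATTERNS.items():
                if pattern.search(line):
                    results[key].append(f"{path}:{lineno}")
    return results
```

Calling `scan_rust_stubs(Path("src-tauri/src"))` from the repo root covers the same tree as the grep command.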

Frontend — search for:

onClick={() => {}}, onClick={undefined}, placeholder divs without content,
components that render but have no event handlers, buttons with no action

Wiring gaps — check that:

Every Tauri command in commands.rs is registered in lib.rs
Every registered command does real work (not just Ok(()))
Every frontend component that calls invoke() handles the response
Every store action that claims to persist actually writes to backend
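The first wiring check can be approximated with two regular expressions over the source text. This is a rough sketch, assuming commands are declared with the `#[tauri::command]` attribute directly above the function and registered through a `generate_handler![...]` list; it ignores comments inside the list and attributes placed between the marker and the `fn`. The function name and the sample command names are hypothetical.

```python
import re

# Matches a function declared directly under a #[tauri::command] attribute.
COMMAND_FN = re.compile(
    r"#\[tauri::command(?:\([^)]*\))?\]\s*"
    r"(?:pub(?:\([^)]*\))?\s+)?(?:async\s+)?fn\s+(\w+)"
)
# Captures the contents of each generate_handler![...] list.
HANDLER_BLOCK = re.compile(r"generate_handler!\s*\[(.*?)\]", re.DOTALL)


def unregistered_commands(commands_src: str, lib_src: str) -> list[str]:
    """Return command names defined in commands_src but absent from lib_src."""
    defined = COMMAND_FN.findall(commands_src)
    registered: set[str] = set()
    for block in HANDLER_BLOCK.findall(lib_src):
        for item in block.split(","):
            # Strip any module path such as commands::name.
            name = item.strip().rsplit("::", 1)[-1]
            if name:
                registered.add(name)
    return [name for name in defined if name not in registered]
```

An empty result only shows that registration is complete; whether each command does real work still has to be read from its body.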

3c. End-to-End Wiring Check

For each feature marked REAL or PARTIAL, trace the full path:

1. User action in frontend (click, type, etc.)
2. Frontend calls Tauri command or uses store
3. Backend processes the command
4. Backend returns result or emits event
5. Frontend displays the result

If any link in this chain is broken, the feature is PARTIAL at best, FACADE at worst.
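The outcome of tracing the five links can be reduced to the `wiring_chain` value used in the audit. A minimal sketch, assuming `not-wired` means that no link works at all and that the first broken link is the one worth reporting; the function name is hypothetical.

```python
# The five links of the user path, in trace order.
WIRING_STEPS = (
    "user action in frontend",
    "frontend calls Tauri command or uses store",
    "backend processes the command",
    "backend returns result or emits event",
    "frontend displays the result",
)


def wiring_chain_status(step_ok: list[bool]) -> str:
    """Map per-link results to complete, broken-at-step-N, or not-wired."""
    if len(step_ok) != len(WIRING_STEPS):
        raise ValueError(f"expected {len(WIRING_STEPS)} step results")
    if not any(step_ok):
        # Assumption: not-wired means nothing in the chain works.
        return "not-wired"
    for index, ok in enumerate(step_ok, start=1):
        if not ok:
            return f"broken-at-step-{index}"
    return "complete"
```

For example, a feature whose backend handler exists but never returns a result would be recorded as `broken-at-step-4`.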

3d. Known Problem Areas to Always Check

These are areas where the codebase has historically had truthfulness issues. Always verify:

Area	What to check	Red flag
Platform backends	`src-tauri/src/platform/linux.rs`, `windows.rs`	`todo!()` stubs
Settings UI	Does `SettingsPanel` component exist?	Sidebar button disconnected
Crash recovery	Does restore actually recreate sessions?	Save works, restore is partial
Crash recovery: scrollback	Does `checkpoint.rs` capture terminal buffer?	`save_checkpoint()` is still a stub
Crash recovery: history wiring	Is `session/history.rs` called from shell events?	SQL API exists but command text extraction not wired
Approval workflow	Does approval mode UI enforce anything?	Flag passed to CLI but no UX
Agent panel	Can user actually run an agent end-to-end?	Detection works, invocation partial
Transcript rendering	Does AgentPanel use imperative DOM, not React state?	Per-event setState, no batching
Shell integration	Are all 3 OSC events consumed in frontend?	Only cwd-change consumed, prompt/command-end ignored
Supervisor lifecycle	Do agent/workflow spawns go through SupervisorManager?	Direct Command::new() bypasses supervisor
DB usage	Is SQLite used at runtime beyond tests?	Tables created but app uses localStorage

3e. Coverage-Honesty Check

For every feature, record not just whether tests exist, but what kind of proof exists. Use these buckets:

none — no meaningful test or UX proof
unit-only — helper logic tested, user path still unproven
integration — modules talk to each other realistically
central — exercised by shared verification such as scripts/verify.sh
ux — verified through native or browser action-path proof this run
mixed — multiple layers above

Coverage honesty rules:

A feature with only unit-only proof is not user-proven.
A PARTIAL or FACADE feature can still have tests; count that explicitly instead of letting test growth imply feature completion.
If tests pass while the feature is still broken for a user, record that as coverage inflation, not progress.
Always report proof levels separately:
- runtime-real
- deterministic-proof
- native-ux-this-run
- live-provider-this-run
Do not collapse those proof levels into one "all green" or one inflated end-to-end count.

Save the full audit to truthfulness-audit.json with this schema:

{
  "audit_date": "ISO timestamp",
  "features": [
    {
      "id": "feature-id",
      "name": "Human Name",
      "claimed_status": "verified-now | implemented-surface | planned",
      "actual_status": "REAL | PARTIAL | STUB | MISSING | FACADE",
      "evidence": "specific file:line references",
      "stub_locations": ["file:line patterns found"],
      "wiring_chain": "complete | broken-at-step-N | not-wired",
      "coverage_level": "none | unit-only | integration | central | ux | mixed",
      "coverage_honesty": "honest | inflated | unknown",
      "tests_pass": true,
      "feature_works_e2e": true,
      "gap_description": "what's missing or broken",
      "fix_complexity": "trivial | small | medium | large | architectural"
    }
  ],
  "stub_scan_results": {
    "rust_todos": ["file:line"],
    "rust_unimplemented": ["file:line"],
    "frontend_empty_handlers": ["file:line"],
    "disconnected_ui": ["component:description"]
  },
  "summary": {
    "total_features": 0,
    "real": 0,
    "partial": 0,
    "stub": 0,
    "missing": 0,
    "facade": 0,
    "tests_pass_but_feature_broken": 0
  }
}
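The `summary` block can be derived from the `features` array rather than maintained by hand, which avoids counts drifting from the entries. A minimal sketch; the function name `summarize_audit` is hypothetical, and it assumes every feature carries `actual_status`, `tests_pass`, and `feature_works_e2e`.

```python
from collections import Counter

STATUSES = ("REAL", "PARTIAL", "STUB", "MISSING", "FACADE")


def summarize_audit(features: list[dict]) -> dict[str, int]:
    """Build the summary block of truthfulness-audit.json from its features."""
    counts = Counter(f["actual_status"] for f in features)
    summary = {"total_features": len(features)}
    for status in STATUSES:
        summary[status.lower()] = counts.get(status, 0)
    # Green tests on a feature that does not work for a user.
    summary["tests_pass_but_feature_broken"] = sum(
        1 for f in features if f["tests_pass"] and not f["feature_works_e2e"]
    )
    return summary
```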

4. Gap Analysis

Compare the feature registry against the truthfulness audit.

Severity classification:

critical — claimed as verified-now but actual status is STUB, MISSING, or FACADE
high — claimed as implemented-surface but actual status is PARTIAL or FACADE
medium — in-scope for current phase, actual status is STUB or MISSING
low — minor issues (edge-case tests, docs drift, cosmetic)
info — future-phase items, roadmap deltas
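The severity rules above translate into a short lookup. This is a sketch, not a definitive mapping: the function name and the `current_phase` flag are hypothetical, and combinations the list does not name (for example, claimed `verified-now` but actually PARTIAL) fall through to `low` here, which a stricter reading might want to raise.

```python
def classify_severity(claimed: str, actual: str, current_phase: bool = True) -> str:
    """Return critical, high, medium, low, or info for one feature."""
    if not current_phase:
        # Future-phase items and roadmap deltas.
        return "info"
    if claimed == "verified-now" and actual in {"STUB", "MISSING", "FACADE"}:
        return "critical"
    if claimed == "implemented-surface" and actual in {"PARTIAL", "FACADE"}:
        return "high"
    if actual in {"STUB", "MISSING"}:
        return "medium"
    # Assumption: anything not named by the rules above is treated as low.
    return "low"
```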

Prioritize fixes:

1. Features where tests pass but feature doesn't work (FACADE) — most dangerous
2. Features claimed as verified but actually broken (critical)
3. Missing implementations for in-scope architecture
4. Missing or insufficient tests for REAL features
5. Docs that claim more than reality delivers

Save to gap-report.md.

5. Implement the Gaps

Default behavior: fix the gaps, not just report them. Start with highest severity.

5a. Remaining-Implementation Closure Loop

Keep a live set of current-phase features whose actual status is not REAL and whose deferred state is not explicitly justified by NON_GOALS.md or the phase plan.

Repeat until that set is empty or a blocker is recorded:

1. Pick the highest-severity in-scope feature.
2. Implement the missing runtime wiring or remove the fake facade.
3. Add or update tests that expose the previous gap and prove the new behavior.
4. Re-run targeted verification for that feature.
5. Re-run the truthfulness audit for that feature.
6. Only mark it complete when the feature's actual status improves honestly.

Do not stop at "tests are green" while current-phase features still sit in PARTIAL, FACADE, MISSING, or unjustified STUB.
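The control flow of the closure loop can be sketched as follows. This is a simplified illustration under stated assumptions: each feature gets a single attempt, the hypothetical callback `fix_and_reaudit` stands in for the implement, test, verify, and re-audit steps and returns the new actual status, and anything that does not reach REAL is recorded as blocked.

```python
from typing import Callable

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def closure_loop(
    open_features: list[dict],
    fix_and_reaudit: Callable[[dict], str],
) -> dict[str, list[str]]:
    """Work through open features by severity; report closed and blocked ids."""
    closed: list[str] = []
    blocked: list[str] = []
    # Highest severity first; sorted() keeps input order within a severity.
    queue = sorted(open_features, key=lambda f: SEVERITY_ORDER[f["severity"]])
    for feature in queue:
        new_status = fix_and_reaudit(feature)  # hypothetical callback
        if new_status == "REAL":
            closed.append(feature["id"])
        else:
            # Assumption: one attempt per feature, then record a blocker.
            blocked.append(feature["id"])
    return {"closed": closed, "blocked": blocked}
```

The run is only ready for final reporting when every open feature appears in one of the two lists.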

When implementing:

Follow existing project conventions rigorously
Prefer narrow, composable changes over speculative framework work
Wire the change into the app's runtime path, not just helper code
After implementing, re-run the truthfulness check for that feature
Update architecture docs if implementation changes the truth
Do not overbuild post-alpha ideas

For STUB features (like Linux/Windows backends): if they're explicitly deferred per NON_GOALS.md or the phase plan, leave them as stubs but ensure the stubs:

Have clear todo!("Phase F: implement Linux PTY backend") messages
Are documented in the gap report as intentionally deferred
Don't have tests that make them appear implemented

For FACADE features: either implement them properly or remove the facade. A button that does nothing is worse than no button — it misleads users.

6. Create Tests That Prove Features Work

For every feature implemented or changed, add tests at three levels:

Level 1 — Unit tests: test individual functions, state transitions, error paths.

Level 2 — Integration tests: test module interactions, Tauri command handlers with realistic inputs, frontend component rendering with store state.

Level 3 — Truthfulness tests: tests that verify the feature works end-to-end, not just that the code compiles. Examples:

A test for crash recovery should verify that sessions are actually recreated, not just that crash state serializes correctly.
A test for agent detection should verify that detected agents appear in the UI, not just that the detection function returns data.
A test for settings should verify that changes persist across app restart, not just that the store setter works.

For features still classified as PARTIAL or FACADE, add exposure tests when feasible:

a test that fails on the broken path before the fix
a test that would prevent the feature from silently regressing back into a facade

Coverage bookkeeping per feature:

note whether the feature is only unit-covered, centrally verified, or user-proven
do not upgrade UX status from component tests alone
if a feature remains PARTIAL, its tests should reduce ambiguity, not falsely imply closure

Test naming follows existing patterns:

Rust: test_<feature>_<scenario>
Frontend: describe('<Component>') > it('should <behavior>')

New tests must land in the repo's shared test structure. Wire them into scripts/verify.sh or the central harness when realistic.

Save created/updated tests in test-manifest.json.

7. Run Verification

Use a fast-to-slow ladder while editing:

# Fast checks
cargo fmt --manifest-path src-tauri/Cargo.toml --all --check
npx tsc --noEmit

# Tests, build, and lint
npm run test
npm run build
cargo test --manifest-path src-tauri/Cargo.toml
cargo clippy --manifest-path src-tauri/Cargo.toml --all-targets --all-features -- -D warnings

# Full central verification
bash scripts/verify.sh

The required end state is a green bash scripts/verify.sh unless a blocker outside the repo makes that impossible.

Record results in test-results.json.
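The ladder can be driven from a small runner that stops at the first failure and keeps a record per command. A minimal sketch: the skill does not specify the layout of test-results.json, so the record fields (`command`, `exit_code`, `passed`, `seconds`) are assumptions, and `subprocess.run` will raise `FileNotFoundError` if a tool is not installed.

```python
import subprocess
import time

# Commands from the ladder above, ordered fast to slow.
LADDER = [
    ["cargo", "fmt", "--manifest-path", "src-tauri/Cargo.toml", "--all", "--check"],
    ["npx", "tsc", "--noEmit"],
    ["npm", "run", "test"],
    ["npm", "run", "build"],
    ["cargo", "test", "--manifest-path", "src-tauri/Cargo.toml"],
    ["cargo", "clippy", "--manifest-path", "src-tauri/Cargo.toml",
     "--all-targets", "--all-features", "--", "-D", "warnings"],
    ["bash", "scripts/verify.sh"],
]


def run_ladder(commands: list[list[str]], stop_on_failure: bool = True) -> list[dict]:
    """Run commands in order and return one result record per command run."""
    results = []
    for cmd in commands:
        started = time.monotonic()
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append({
            "command": " ".join(cmd),  # assumed record layout
            "exit_code": proc.returncode,
            "passed": proc.returncode == 0,
            "seconds": round(time.monotonic() - started, 2),
        })
        if proc.returncode != 0 and stop_on_failure:
            break
    return results
```

Running `run_ladder(LADDER)` from the repo root and dumping the list with `json.dump` gives one candidate body for test-results.json.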

After every meaningful implementation cluster:

refresh the changed features in truthfulness-audit.json
check whether any current-phase PARTIAL or FACADE feature remains
only move to final reporting once the remaining set is either empty or blocked

8. UX Verification

Exercise features the way a user would. For each feature claim:

1. Perform the action path
2. Confirm with an authoritative assertion
3. Capture evidence
4. Mark as: verified-this-run, pass-with-caveat, not-proven, or failed

Do not leave a feature at verified-now if you could not prove it this run. Downgrade the claim or record the caveat.
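The downgrade rule can be made mechanical. This is a sketch under assumptions: the function name is hypothetical, the downgrade target `implemented-surface` is one plausible choice the skill does not prescribe, and a `pass-with-caveat` result keeps the claim but returns a caveat that must be recorded.

```python
from typing import Optional, Tuple

UX_STATUSES = ("verified-this-run", "pass-with-caveat", "not-proven", "failed")


def reconcile_claim(claimed: str, ux_status: str) -> Tuple[str, Optional[str]]:
    """Return the claim to record after UX verification, plus any caveat."""
    if ux_status not in UX_STATUSES:
        raise ValueError(f"unknown UX status: {ux_status}")
    if claimed != "verified-now" or ux_status == "verified-this-run":
        return claimed, None
    if ux_status == "pass-with-caveat":
        return claimed, "passed with caveat this run; record the caveat"
    # Assumption: an unproven verified-now claim drops to implemented-surface.
    return "implemented-surface", f"downgraded: UX result was {ux_status}"
```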

9. Coverage Summary

Produce coverage-summary.md and coverage-summary.json.

The summary must include:

Total in-scope features and their reality breakdown (REAL/PARTIAL/STUB/MISSING/FACADE)
How many tests pass vs how many features actually work end-to-end
What changed this run
What still blocks the next clean pass
Honest assessment of "tests pass but feature broken" count
Honest assessment of "feature partially implemented but already has passing tests" count
Honest assessment of "real but only unit-covered" count

The coverage-summary.json schema:

{
  "generated_at_utc": "ISO timestamp",
  "scope": "Phase A1 + A2",
  "metrics": {
    "total_in_scope_features": 0,
    "real": 0,
    "partial": 0,
    "stub": 0,
    "missing": 0,
    "facade": 0
  },
  "test_counts": {
    "frontend_passed": 0,
    "rust_unit_passed": 0,
    "rust_e2e_passed": 0,
    "rust_transport_passed": 0,
    "total_passed": 0,
    "total_failed": 0
  },
  "shared_verification": {
    "command": "bash scripts/verify.sh",
    "status": "pass | fail",
    "native_smoke_artifact": "path"
  },
  "truthfulness": {
    "tests_pass_but_feature_broken": 0,
    "partial_features_with_passing_tests": 0,
    "real_but_only_unit_tested": 0,
    "features_honestly_working_e2e": 0
  },
  "changed_this_run": [],
  "remaining_delta": []
}
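The `truthfulness` block can be computed from the audit's feature entries. A minimal sketch with a hypothetical function name; it assumes `features_honestly_working_e2e` counts features that are both REAL and working end to end, which is one reading of the field.

```python
def truthfulness_block(features: list[dict]) -> dict[str, int]:
    """Derive the truthfulness counters of coverage-summary.json."""
    return {
        "tests_pass_but_feature_broken": sum(
            1 for f in features if f["tests_pass"] and not f["feature_works_e2e"]
        ),
        "partial_features_with_passing_tests": sum(
            1 for f in features
            if f["actual_status"] == "PARTIAL" and f["tests_pass"]
        ),
        "real_but_only_unit_tested": sum(
            1 for f in features
            if f["actual_status"] == "REAL" and f["coverage_level"] == "unit-only"
        ),
        # Assumption: honest e2e means REAL status and a working user path.
        "features_honestly_working_e2e": sum(
            1 for f in features
            if f["actual_status"] == "REAL" and f["feature_works_e2e"]
        ),
    }
```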

Execution Strategy

Incremental fast-path

When coverage-summary.json exists and no source files changed since last run:

1. Run static checks (fmt, clippy, tsc)
2. Run test suites
3. Compare test counts to previous run
4. Confirm previous run had no current-phase PARTIAL, FACADE, MISSING, or unjustified STUB
5. If all match and no such gaps remain: report clean pass with no changes needed
6. If counts differ or open gaps remain: investigate, fix, update artifacts
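Comparing test counts between runs is a plain dictionary diff over the `test_counts` block. A minimal sketch with a hypothetical function name; it treats a key that is missing on one side as zero, which is an assumption.

```python
def diff_test_counts(
    previous: dict[str, int], current: dict[str, int]
) -> dict[str, tuple[int, int]]:
    """Return {key: (previous, current)} for every count that changed."""
    changes = {}
    for key in sorted(set(previous) | set(current)):
        before = previous.get(key, 0)  # assumption: missing key means zero
        after = current.get(key, 0)
        if before != after:
            changes[key] = (before, after)
    return changes
```

An empty result satisfies the count comparison; any entry means the change needs investigation before a clean pass is reported.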

Full-cycle triggers

Run the full cycle (all 9 phases) when:

First run (no previous artifacts)
User explicitly requests full cycle
Source files changed since last run
Previous run had gaps that should now be checked
Test counts changed unexpectedly

Parallel execution

When subagents are available, parallelize:

Phase 2 can scan Rust and Frontend simultaneously
Phase 3 can audit backend and frontend in parallel
Phase 6 can create Rust and Frontend tests in parallel
Phase 7 can run frontend and backend test suites in parallel
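Where subagents are not available, the Phase 7 case can still be parallelized with threads, since each suite is an independent subprocess. A minimal sketch; the function name and the suite labels in the usage note are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_parallel(suites: dict[str, list[str]]) -> dict[str, int]:
    """Run each suite command concurrently and return its exit code by name."""

    def run(cmd: list[str]) -> int:
        return subprocess.run(cmd, capture_output=True, text=True).returncode

    # One worker per suite; max() guards against an empty mapping.
    with ThreadPoolExecutor(max_workers=max(1, len(suites))) as pool:
        futures = {name: pool.submit(run, cmd) for name, cmd in suites.items()}
        return {name: future.result() for name, future in futures.items()}
```

For example, `run_parallel({"frontend": ["npm", "run", "test"], "backend": ["cargo", "test", "--manifest-path", "src-tauri/Cargo.toml"]})` runs both suites at once and returns their exit codes.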

Codex-Oriented Execution Rules

Default to action. Fix gaps, don't just report them.
Use subagents for review and independent confirmation when available.
Keep artifacts current. Refresh output files instead of letting stale reports stand.
Central verification is a first-class deliverable.
UX proof is a first-class deliverable.
Extend existing shared scripts rather than creating disconnected alternatives.
When fixing formatting or type errors, apply the fix and re-verify. Don't just report it.
If roadmap truth changes, update docs/PRODUCT_GOAL_PLAN.md and docs/PRODUCT_GOAL_EXECUTION_PLAN.md in the same run instead of leaving the docs behind the implementation truth.

Auxiliary Skills

Use when applicable:

macos-native-app-qa — packaged or native macOS app proof
playwright — browser automation
playwright-interactive — iterative browser or Electron debugging

Key Paths

docs/                              # Architecture and verification docs
src-tauri/src/                     # Rust backend source
src-tauri/tests/                   # Rust integration tests
src/                               # React frontend source
src/components/                    # UI components
src/stores/                        # Zustand stores
scripts/verify.sh                  # Full verification
output/arch-compliance/            # This skill's output directory