spec-audit

Name: Spec Audit
Author: databricks-solutions

// Audit and improve spec coverage for a given spec. Use when (1) a spec has low or 0% requirement coverage, (2) tests exist but lack @req tags, (3) code behaviors have drifted from the spec's success criteria, (4) you need to identify unspecified behaviors in the codebase. Covers the full audit loop: analyze coverage -> tag existing tests -> identify spec gaps -> propose spec updates.

stars:5

forks:7

updated: March 23, 2026, 18:29

SKILL.md


name: spec-audit
description: Audit and improve spec coverage for a given spec. Use when (1) a spec has low or 0% requirement coverage, (2) tests exist but lack @req tags, (3) code behaviors have drifted from the spec's success criteria, (4) you need to identify unspecified behaviors in the codebase. Covers the full audit loop: analyze coverage -> tag existing tests -> identify spec gaps -> propose spec updates.
user_invocable: true

Spec Coverage Audit

When to Use

- A spec shows low coverage in `just spec-coverage`
- Tests are tagged @spec:X but not linked to requirements (@req)
- You suspect the spec's success criteria don't match the implemented code
- Issue #85 work items for filling spec gaps

Audit Workflow

Follow these steps in order. Do not over-research — each step builds on the previous one.

Step 1: Get the current state (< 2 minutes)

Run these in parallel:

# Current coverage numbers (JSON for programmatic use)
just spec-coverage --json | jq '.specs.SPEC_NAME'

# The spec's success criteria
# Read specs/SPEC_NAME.md — focus on the "Success Criteria" section

From the JSON output, note:

- covered / total requirements
- uncovered: requirements needing @req tags or new tests
- unlinked_tests: tests tagged @spec but missing @req (these are the quick wins)
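A minimal Python sketch of reading those three values from the report. The field names `covered`, `total`, `uncovered` and `unlinked_tests` follow the list above, but their types (integers and lists of strings) and the sample payload are assumptions; check the real `just spec-coverage --json` output before relying on this.

```python
import json

def summarize_spec(report, spec_name):
    """Pull the Step 1 numbers for one spec out of a parsed coverage report."""
    entry = report["specs"][spec_name]
    return {
        "covered": entry["covered"],
        "total": entry["total"],
        "uncovered": list(entry["uncovered"]),
        "unlinked_tests": list(entry["unlinked_tests"]),
    }

# Hypothetical sample standing in for `just spec-coverage --json` output.
SAMPLE_REPORT = json.loads("""
{
  "specs": {
    "RUBRIC_SPEC": {
      "covered": 0,
      "total": 2,
      "uncovered": ["Rubric can be created", "Rubric can be deleted"],
      "unlinked_tests": ["tests/test_rubric.py::test_create"]
    }
  }
}
""")

summary = summarize_spec(SAMPLE_REPORT, "RUBRIC_SPEC")
print(f"{summary['covered']}/{summary['total']} covered, "
      f"{len(summary['unlinked_tests'])} unlinked tests")
```

The `uncovered` list is what gets handed to the spec-tester agents in Step 2.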

Step 2: Delegate to spec-tester agents

For a single spec: Spawn one spec-tester agent with the spec name and uncovered requirements list.

For multiple specs: Spawn spec-tester agents in parallel, one per spec:

Spawn these spec-tester agents in parallel:
- Agent 1: RUBRIC_SPEC, mode=tag-only, requirements: [list from Step 1]
- Agent 2: BUILD_AND_DEPLOY_SPEC, mode=tag-only, requirements: [list from Step 1]
- Agent 3: AUTHENTICATION_SPEC, mode=full, requirements: [list from Step 1]

For a large spec with many requirements: Split into requirement groups and spawn parallel agents:

Spawn these spec-tester agents in parallel for RUBRIC_SPEC:
- Agent 1: requirements in "Parsing & Serialization" category
- Agent 2: requirements in "CRUD Lifecycle" category
- Agent 3: requirements in "AI-Powered Generation" category

Each agent reads the spec, tags existing tests, writes new tests if needed, and verifies.
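One way to prepare the per-category split is sketched below. The `plan_agents` helper and the brief layout are hypothetical, and the sketch assumes each requirement already carries a category label; the agents themselves are still spawned as described above.

```python
from collections import defaultdict

def plan_agents(spec_name, requirements, mode="tag-only"):
    """Group (category, requirement_text) pairs into one brief per category."""
    groups = defaultdict(list)
    for category, text in requirements:
        groups[category].append(text)
    # dicts keep insertion order, so agents are numbered in first-seen order.
    return [
        {"agent": number, "spec": spec_name, "mode": mode,
         "category": category, "requirements": texts}
        for number, (category, texts) in enumerate(groups.items(), start=1)
    ]

# Hypothetical requirement texts, for illustration only.
briefs = plan_agents("RUBRIC_SPEC", [
    ("Parsing & Serialization", "Rubric text round-trips without loss"),
    ("CRUD Lifecycle", "Rubric can be created"),
    ("CRUD Lifecycle", "Rubric can be deleted"),
])
for brief in briefs:
    print(f"Agent {brief['agent']}: {brief['category']} "
          f"({len(brief['requirements'])} requirements)")
```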

Step 3: Collect results and verify

After agents return, run the global check:

just spec-coverage    # verify overall improvement
just test-server      # full suite still passes
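If the JSON report was saved before and after the agents ran, the improvement can be checked per spec. This is a sketch only: the two snapshots are hypothetical, and `covered` is assumed to be an integer count.

```python
def coverage_delta(before, after):
    """Return {spec_name: change in covered requirements} for specs in both snapshots."""
    shared = before["specs"].keys() & after["specs"].keys()
    return {
        name: after["specs"][name]["covered"] - before["specs"][name]["covered"]
        for name in sorted(shared)
    }

def regressions(before, after):
    """Names of specs whose covered count went down."""
    return [name for name, change in coverage_delta(before, after).items() if change < 0]

# Hypothetical snapshots of the coverage report.
before = {"specs": {"RUBRIC_SPEC": {"covered": 0}, "AUTHENTICATION_SPEC": {"covered": 4}}}
after = {"specs": {"RUBRIC_SPEC": {"covered": 10}, "AUTHENTICATION_SPEC": {"covered": 3}}}

print(coverage_delta(before, after))
print(regressions(before, after))
```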

Step 4: Identify spec drift (only if asked)

Only do this if the user asks to find unspecified behaviors.

This requires reading implementation code — spawn parallel Explore agents per layer:

Spawn these explore agents in parallel:
- Agent 1: Read all router endpoints for this spec's domain, list business rules
- Agent 2: Read all service methods for this spec's domain, list edge cases and side effects
- Agent 3: Read all frontend components for this spec's domain, list user interactions

Compare findings to the spec's success criteria. Look for:

- CRUD operations not in success criteria (create, edit, delete)
- Phase/workflow preconditions (must be in phase X, must have Y first)
- Side effects (background jobs, MLflow sync, cache invalidation)
- Validation rules (input constraints, error responses)
- AI/external service integration (generation, export, sync)
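A first-pass triage of the agents' findings can be sketched as a literal comparison. The helper name and the sample data are hypothetical; it only catches findings that restate a criterion word for word (ignoring case and spacing), so paraphrased matches still need a human read.

```python
def unspecified_behaviors(findings, criteria):
    """Keep (category, behavior) findings whose text is not already a success criterion."""
    def normalize(text):
        return " ".join(text.casefold().split())

    known = {normalize(criterion) for criterion in criteria}
    return [
        (category, text)
        for category, text in findings
        if normalize(text) not in known
    ]

# Hypothetical inputs, for illustration only.
criteria = ["Rubric can be created", "Rubric can be deleted"]
findings = [
    ("CRUD operations", "rubric can be  created"),
    ("Side effects", "Saving a rubric invalidates the cache"),
]
print(unspecified_behaviors(findings, criteria))
```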

Step 5: Propose spec additions (protected operation)

Draft new success criteria grouped by category. Present to user before editing — /specs/ files require approval.
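The draft can be rendered as text for review without touching the spec files. A minimal sketch, assuming proposals arrive as (category, criterion) pairs; the helper name and the sample criteria are hypothetical.

```python
def draft_criteria(proposals):
    """Render (category, criterion_text) pairs as a grouped checklist draft."""
    grouped = {}
    for category, text in proposals:
        grouped.setdefault(category, []).append(text)
    lines = []
    for category, items in grouped.items():
        lines.append(f"{category}:")
        lines.extend(f"- [ ] {item}" for item in items)
        lines.append("")  # blank line between categories
    return "\n".join(lines).rstrip()

# Hypothetical proposals, for illustration only.
draft = draft_criteria([
    ("Validation rules", "Empty rubric title is rejected"),
    ("Side effects", "Saving a rubric invalidates the cache"),
    ("Validation rules", "Duplicate rubric names are rejected"),
])
print(draft)  # show this to the user; write nothing under specs/ until approved
```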

Tagging Reference

| Framework | Format | Scope |
|---|---|---|
| pytest | `@pytest.mark.req("Exact text from success criteria")` | Per-test (decorator) |
| Playwright | `tag: ['@spec:X', '@req:Exact text from success criteria']` | Per-test (in test options) |
| Vitest | `// @req Exact text from success criteria` | Per-file only (analyzer limitation) |

Critical: The @req text must match a `- [ ]` item from the spec exactly.

Vitest limitation: The analyzer caches one @req per vitest file. If a file covers multiple requirements, add @req markers to pytest or Playwright tests for the additional requirements instead.
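The exact-match rule can be checked before running the analyzer. This is a sketch, not the analyzer's own logic: the regular expression is an assumption about how checklist items look in a spec file, and it also accepts checked `- [x]` items.

```python
import re

# Assumed checklist shape: "- [ ] text" or "- [x] text", optionally indented.
CHECKBOX = re.compile(r"^\s*- \[[ xX]\] (.+?)\s*$")

def success_criteria(spec_markdown):
    """Return the text of every checklist item in the spec."""
    items = []
    for line in spec_markdown.splitlines():
        match = CHECKBOX.match(line)
        if match:
            items.append(match.group(1))
    return items

def invalid_req_tags(tags, spec_markdown):
    """Return the @req texts that do not match a checklist item character for character."""
    allowed = set(success_criteria(spec_markdown))
    return [tag for tag in tags if tag not in allowed]

# Hypothetical spec excerpt, for illustration only.
spec = """Success Criteria
- [ ] Rubric can be created
- [ ] Rubric can be deleted
"""
print(invalid_req_tags(["Rubric can be created", "rubric can be deleted"], spec))
```

The second tag is reported because it differs in case, which is the kind of near miss the exact-match rule is meant to catch.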

Anti-Patterns

- Don't spawn broad research agents before reading the coverage JSON. The JSON tells you exactly what's covered and uncovered.
- Don't read all implementation code up front. Start with tagging existing tests (Step 2). Only read implementation code for spec drift (Step 4).
- Don't write new tests before tagging existing ones. Unlinked tests are free coverage: just add markers.
- Don't guess at @req text. Copy it exactly from the spec's `- [ ]` items.
- Don't put multiple `// @req` comments in one vitest file expecting them all to be picked up. Only the first one works.
- Don't run each spec sequentially when auditing multiple specs. Spawn parallel agents.
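The vitest anti-pattern is easy to detect mechanically. A sketch under the assumption that `// @req` markers sit on their own comment lines; the real analyzer's parsing may differ, and the sample file content is hypothetical.

```python
import re

# Assumed marker shape: a line comment that starts with "// @req ".
REQ_COMMENT = re.compile(r"^\s*//\s*@req\s+(.+?)\s*$", re.MULTILINE)

def split_req_comments(source):
    """Return (first @req text or None, list of later @req texts that would be ignored)."""
    found = REQ_COMMENT.findall(source)
    first = found[0] if found else None
    return first, found[1:]

# Hypothetical vitest file content, for illustration only.
vitest_source = """// @spec RUBRIC_SPEC
// @req Rubric can be created
// @req Rubric can be deleted
import { describe, it } from 'vitest';
"""
first, ignored = split_req_comments(vitest_source)
print("picked up:", first)
print("ignored:", ignored)
```

Anything in the ignored list should move to a pytest or Playwright test instead.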

Example: Auditing RUBRIC_SPEC

# Step 1: Get state
just spec-coverage --json | jq '.specs.RUBRIC_SPEC'
# Shows: 0/10 covered, 51 unlinked tests

# Step 2: Spawn spec-tester agent
# Agent reads spec, tags 10 existing tests with @req markers

# Step 3: Verify
just test-spec RUBRIC_SPEC   # 30 passed
just spec-coverage           # RUBRIC_SPEC now 10/10 (100%)

# Step 4: Spec drift analysis (user asked)
# Spawned 3 explore agents in parallel -> found 15 unspecified behaviors

# Step 5: Proposed 15 new success criteria -> user approved -> 10/25 covered

Example: Auditing all low-coverage specs in parallel

# Step 1: Get state for all specs
just spec-coverage --json | jq '[.specs | to_entries[] | select(.value.coverage_pct < 50)] | .[].key'
# Returns: RUBRIC_SPEC, BUILD_AND_DEPLOY_SPEC, DESIGN_SYSTEM_SPEC, UI_COMPONENTS_SPEC

# Step 2: Spawn 4 spec-tester agents in parallel (one per spec, mode=tag-only)
# Each agent independently reads its spec, tags tests, verifies

# Step 3: Collect results, run full suite
just test-server && just spec-coverage
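For scripting the Step 1 selection without jq, the same filter can be sketched in Python. It relies on the `coverage_pct` field used in the jq expression above; the sample report is hypothetical.

```python
def low_coverage_specs(report, threshold=50):
    """Names of specs whose coverage_pct is below the threshold, in report order."""
    return [
        name
        for name, entry in report["specs"].items()
        if entry["coverage_pct"] < threshold
    ]

# Hypothetical report, for illustration only.
report = {
    "specs": {
        "RUBRIC_SPEC": {"coverage_pct": 0},
        "AUTHENTICATION_SPEC": {"coverage_pct": 80},
        "DESIGN_SYSTEM_SPEC": {"coverage_pct": 25},
    }
}
print(low_coverage_specs(report))
```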

Reference

- Spec files: specs/*.md
- Coverage analyzer: tools/spec_coverage_analyzer.py
- Coverage map: specs/SPEC_COVERAGE_MAP.md
- Spec-tester agent: .claude/agents/spec-tester.md
- Test tagging conventions: .claude/skills/verification-testing/SKILL.md

related-skills.json

same repository

brainstorming.md

from "databricks-solutions/project-0xfffff"

You MUST use this before any creative work — creating features, building components, adding functionality, or modifying behavior. Starts from existing specs rather than scratch. Use when the user asks to build, add, change, or design anything, even if it seems simple. Covers the full loop: find governing spec -> explore intent -> design within spec constraints -> transition to planning.

2026-03-23 · 5 stars

verification-testing.md

from "databricks-solutions/project-0xfffff"

Code verification and testing for the Human Evaluation Workshop. Use when (1) running tests after code changes, (2) writing new unit tests (pytest/vitest), (3) writing E2E tests with Playwright/TestScenario, (4) debugging test failures, (5) understanding what to mock in E2E tests, (6) verifying a feature implementation. Covers the full test pyramid: unit tests -> integration tests -> E2E tests.

2026-03-23 · 5 stars

writing-plans.md

from "databricks-solutions/project-0xfffff"

Use when you have a spec or requirements for a multi-step task, before touching code. Creates spec-linked implementation plans with TDD steps, exact file paths, and spec coverage tracking. Use this after brainstorming, when a user says 'plan this', 'how should we implement', or when you're about to start a multi-file feature. Covers the full loop: spec review -> file mapping -> task decomposition -> TDD steps -> coverage verification.

2026-03-23 · 5 stars

mlflow-evaluation.md

from "databricks-solutions/project-0xfffff"

MLflow 3 GenAI evaluation for agent development. Use when (1) writing mlflow.genai.evaluate() code, (2) creating @scorer functions, (3) building evaluation datasets from traces, (4) using built-in scorers (Guidelines, Correctness, Safety, RetrievalGroundedness), (5) analyzing traces for latency/errors/architecture, (6) optimizing agent context/prompts/token usage, (7) debugging evaluation failures. Covers the full eval workflow: trace analysis -> dataset building -> scorer creation -> evaluation execution.

2026-01-16 · 5 stars

package.json

"author": "databricks-solutions"

"repository": "databricks-solutions/project-0xfffff"

SOC classification: Software Quality Assurance Analysts and Testers (15-1253), Computer and Mathematical Occupations, L4
