accurate-testing

Name: Accurate Testing
Author: theexperiencecompany

Write test cases that catch real bugs in production code instead of producing false confidence. Use when writing, reviewing, or planning tests for any codebase. Triggers on: "write tests", "add test coverage", "test this function", "create unit tests", "integration tests", "fix flaky tests", "improve test coverage", "review test quality", "are these tests good", "test plan", or any task involving pytest, vitest, jest, or other test frameworks. Prevents common AI testing pitfalls: over-mocking, testing frameworks instead of production code, fake implementations that bypass real logic, and assertion-free tests.


updated: March 31, 2026, 07:54


package.json

"author": "theexperiencecompany"

"repository": "theexperiencecompany/skills"



Accurate Testing

The Deletion Test (Golden Rule)

Before considering any test complete, apply this mental check:

If the production code this test targets were deleted entirely, would this test still pass?

If yes, the test is worthless. It tests framework plumbing, not production logic.
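
A minimal sketch of the deletion test in practice. `apply_discount` is a hypothetical function defined inline as a stand-in for production code; it does not come from this skill. The first test never calls it and would survive its deletion, while the second fails as soon as the function is removed or broken.

```python
from unittest.mock import Mock


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical production function, defined inline so the sketch runs."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def test_fails_deletion_test():
    # Never touches apply_discount: it only checks that a Mock returns what
    # it was told to return, so deleting apply_discount changes nothing.
    pricing = Mock()
    pricing.apply_discount.return_value = 90.0
    assert pricing.apply_discount(100.0, 10) == 90.0


def test_passes_deletion_test():
    # Calls the real function; deleting or breaking it makes this fail.
    assert apply_discount(100.0, 10) == 90.0
    assert apply_discount(19.99, 0) == 19.99
```

Both tests pass today, which is exactly the problem: only the second one carries any information about the production code.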

Core Workflow

1. Identify the production code under test: find the exact function, class, or endpoint
2. Read it: understand its real logic, branches, edge cases, and dependencies
3. Apply the import rule: the test file MUST import from production code
4. Choose the right mock boundary: mock I/O at the edges, never mock the thing being tested
5. Write assertions against real behavior: assert on return values, state changes, side effects that matter
6. Run the deletion test mentally: would deleting the production function break this test?

The Five Laws of Accurate Tests

Law 1: Import Production Code

Every test file must import the actual production function/class it claims to test.

# WRONG: tests LangGraph, not your app
from langchain_core.messages import AIMessage
from langgraph.graph import MessagesState, StateGraph
graph = StateGraph(MessagesState)
graph.add_node("echo", lambda s: {"messages": [AIMessage(content="Echo")]})

# RIGHT: tests your app
from langgraph.checkpoint.memory import MemorySaver
from app.agents.core.graph_builder.build_graph import build_comms_agent
graph = build_comms_agent(checkpointer=MemorySaver())

Law 2: Mock at the Boundary, Not the Core

Mock external I/O (network, database, filesystem). Never mock the logic under test.

from unittest.mock import patch
from langchain_core.messages import AIMessage

# WRONG: mocks the function being tested, tests nothing
with patch("app.services.chat_service.run_chat_stream") as mock:
    mock.return_value = "response"
    result = run_chat_stream(msg)  # just calls the mock

# RIGHT: mocks the dependency, tests the real function
with patch("app.services.chat_service.llm_client.invoke") as mock_llm:
    mock_llm.return_value = AIMessage(content="hello")
    result = run_chat_stream(msg)  # runs real logic, fake LLM
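
The same rule as a self-contained sketch that runs without the app. `LLMClient`, `llm_client`, and `summarize` are hypothetical names invented for illustration; only the call that would leave the process is faked, and the logic around it runs for real.

```python
from unittest.mock import patch


class LLMClient:
    """Hypothetical external dependency; the real one would hit the network."""

    def invoke(self, prompt: str) -> str:
        raise RuntimeError("network access is not available in tests")


llm_client = LLMClient()


def summarize(text: str) -> str:
    """Hypothetical production function: real logic around one I/O call."""
    cleaned = " ".join(text.split())
    if not cleaned:
        return ""  # early return: the LLM is never called for blank input
    reply = llm_client.invoke(f"Summarize: {cleaned}")
    return reply.strip().capitalize()


def test_summarize_normalizes_reply():
    # Only the boundary (llm_client.invoke) is faked; summarize runs for real.
    with patch.object(llm_client, "invoke", return_value="  short summary \n") as fake:
        assert summarize("a   long\n text") == "Short summary"
    # Secondary check: real code collapsed the whitespace before the boundary.
    fake.assert_called_once_with("Summarize: a long text")


def test_summarize_skips_llm_for_blank_input():
    with patch.object(llm_client, "invoke") as fake:
        assert summarize("   \n") == ""
    fake.assert_not_called()
```

If `summarize` itself were patched instead, both tests would keep passing after its body was deleted.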

Law 2b: Patch Module Singletons, Not Individual Functions

When production code uses a module-level singleton (a shared client, cache, or connection object), patch the singleton's attribute directly. This ensures all production code that touches the singleton — including code several layers deep — uses the real test resource without any function-level patching.

# Production: redis_cache = RedisCache()  (module singleton)
# StreamManager uses redis_cache.redis internally

# WRONG: patches one function, misses all others that use redis_cache
with patch("app.core.stream_manager.StreamManager.publish_chunk") as mock:
    ...  # other methods still use the broken/missing redis_cache.redis

# RIGHT: patch the singleton attribute; all production code sees real Redis
import pytest
from redis.asyncio import Redis
from app.core.stream_manager import StreamManager
from app.db.redis import redis_cache

# Assumes pytest-asyncio with asyncio_mode = "auto". In strict mode, use
# @pytest_asyncio.fixture here and mark the test with @pytest.mark.asyncio.
@pytest.fixture
async def real_redis(monkeypatch):
    client = Redis.from_url("redis://localhost:6379", decode_responses=True)
    await client.ping()  # fail fast if no local Redis is running
    monkeypatch.setattr(redis_cache, "redis", client)   # one patch, everything works
    yield client
    await client.flushdb()  # wipe test keys before closing the connection
    await client.aclose()

async def test_stream_publishes(real_redis):
    await StreamManager.start_stream("s1", "conv1", "user1")
    await StreamManager.publish_chunk("s1", "data: hello\n\n")
    chunks = []
    async for chunk in StreamManager.subscribe_stream("s1"):
        chunks.append(chunk)
        break
    assert chunks[0] == "data: hello\n\n"
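
The singleton technique can also be shown without Redis or pytest. In this sketch, `Cache`, `cache`, and the three session functions are hypothetical, and an in-memory dict stands in for the test resource; `patch.object` plays the role that `monkeypatch.setattr` plays above.

```python
from typing import Optional
from unittest.mock import patch


class Cache:
    """Hypothetical module-level singleton; `store` is set at app startup."""

    def __init__(self) -> None:
        self.store = None  # unset outside the running app


cache = Cache()


def save_session(session_id: str, user: str) -> None:
    cache.store[f"session:{session_id}"] = user


def load_session(session_id: str) -> Optional[str]:
    return cache.store.get(f"session:{session_id}")


def rename_user(session_id: str, new_name: str) -> bool:
    # A higher layer that reaches the singleton only through other functions.
    if load_session(session_id) is None:
        return False
    save_session(session_id, new_name)
    return True


def test_rename_user_round_trip():
    # One patch on the singleton attribute covers every function above.
    with patch.object(cache, "store", {}):
        save_session("s1", "ada")
        assert rename_user("s1", "grace") is True
        assert load_session("s1") == "grace"
        assert rename_user("missing", "x") is False
    # patch.object restores the original attribute on exit.
    assert cache.store is None
```

Patching `save_session` or `load_session` individually would leave the other one reaching for the unset `store` and failing with an unrelated error.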

Law 3: Assert on Production Behavior

Assert on what the production code actually does — return values, state mutations, raised exceptions, emitted events.

# WRONG — asserts mock was called (tests your test setup)
mock_service.process.assert_called_once_with(data)

# RIGHT — asserts the actual outcome
result = process_email(raw_email)
assert result.subject == "Re: Meeting"
assert result.is_read is False
assert len(result.attachments) == 2
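
A runnable sketch of outcome-based assertions. This `process_email` and the `ParsedEmail` dataclass are hypothetical stand-ins built on the standard-library email parser, not the app's real implementation. There is no mock here at all, so every assertion is about what the code returned.

```python
from dataclasses import dataclass
from email import message_from_string


@dataclass
class ParsedEmail:
    subject: str
    sender: str
    is_read: bool


def process_email(raw: str) -> ParsedEmail:
    """Hypothetical production function built on the stdlib email parser."""
    msg = message_from_string(raw)
    return ParsedEmail(
        subject=(msg["Subject"] or "").strip(),
        sender=(msg["From"] or "").strip().lower(),
        is_read=False,  # newly ingested mail starts unread
    )


RAW = "From: Ada@Example.com\nSubject: Re: Meeting\n\nSee you at 10.\n"


def test_process_email_extracts_fields():
    result = process_email(RAW)
    assert result.subject == "Re: Meeting"
    assert result.sender == "ada@example.com"
    assert result.is_read is False


def test_process_email_tolerates_missing_subject():
    result = process_email("From: a@b.c\n\nbody\n")
    assert result.subject == ""
```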

Law 4: Cover Real Branches

Read the production code. Find the if/elif/else, try/except, and early returns. Write a test for each path.

# Production code has: if user.is_premium: ... else: ...
# Test BOTH paths
def test_premium_user_gets_extended_features(): ...
def test_free_user_gets_basic_features(): ...
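
A sketch of one test per path, using a hypothetical `shipping_cost` function with three outcomes: an early `raise`, a free-shipping branch reachable two ways, and a fallthrough. Plain `try`/`except` is used in place of `pytest.raises` only to keep the example dependency-free.

```python
def shipping_cost(total: float, is_premium: bool) -> float:
    """Hypothetical production function with an error path and two branches."""
    if total < 0:
        raise ValueError("total cannot be negative")
    if is_premium or total >= 50:
        return 0.0
    return 4.99


def test_premium_user_ships_free():
    assert shipping_cost(10.0, is_premium=True) == 0.0


def test_large_order_ships_free():
    # Boundary value: exactly 50 must already qualify.
    assert shipping_cost(50.0, is_premium=False) == 0.0


def test_small_order_pays_flat_rate():
    assert shipping_cost(49.99, is_premium=False) == 4.99


def test_negative_total_is_rejected():
    try:
        shipping_cost(-1.0, is_premium=True)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a negative total")
```

The boundary test at exactly 50 is the one most likely to catch a `>` versus `>=` slip.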

Law 5: Test Error Paths, Not Just Happy Paths

Production bugs cluster in error handling. Test what happens when dependencies fail.

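from unittest.mock import patch
import httpx

def test_handles_api_timeout():
    # Make the boundary call fail; send_email's real error handling runs.
    with patch("app.tools.gmail.client.send") as mock:
        mock.side_effect = httpx.TimeoutException("timeout")
        result = send_email(to="x@y.com", body="hi")
        assert result.error == "Failed to send: timeout"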
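
A dependency-free sketch of the same error-path test. `MailClient`, `client`, `SendResult`, and this `send_email` are hypothetical, and the built-in `TimeoutError` stands in for `httpx.TimeoutException`. The second test pins down what happens to errors the function does not handle.

```python
from dataclasses import dataclass
from typing import Optional
from unittest.mock import patch


class MailClient:
    """Hypothetical external dependency; the real one would call an API."""

    def send(self, to: str, body: str) -> str:
        raise RuntimeError("network access is not available in tests")


client = MailClient()


@dataclass
class SendResult:
    ok: bool
    message_id: Optional[str] = None
    error: Optional[str] = None


def send_email(to: str, body: str) -> SendResult:
    """Hypothetical production function: converts timeouts into a result."""
    try:
        message_id = client.send(to, body)
    except TimeoutError as exc:
        return SendResult(ok=False, error=f"Failed to send: {exc}")
    return SendResult(ok=True, message_id=message_id)


def test_handles_timeout():
    with patch.object(client, "send", side_effect=TimeoutError("timeout")):
        result = send_email(to="x@y.com", body="hi")
    assert result.ok is False
    assert result.error == "Failed to send: timeout"


def test_unexpected_errors_propagate():
    # Only timeouts are swallowed; anything else must surface to the caller.
    with patch.object(client, "send", side_effect=KeyError("boom")):
        try:
            send_email(to="x@y.com", body="hi")
        except KeyError:
            pass
        else:
            raise AssertionError("expected KeyError to propagate")
```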

Anti-Pattern Detection

When writing or reviewing tests, check for these red flags. For detailed examples and fixes, see references/anti-patterns.md.

| Red Flag | What It Means |
| --- | --- |
| Test file has zero imports from `app/` or `src/` | Tests framework, not production code |
| More `@patch` decorators than assertions | Over-mocking: testing your mock setup |
| Test builds its own graph/pipeline from scratch | Tests the framework's graph builder, not your graph |
| Assertions only check `mock.called` or `mock.call_count` | Proves nothing about production behavior |
| Test defines a fake implementation of the thing being tested | Circular: testing your fake, not production code |
| `# mimicking`, `# simplified version of` in comments | Admission that production code is not under test |
| Test manually reimplements what a production function does | Duplication: if you delete the function, test still passes |
| All tests pass when production code is broken | The entire suite is false confidence |
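
The first two red flags are mechanical enough to check with a script. The following is a rough sketch using the standard-library `ast` module; `scan_test_source` and `PRODUCTION_ROOTS` are hypothetical names, and the heuristics are deliberately crude (they count only decorator-style patches and plain `assert` statements), so treat a hit as a prompt for review rather than a verdict.

```python
import ast

# Assumption: production packages live under `app` or `src`, as in the table.
PRODUCTION_ROOTS = ("app", "src")


def scan_test_source(source: str) -> list[str]:
    """Return the red flags found in one test file's source text."""
    tree = ast.parse(source)
    imports_production = False
    patch_decorators = 0
    assertions = 0

    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[0] in PRODUCTION_ROOTS:
                imports_production = True
        elif isinstance(node, ast.Import):
            if any(a.name.split(".")[0] in PRODUCTION_ROOTS for a in node.names):
                imports_production = True
        elif isinstance(node, ast.Assert):
            assertions += 1
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for dec in node.decorator_list:
                # @patch("x") is a Call wrapping the name; @patch is the name.
                target = dec.func if isinstance(dec, ast.Call) else dec
                if "patch" in ast.unparse(target).split("."):
                    patch_decorators += 1

    flags = []
    if not imports_production:
        flags.append("no imports from app/ or src/")
    if patch_decorators > assertions:
        flags.append("more @patch decorators than assertions")
    return flags
```

Calling `scan_test_source(path.read_text())` over each file in a test directory gives a quick list of files worth a closer look. It requires Python 3.9 or later for `ast.unparse`.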

Mock Hierarchy (What to Mock Where)

| Test Type | Mock | Don't Mock |
| --- | --- | --- |
| Unit | DB clients, HTTP clients, message queues, filesystem | The function under test, its direct logic |
| Integration | LLM API calls, external SaaS APIs (Composio, Stripe) | Your service layer, your DB queries, your routing |
| E2E | LLM (use fake model), external APIs (use recorded responses) | Your entire pipeline: graph, routing, services, DB |
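
A sketch of the integration row, under stated assumptions: `FakePaymentsAPI`, `record_purchase`, and `total_spent` are hypothetical, and an in-memory SQLite database stands in for whatever database the app really uses. The external API is faked; the SQL runs for real.

```python
import sqlite3


class FakePaymentsAPI:
    """Stands in for an external SaaS API; the only faked layer."""

    def charge(self, user_id: int, cents: int) -> str:
        return f"ch_{user_id}_{cents}"


def record_purchase(db: sqlite3.Connection, payments, user_id: int, cents: int) -> str:
    """Hypothetical service function: external charge plus a real DB write."""
    charge_id = payments.charge(user_id, cents)
    db.execute(
        "INSERT INTO purchases (user_id, cents, charge_id) VALUES (?, ?, ?)",
        (user_id, cents, charge_id),
    )
    db.commit()
    return charge_id


def total_spent(db: sqlite3.Connection, user_id: int) -> int:
    """Hypothetical query; COALESCE makes users with no rows total zero."""
    row = db.execute(
        "SELECT COALESCE(SUM(cents), 0) FROM purchases WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return row[0]


def test_purchases_accumulate_per_user():
    db = sqlite3.connect(":memory:")  # real SQL engine, throwaway data
    db.execute("CREATE TABLE purchases (user_id INTEGER, cents INTEGER, charge_id TEXT)")
    api = FakePaymentsAPI()
    record_purchase(db, api, 1, 500)
    record_purchase(db, api, 1, 250)
    record_purchase(db, api, 2, 100)
    assert total_spent(db, 1) == 750
    assert total_spent(db, 2) == 100
    assert total_spent(db, 3) == 0  # no rows for this user
    db.close()
```

Mocking `db.execute` here would hide a typo in either SQL statement, which is the kind of bug an integration test exists to catch.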

Language-Specific Guidance

Python (pytest): See references/pytest-patterns.md for fixture design, parametrize patterns, and conftest hierarchy
TypeScript (vitest/jest): See references/vitest-patterns.md for module mocking, type-safe mocks, and async patterns

Pre-Commit Checklist

Before finalizing any test:

Test imports the production function/class directly
Removing the production code would break this test
Assertions check return values or state, not just mock calls
Each branch in production code has a corresponding test case
Error/exception paths are tested
Mock count is proportional to external dependencies, not internal logic
Test name describes the behavior being verified, not the implementation
