name	run-tests
description	Comprehensive pytest testing and debugging framework. Use when running tests, debugging failures, fixing broken tests, or investigating test errors. Includes systematic investigation workflow with external AI tool consultation and verification strategies.

Pytest Testing and Debugging Skill

Overview

This skill provides a systematic approach to running tests and debugging failures using pytest. The core workflow integrates investigation, external tool consultation, and verification to efficiently resolve test failures.

Key capabilities:

Run tests with presets for common scenarios (debug, quick, coverage)
Systematic investigation and hypothesis formation
External AI tool consultation (gemini, codex, cursor-agent) when tests fail
Multi-agent analysis for complex issues
Test discovery and structure analysis

⚠️ Long-Running Operations

This skill may run operations that take up to 5 minutes. Be patient and wait for completion.

CRITICAL: Avoid BashOutput Spam

ALWAYS use foreground execution with 5-minute timeout: Bash(command="...", timeout=300000)
WAIT for the command to complete - this may take the full 5 minutes
NEVER use run_in_background=True for test suites, builds, or analysis
If you must use background (rare), wait at least 60 seconds between BashOutput checks
Maximum 3 BashOutput calls per background process - then kill it or let it finish

Why?

Polling BashOutput repeatedly creates spam and degrades user experience. Long operations should run in foreground with appropriate timeout, not in background with frequent polling.

Example (CORRECT):

# Test suite that might take 5 minutes (timeout in milliseconds)
result = Bash(command="pytest src/", timeout=300000)  # Wait up to 5 minutes
# The command will block here until completion - this is correct behavior

Example (WRONG):

# Don't use background + polling
bash_id = Bash(command="pytest", run_in_background=True)
output = BashOutput(bash_id)  # Creates spam!

Core Workflow

5-Phase Process:

Run Tests - Execute tests with appropriate flags
Investigate - Analyze failures, form hypothesis
Gather Context - Optionally use code documentation for faster understanding
Consult - Get external tool insights (mandatory for failures if tools available)
Fix & Verify - Implement changes and confirm no regressions

Key principles:

Investigation-first - Always analyze before consulting
Hypothesis-driven - Form theories, then validate
Mandatory consultation for failures - If tests fail and tools exist, consult them
Skip when passing - Tests pass? Done. No consultation needed.

Quick decision guide:

✅ Tests pass? → Done
❌ Simple fix (typo/obvious)? → Fix → Verify
❌ Complex/unclear? → Investigate → Consult → Fix → Verify

Phase 1: Run Tests

Discover Test Structure (Optional)

If unfamiliar with test organization:

# Quick summary
sdd test discover --summary

# Directory tree
sdd test discover --tree

Run Tests

# Quick run (stop on first failure)
sdd test run --quick

# Debug mode (verbose with locals and prints)
sdd test run --preset-debug

# Run specific test
sdd test run tests/test_module.py::test_function

# Coverage report
sdd test run --coverage

# List all presets
sdd test run --list

Or use pytest directly:

pytest -v                # Verbose
pytest -vv -l -s         # Very verbose, show locals, don't capture prints
pytest -x                # Stop on first failure
pytest -k "test_user"    # Run tests whose names match the expression

Capture Output

For large test suites with many failures:

# Save output to timestamped file
sdd test run --preset-debug | tee /tmp/test-run-$(date +%Y%m%d-%H%M%S).log

Phase 2: Investigate Failures

Categorize the Failure

Assertion - Expected vs actual mismatch
Exception - Runtime errors (AttributeError, KeyError, etc.)
Import - Missing dependencies or module issues
Fixture - Fixture or configuration issues
Timeout - Performance or hanging issues
Flaky - Non-deterministic failures
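As a rough first pass, the category can often be guessed from the error text alone. The sketch below is a hypothetical keyword heuristic (the name `categorize_failure` and its patterns are assumptions, not part of `sdd`). It returns one of the category names that `sdd test consult` accepts. It never returns "flaky", because flakiness only shows up when you compare several runs.

```python
import re

def categorize_failure(error_text: str) -> str:
    """Guess a failure category from pytest error text (hypothetical heuristic).

    Returns "import", "fixture", "assertion", "timeout", or "exception".
    "flaky" is never returned: it can only be seen by comparing several runs.
    """
    if re.search(r"\b(ModuleNotFoundError|ImportError)\b", error_text):
        return "import"
    # pytest reports a missing fixture as "fixture 'name' not found".
    if re.search(r"fixture '[^']+' not found", error_text):
        return "fixture"
    # Explicit AssertionError, or pytest's rewritten "E   assert ..." lines.
    if re.search(r"\bAssertionError\b|^\s*(E\s+)?assert\s", error_text, re.MULTILINE):
        return "assertion"
    # Built-in TimeoutError, or plugin messages such as "Failed: Timeout >5.0s".
    if re.search(r"\bTimeout", error_text):
        return "timeout"
    # Anything else (KeyError, AttributeError, TypeError, ...).
    return "exception"
```

Check the order of the checks before you rely on it. For example, an `AssertionError` whose message happens to mention "Timeout" is still classified as an assertion.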

Extract Key Information

For each failure:

Test file and function name
Line number where failure occurred
Error type and message
Full stack trace
Relevant code context
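By default, pytest's short test summary at the end of a run lists failures and errors one per line, in the form `FAILED <nodeid> - <message>`. Here is a small parser for those lines. It is a sketch that assumes this format, and the helper `parse_summary_line` is hypothetical. It pulls out the file, test name, and error type. The line number still has to come from the full traceback.

```python
import re
from typing import NamedTuple, Optional

class Failure(NamedTuple):
    status: str                # "FAILED" or "ERROR"
    file: str                  # e.g. "tests/test_module.py"
    test: str                  # e.g. "test_function"; "" for collection errors
    error_type: Optional[str]  # e.g. "KeyError"; None if it cannot be inferred
    message: str               # text after " - ", possibly empty

# Assumed format: "<STATUS> <nodeid>[ - <message>]"; splits at the first " - ".
_SUMMARY_RE = re.compile(r"^(?P<status>FAILED|ERROR) (?P<nodeid>.+?)(?: - (?P<msg>.*))?$")
_EXC_RE = re.compile(r"^(?P<exc>[A-Za-z_][\w.]*): ")

def parse_summary_line(line: str) -> Optional[Failure]:
    """Parse one short-summary line; return None if it is not FAILED/ERROR."""
    m = _SUMMARY_RE.match(line.strip())
    if m is None:
        return None
    message = m.group("msg") or ""
    file, _, test = m.group("nodeid").partition("::")
    exc = _EXC_RE.match(message)
    if exc:
        error_type = exc.group("exc")
    elif message.startswith("assert"):
        # Plain `assert` failures are reported without an "AssertionError:" prefix.
        error_type = "AssertionError"
    else:
        error_type = None
    return Failure(m.group("status"), file, test, error_type, message)
```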

Examine the Code

Read the failing test
Read the implementation being tested
Understand what the test verifies
Identify expected vs actual behavior
Form your hypothesis - What's causing the failure?

Phase 3: Gather Code Context (Optional)

When available: If codebase documentation exists (generated by sdd doc generate), use it for faster investigation.

Check availability:

sdd doc stats

Useful commands when debugging:

# Search for functions or concepts
sdd doc search "authentication"

# Show function definition
sdd doc show-function AuthService.login

# Find dependencies
sdd doc list-dependencies src/services/authService.ts

# Find what depends on a file (impact analysis)
sdd doc dependencies --reverse src/auth.py

Benefits:

Faster context gathering
Better root cause analysis
Discover similar patterns
Impact analysis

If not available: Continue with standard file exploration. Run sdd doc generate to create documentation for future use.
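Without generated docs, you can get a rough impact analysis for Python code from static import analysis with the standard-library `ast` module: parse each module and record which modules import the one you changed. This technique is swapped in here for illustration and is not how `sdd doc dependencies --reverse` works. It only sees literal `import` / absolute `from ... import` statements. It also takes module sources as a dict, which is a hypothetical input shape, instead of walking a real repository.

```python
import ast
from collections import defaultdict
from typing import Dict, Set

def reverse_imports(sources: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each imported module name to the set of modules that import it.

    `sources` maps a dotted module name (e.g. "app.views") to its source code.
    Only static import statements are seen; dynamic imports are missed.
    """
    importers: Dict[str, Set[str]] = defaultdict(set)
    for module, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    importers[alias.name].add(module)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                # Absolute "from X import y"; relative imports (level > 0) are skipped.
                importers[node.module].add(module)
    return dict(importers)
```

For example, `reverse_imports(sources)["app.auth"]` lists the modules, and therefore the tests, most likely to be affected by a change to `app.auth`.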

Phase 4: Consult External Tools

CRITICAL: This is mandatory for test failures when external tools exist.

Check Tool Availability

sdd test check-tools

Decision:

Any tool available AND tests failed → Consult (mandatory)
No tools available → Skip to Phase 5
Tests passed → Done; no consultation or fix needed

Consult Tools

All external tools operate in read-only mode. They analyze and suggest; YOU implement all fixes.

# Auto-route based on failure type
sdd test consult assertion --error "Full error message" --hypothesis "Your theory about the cause"

# Include code for context
sdd test consult exception --error "AttributeError: ..." --hypothesis "Missing return" --test-code tests/test_file.py --impl-code src/module.py

# Show routing matrix
sdd test consult --list-routing

# Manual tool selection
sdd test consult --tool gemini --prompt "Custom question..."
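Error messages often contain quotes, newlines, or `$`, and these are easy to mangle when typed into a shell. One way to avoid that is to build the argument list in Python and pass it to `subprocess.run` without a shell. The sketch below assumes `sdd` is on `PATH`. The helper name `consult_argv` is hypothetical; the flags are the ones listed in this section.

```python
import shlex
from typing import List, Optional

def consult_argv(failure_type: str, error: str, hypothesis: str,
                 test_code: Optional[str] = None,
                 impl_code: Optional[str] = None) -> List[str]:
    """Build an argv list for `sdd test consult` (hypothetical helper)."""
    argv = ["sdd", "test", "consult", failure_type,
            "--error", error, "--hypothesis", hypothesis]
    if test_code:
        argv += ["--test-code", test_code]
    if impl_code:
        argv += ["--impl-code", impl_code]
    return argv

argv = consult_argv("exception",
                    error="AttributeError: 'NoneType' object has no attribute 'id'",
                    hypothesis="get_user() is missing a return statement",
                    test_code="tests/test_file.py", impl_code="src/module.py")
# Each list element reaches sdd as exactly one argument; no shell quoting involved.
# subprocess.run(argv, timeout=300) would execute it (not run here).
print(shlex.join(argv))  # correctly quoted shell form, safe to copy-paste
```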

Tool Selection Guide

Tool	Best For	Example Use
Gemini	Hypothesis validation, framework explanations, strategic guidance	"Why is this fixture not found?"
Codex	Code-level review, specific fix suggestions	"Review this code and suggest fixes"
Cursor	Repo-wide discovery, finding patterns	"Find all call sites"

When to Use Multiple Tools

Use multi-agent consultation for:

High-stakes fixes affecting critical functionality
Complex issues with unclear root cause
Need validation from multiple perspectives
Uncertain between multiple approaches

# Auto-selects configured consensus agents
sdd test consult assertion --error "..." --hypothesis "..." --multi-agent

# Override which agents participate (comma-separated list)
sdd test consult exception --error "..." --hypothesis "..." --multi-agent --agents gemini,codex

Effective Prompting

Share your hypothesis - Ask "is my theory correct?" not "what's wrong?"
Provide complete context - Error messages, code, stack traces
Include what you've tried - Show your investigation work
Ask for explanations - Understand "why", not just "how to fix"
Be specific - State exactly what you need
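When using `sdd test consult --tool ... --prompt`, you can pack the points above into a fixed template so none of them gets forgotten. The function, its section labels, and the `load_config()` scenario below are illustrative assumptions. None of them is a format the tools require.

```python
from typing import List

def build_prompt(question: str, hypothesis: str, error: str,
                 tried: List[str], code_context: str = "") -> str:
    """Assemble a consultation prompt that leads with a hypothesis (illustrative)."""
    parts = [
        f"Question: {question}",
        f"My hypothesis: {hypothesis}",
        "Error output:",
        error.strip(),
    ]
    if tried:
        parts.append("What I have already tried:")
        parts.extend(f"- {step}" for step in tried)
    if code_context:
        parts += ["Relevant code:", code_context.strip()]
    parts.append("Please explain the cause, not only the fix.")
    return "\n".join(parts)

prompt = build_prompt(
    question="Is my theory about this KeyError correct?",
    hypothesis="load_config() drops keys whose values are empty strings",
    error="KeyError: 'timeout'",
    tried=["Printed the loaded dict: 'timeout' is missing"],
)
```

The resulting string is what would go into `--prompt`.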

Phase 5: Fix & Verify

Synthesize Findings

Combine insights from:

Your investigation and hypothesis
External tool recommendations
Any additional research

Implement Fix

# Make targeted changes using Edit tool
# Example: Add missing return statement
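Here is an illustration of the "missing return" case, using hypothetical functions. If a helper builds a value but never returns it, callers receive `None`. In the test, this typically surfaces as `AttributeError: 'NoneType' object has no attribute ...`. The fix is one line, and a short comment records what was wrong and why the fix works.

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str

def load_user_buggy(name: str):
    user = User(name=name.strip())
    # Bug: no return statement, so every caller gets None.

def load_user(name: str) -> User:
    user = User(name=name.strip())
    # Fix: return the constructed User. Without this line load_user() returned
    # None and test_load_user failed with AttributeError on `.name`.
    return user

def test_load_user():
    assert load_user("  ada ").name == "ada"
```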

Verify

# Run the specific fixed test
sdd test run tests/test_module.py::test_function

# If passing, run full suite
sdd test run

# Verify no regressions
pytest tests/ -v

Document

Add comments explaining:

What was wrong
Why the fix works
Any assumptions or limitations

CLI Reference

sdd test check-tools

Check availability of external tools and get routing suggestions.

# Basic check
sdd test check-tools

# Get routing for specific failure type
sdd test check-tools --route assertion
sdd test check-tools --route fixture

sdd test run

Smart pytest runner with presets for common scenarios.

# List all presets
sdd test run --list

# Presets
sdd test run --quick          # Stop on first failure
sdd test run --preset-debug   # Verbose + locals + prints
sdd test run --coverage       # Coverage report
sdd test run --fast           # Skip slow tests
sdd test run --parallel       # Run in parallel

# Run specific test
sdd test run tests/test_file.py::test_name

sdd test consult

External tool consultation with auto-routing.

# Auto-route based on failure type
sdd test consult {assertion|exception|fixture|import|timeout|flaky} --error "..." --hypothesis "..."

# Include code
sdd test consult exception --error "..." --hypothesis "..." --test-code tests/test.py --impl-code src/module.py

# Multi-agent mode
sdd test consult assertion --error "..." --hypothesis "..." --multi-agent

# Manual tool selection
sdd test consult --tool {gemini|codex|cursor} --prompt "..."

# Show routing matrix
sdd test consult --list-routing

# Dry run
sdd test consult fixture --error "..." --hypothesis "..." --dry-run

sdd test discover

Test structure analyzer and discovery.

# Quick summary
sdd test discover --summary

# Directory tree
sdd test discover --tree

# All fixtures
sdd test discover --fixtures

# All markers
sdd test discover --markers

# Detailed analysis
sdd test discover --detailed

# Analyze specific directory
sdd test discover tests/unit --summary

Global Options

Available on all commands:

--no-color - Disable colored output
--verbose, -v - Show detailed output
--quiet, -q - Minimal output (errors only)

Common Patterns

Multiple Failing Tests

Group by error type
Fix one group at a time
Look for common root causes
Consider whether tests need updating vs code needs fixing
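Grouping is easy to automate once the failures are available as (test id, error message) pairs, for example as collected from pytest's short test summary. Below is a minimal sketch that assumes that input shape. It treats the text before the first `:` as the error type, which is a heuristic, and maps plain `assert` failures to `AssertionError`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def group_by_error_type(failures: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group test ids by error type, largest group first."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for test_id, message in failures:
        head, sep, _ = message.partition(":")
        if message.startswith("assert"):
            error_type = "AssertionError"
        elif sep and all(part.isidentifier() for part in head.split(".")):
            error_type = head  # e.g. "KeyError" or "requests.exceptions.ConnectionError"
        else:
            error_type = "Unknown"
        groups[error_type].append(test_id)
    # Biggest group first: many failures with one error type often share a root cause.
    return dict(sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True))
```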

Flaky Tests

# Run a test multiple times (requires the pytest-repeat plugin)
pytest tests/test_flaky.py --count=10

# Run tests in random order (requires the pytest-random-order plugin)
pytest --random-order
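`--count` and `--random-order` come from third-party plugins (pytest-repeat and pytest-random-order). If the suspect code can be called directly, you can approximate the repeat-run idea without them: call it many times and count the outcomes. This is a sketch with a hypothetical `run_repeatedly` helper. The flaky check is simulated deterministically (it fails on every third call) as a stand-in for state or timing that changes between runs.

```python
import itertools
from collections import Counter
from typing import Callable

def run_repeatedly(check: Callable[[], None], runs: int = 10) -> Counter:
    """Call `check` `runs` times; count "pass" or the name of the exception raised."""
    outcomes: Counter = Counter()
    for _ in range(runs):
        try:
            check()
        except Exception as exc:  # AssertionError is an Exception subclass
            outcomes[type(exc).__name__] += 1
        else:
            outcomes["pass"] += 1
    return outcomes

# Simulated flaky check: fails on every third call.
_calls = itertools.count(1)

def flaky_check() -> None:
    assert next(_calls) % 3 != 0, "simulated intermittent failure"

result = run_repeatedly(flaky_check, runs=30)
is_flaky = len(result) > 1  # more than one distinct outcome means non-deterministic
```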

Fixture Issues

# Show fixture setup and teardown
pytest --setup-show tests/test_module.py

# List available fixtures
pytest --fixtures

Common fixture problems:

Fixture not in conftest.py or test file
Fixture name doesn't match exactly
conftest.py in wrong directory
Incorrect fixture scope

Integration Test Failures

Check in order:

External dependencies
Test environment setup
Database state
Configuration
Network connectivity
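You can script some of these checks as a preflight step that runs before the suite. The sketch below rests on assumptions. The variable names (`DATABASE_URL`, `API_BASE_URL`) and the host/port are hypothetical placeholders for whatever your integration tests actually need.

```python
import os
import socket
from typing import Iterable, List, Mapping, Optional

def missing_env_vars(required: Iterable[str],
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return required configuration variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False

if __name__ == "__main__":
    missing = missing_env_vars(["DATABASE_URL", "API_BASE_URL"])  # hypothetical names
    if missing:
        print("Missing configuration:", ", ".join(missing))
    if not can_connect("localhost", 5432):  # hypothetical database port
        print("Database is not reachable on localhost:5432")
```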

Tool Routing Matrix

Quick reference for which tool to use based on failure type:

Failure Type	Primary Tool	Secondary (if needed)	Why
Assertion mismatch	Codex	Gemini	Code-level bug analysis
Exceptions	Codex	Gemini	Precise code review
Import/packaging	Gemini	Cursor	Framework expertise
Fixture issues	Gemini	Cursor	Pytest scoping knowledge
Timeout/performance	Gemini + Cursor	-	Strategy + pattern discovery
Flaky tests	Gemini + Cursor	-	Diagnosis + state dependencies
Multi-file issues	Cursor	Gemini	Discovery + synthesis
Unclear errors	Gemini	Web search	Explanation first

Query type routing:

"Why is this happening?" → Gemini
"Is this code wrong?" → Codex
"Where else does this occur?" → Cursor
"What should I do?" → Gemini + Codex

Special Scenarios

Verification Runs (Confirming Refactors)

When running tests to verify refactoring:

# Run full suite
sdd test run

# If all pass: Done! No consultation needed.
# If tests fail: Follow standard debugging workflow

Key point: Passing verification runs require no consultation. Only investigate failures.

When Tools Disagree

If two tools give different recommendations:

Compare reasoning - Which explanation is more thorough?
Check scope - Which considers broader impact?
Apply critical thinking - Which aligns with your investigation?
Try simplest first - Implement less invasive fix first
Document uncertainty - Note in code comments

When to Escalate to Additional Tools

Use additional tools when:

Answer is unclear or vague
Answer contradicts your analysis
Answer raises new questions
Partial answer (addresses some aspects only)
High-stakes scenario (critical functionality)

Timeout and Retry Behavior

Consultation timeouts:

Default: 90 seconds
Configurable via .claude/ai_config.yaml (run-tests.consultation.timeout_seconds)

When tools time out:

Simplify prompt (remove large code blocks)
Try different tool from routing matrix
Check if tool process is hung: ps aux | grep <tool>
Increase timeout in config if needed

Tool Availability Fallbacks

Recommended	If Unavailable	How to Compensate
Gemini	Codex or Cursor	Ask "why" with extra context; use web search
Codex	Gemini	Ask for very specific code examples
Cursor	Manual Grep + Gemini	Use Grep to find patterns, Gemini to analyze
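You can combine the Tool Routing Matrix and the fallback table above into a small lookup. The logic is to use the primary tool(s), then the secondary tool, then the documented substitutes. This is only a sketch that mirrors the tables in this document; the dictionary keys and the function name are assumptions. `sdd test consult --list-routing` remains the authoritative routing.

```python
from typing import Dict, List, Set, Tuple

# Mirrors the Tool Routing Matrix: failure type -> (primary tools, secondary tools).
# "Gemini + Cursor" rows list both as primary; they are meant to be used together.
ROUTING: Dict[str, Tuple[List[str], List[str]]] = {
    "assertion": (["codex"], ["gemini"]),
    "exception": (["codex"], ["gemini"]),
    "import": (["gemini"], ["cursor"]),
    "fixture": (["gemini"], ["cursor"]),
    "timeout": (["gemini", "cursor"], []),
    "flaky": (["gemini", "cursor"], []),
}

# Mirrors the Tool Availability Fallbacks table.
FALLBACKS: Dict[str, List[str]] = {
    "gemini": ["codex", "cursor"],
    "codex": ["gemini"],
    "cursor": ["gemini"],  # plus manual Grep, which needs no external tool
}

def choose_tools(failure_type: str, available: Set[str]) -> List[str]:
    """Return the tools to consult, or [] if none is available (skip to Phase 5)."""
    primary, secondary = ROUTING[failure_type]
    chosen = [tool for tool in primary if tool in available]
    if chosen:
        return chosen
    # No primary tool available: try the secondary, then each primary's fallbacks.
    candidates = secondary + [fb for tool in primary for fb in FALLBACKS[tool]]
    for tool in candidates:
        if tool in available:
            return [tool]
    return []
```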

Advanced Topics

Multi-Agent Analysis

Multi-agent mode consults two agents in parallel and synthesizes their insights:

sdd test consult fixture --error "..." --hypothesis "..." --multi-agent

Output includes:

Consensus points (where agents agree)
Unique insights from each agent
Synthesis combining both analyses
High-confidence recommendations

Benefits:

Higher confidence through multiple perspectives
Better coverage (each agent contributes unique insights)
Risk reduction (divergent views expose alternatives)
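Conceptually, the split into consensus points and unique insights is a set comparison. The toy sketch below assumes each agent's advice has already been reduced to short, normalized recommendation strings. That is a large simplification, since real responses are free text, and this is not how `sdd` synthesizes them.

```python
from typing import Dict, Set

def compare_agents(recs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split recommendations into consensus (all agents) and per-agent unique ones."""
    if not recs:
        return {"consensus": set()}
    result = {"consensus": set.intersection(*recs.values())}
    for agent, items in recs.items():
        others = set().union(*(s for a, s in recs.items() if a != agent))
        result[agent] = items - others  # advice only this agent gave
    return result
```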

Using pytest's Built-in --pdb Option for Debugging

# Drop into the Python debugger (pdb) at every failure
pytest --pdb

# Drop into the debugger at the first failure, then end the run
pytest -x --pdb

Custom Markers for Test Organization

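# conftest.py: register markers so pytest doesn't warn about unknown marks
def pytest_configure(config):
    config.addinivalue_line("markers", "slow: marks tests as slow")
    config.addinivalue_line("markers", "integration: marks integration tests")
    config.addinivalue_line("markers", "unit: marks unit tests")

# Usage (in a test module)
import pytest

@pytest.mark.slow
def test_complex_calculation():
    pass

# Select or deselect tests by marker:
# pytest -m "not slow"
# pytest -m integration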

Mocking External Services

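from unittest.mock import patch

import requests

def fetch_data():
    # Hypothetical code under test. It calls requests.get through the module,
    # which is what lets patch('requests.get') replace it.
    return requests.get("https://api.example.com/status", timeout=5).json()

def test_api_call():
    with patch('requests.get') as mock_get:
        mock_get.return_value.json.return_value = {"status": "ok"}
        result = fetch_data()
        assert result["status"] == "ok"
        mock_get.assert_called_once()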
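Mocks are also useful for exercising error paths that are hard to trigger for real. The sketch below uses only the standard library. `fetch_status` is a hypothetical function that calls `urllib.request.urlopen`. Setting `side_effect` makes the patched call raise an exception instead of returning a value, so the fallback branch can be tested without a network.

```python
import json
import urllib.request
from unittest.mock import MagicMock, patch

def fetch_status(url: str) -> str:
    """Hypothetical code under test: returns "unavailable" on network errors."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp)["status"]
    except OSError:  # URLError and timeouts are OSError subclasses
        return "unavailable"

def test_fetch_status_ok():
    resp = MagicMock()
    resp.read.return_value = b'{"status": "ok"}'
    resp.__enter__.return_value = resp  # support "with urlopen(...) as resp"
    with patch("urllib.request.urlopen", return_value=resp):
        assert fetch_status("http://example.invalid") == "ok"

def test_fetch_status_network_error():
    # side_effect makes the mock raise instead of returning a value.
    with patch("urllib.request.urlopen", side_effect=TimeoutError("timed out")):
        assert fetch_status("http://example.invalid") == "unavailable"
```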

Troubleshooting

"Fixture not found"

Check fixture is defined in conftest.py or same file
Verify fixture name matches exactly
Check fixture scope is appropriate
Ensure conftest.py is in correct directory

"Import error"

Check PYTHONPATH includes src directory
Verify __init__.py files exist
Check for circular imports
Verify the package is installed in development (editable) mode, e.g. pip install -e .

"Tests pass locally but fail in CI"

Check for hardcoded paths
Verify all dependencies in requirements
Check for timezone issues
Look for race conditions
Check file system differences
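Timezone problems usually come from code that reads the local clock directly. A CI runner set to UTC and a laptop set to local time then disagree about dates near midnight. One common remedy is to use timezone-aware datetimes and let tests pass in a fixed time instead of calling `datetime.now()` inside the code. The sketch below shows this with a hypothetical `report_date` function.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def report_date(now: Optional[datetime] = None) -> str:
    """Return the report date in UTC as YYYY-MM-DD.

    Tests inject `now`; production code omits it and gets the real UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if now.tzinfo is None:
        raise ValueError("naive datetime: timezone is ambiguous")
    return now.astimezone(timezone.utc).strftime("%Y-%m-%d")

def test_report_date_is_timezone_independent():
    # 23:30 at UTC-5 is already 04:30 the next day in UTC.
    local = datetime(2024, 3, 1, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
    assert report_date(local) == "2024-03-02"
```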

"Test is too slow"

Use fixtures with appropriate scope
Mock external services
Use in-memory databases
Parallelize: sdd test run --parallel
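For in-memory databases, SQLite from the standard library is a common choice when the code under test can work against it. It is not a drop-in replacement for every production database. Here is a sketch with a hypothetical `users` table, where each test gets a fresh, isolated database.

```python
import sqlite3

def make_test_db() -> sqlite3.Connection:
    """Create a fresh in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")  # exists only as long as this connection
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn

def count_users(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

def test_insert_user():
    conn = make_test_db()
    try:
        conn.execute("INSERT INTO users (name) VALUES (?)", ("ada",))
        assert count_users(conn) == 1
    finally:
        conn.close()
```

In a pytest suite, `make_test_db` would usually be wrapped in a function-scoped fixture that yields the connection and closes it afterwards.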

Best Practices

Running Tests

Start with verbose mode (-v) for better visibility
Use -x to stop on first failure when debugging
Run specific tests to iterate faster
Use markers to organize test runs

Debugging Strategy

Read error messages carefully
Check last line of stack trace first
Use -l flag to see local variables
Add temporary print statements for quick debugging

Consultation Workflow

For test failures:

Do initial investigation first
Check tool availability: sdd test check-tools
Consult available tools (mandatory if tests failed)
Share your hypothesis - don't ask blind questions
Synthesize insights from tools + your analysis
YOU implement using Edit/Write tools
Test thoroughly

Skip consultation when:

Tests all passed
Verification/smoke tests succeeded
Post-fix confirmation (tests already passed once)
No tools available

Success Criteria

A test debugging session is successful when:

✓ All tests pass
✓ No previously passing tests are broken
✓ Root cause is understood
✓ Fix is documented
✓ Code is cleaner/clearer than before (when appropriate)
