| name | run-tests |
| description | Comprehensive pytest testing and debugging framework. Use when running tests, debugging failures, fixing broken tests, or investigating test errors. Includes systematic investigation workflow with external AI tool consultation and verification strategies. |
Pytest Testing and Debugging Skill
Overview
This skill provides a systematic approach to running tests and debugging failures using pytest. The core workflow integrates investigation, external tool consultation, and verification to efficiently resolve test failures.
Key capabilities:
- Run tests with presets for common scenarios (debug, quick, coverage)
- Systematic investigation and hypothesis formation
- External AI tool consultation (gemini, codex, cursor-agent) when tests fail
- Multi-agent analysis for complex issues
- Test discovery and structure analysis
⚠️ Long-Running Operations
This skill may run operations that take up to 5 minutes. Be patient and wait for completion.
CRITICAL: Avoid BashOutput Spam
- ALWAYS use foreground execution with 5-minute timeout:
Bash(command="...", timeout=300000)
- WAIT for the command to complete - this may take the full 5 minutes
- NEVER use run_in_background=True for test suites, builds, or analysis
- If you must use background (rare), wait at least 60 seconds between BashOutput checks
- Maximum 3 BashOutput calls per background process - then kill it or let it finish
Why?
Polling BashOutput repeatedly creates spam and degrades user experience. Long operations should run in foreground with appropriate timeout, not in background with frequent polling.
Example (CORRECT):
# Test suite that might take 5 minutes (timeout in milliseconds)
result = Bash(command="pytest src/", timeout=300000) # Wait up to 5 minutes
# The command will block here until completion - this is correct behavior
Example (WRONG):
# Don't use background + polling
bash_id = Bash(command="pytest", run_in_background=True)
output = BashOutput(bash_id) # Creates spam!
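Outside the agent's Bash tool, the same foreground-with-timeout pattern can be sketched in plain Python. This is only an illustration: `run_foreground` is a hypothetical helper, not part of sdd, and `subprocess.run` takes its timeout in seconds, whereas the Bash tool above takes milliseconds.

```python
import subprocess
import sys

def run_foreground(cmd, timeout_s=300):
    """Run a command in the foreground and block until it exits or times out.

    Hypothetical helper for illustration; returns (returncode, combined output).
    """
    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # seconds, so 300 == 5 minutes
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before the exception propagates
        return None, f"timed out after {timeout_s}s"
    return proc.returncode, proc.stdout + proc.stderr

# A single blocking call, with no polling loop
code, output = run_foreground([sys.executable, "-c", "print('ok')"])
```

The caller gets exactly one result per command, which is the property the foreground rule is meant to guarantee.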
Core Workflow
5-Phase Process:
- Run Tests - Execute tests with appropriate flags
- Investigate - Analyze failures, form hypothesis
- Gather Context - Optionally use code documentation for faster understanding
- Consult - Get external tool insights (mandatory for failures if tools available)
- Fix & Verify - Implement changes and confirm no regressions
Key principles:
- Investigation-first - Always analyze before consulting
- Hypothesis-driven - Form theories, then validate
- Mandatory consultation for failures - If tests fail and tools exist, consult them
- Skip when passing - Tests pass? Done. No consultation needed.
Quick decision guide:
- ✅ Tests pass? → Done
- ❌ Simple fix (typo/obvious)? → Fix → Verify
- ❌ Complex/unclear? → Investigate → Consult → Fix → Verify
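The decision guide above can be read as a small function. The sketch below is illustrative only; `next_steps` and its parameters are hypothetical names, and judging whether a fix is "simple" remains a human call.

```python
def next_steps(tests_passed, simple_fix=False):
    """Return the ordered workflow steps for a test run outcome."""
    if tests_passed:
        return ["Done"]
    if simple_fix:
        # Typo or otherwise obvious cause: no investigation or consultation step
        return ["Fix", "Verify"]
    return ["Investigate", "Consult", "Fix", "Verify"]
```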
Phase 1: Run Tests
Discover Test Structure (Optional)
If unfamiliar with test organization:
sdd test discover --summary
sdd test discover --tree
Run Tests
sdd test run --quick
sdd test run --preset-debug
sdd test run tests/test_module.py::test_function
sdd test run --coverage
sdd test run --list
Or use pytest directly:
pytest -v
pytest -vv -l -s
pytest -x
pytest -k "test_user"
Capture Output
For large test suites with many failures:
sdd test run --preset-debug | tee /tmp/test-run-$(date +%Y%m%d-%H%M%S).log
Phase 2: Investigate Failures
Categorize the Failure
- Assertion - Expected vs actual mismatch
- Exception - Runtime errors (AttributeError, KeyError, etc.)
- Import - Missing dependencies or module issues
- Fixture - Fixture or configuration issues
- Timeout - Performance or hanging issues
- Flaky - Non-deterministic failures
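As a rough illustration of the categories above, a first-pass classifier might look at the error text. This is a heuristic sketch, not how sdd routes failures: `categorize_failure` is a hypothetical helper, the keyword checks are assumptions, and flaky failures are left out because they cannot be recognized from a single error message.

```python
def categorize_failure(error_text):
    """Guess a failure category from pytest error text (heuristic sketch)."""
    text = error_text.lower()
    if "fixture" in text and "not found" in text:
        return "fixture"
    if "modulenotfounderror" in text or "importerror" in text:
        return "import"
    if "timeout" in text or "timed out" in text:
        return "timeout"
    if "assertionerror" in text or text.startswith("assert "):
        return "assertion"
    # Anything else is treated as a generic runtime exception
    return "exception"
```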
Extract Key Information
For each failure:
- Test file and function name
- Line number where failure occurred
- Error type and message
- Full stack trace
- Relevant code context
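Part of this information can be pulled from pytest's short test summary, whose lines have the form `FAILED path::test - message`. The sketch below assumes that format and node IDs without spaces; `parse_summary` is a hypothetical helper. Line numbers and stack traces are not in the summary and still have to be read from the full report.

```python
import re

# Matches lines such as: FAILED tests/test_mod.py::test_func - AssertionError: boom
SUMMARY_LINE = re.compile(r"^(?:FAILED|ERROR) (?P<nodeid>\S+)(?: - (?P<message>.*))?$")

def parse_summary(output):
    """Extract file, test name, and message from short-summary lines."""
    failures = []
    for line in output.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if not match:
            continue
        path, _, test_name = match["nodeid"].partition("::")
        failures.append(
            {"file": path, "test": test_name, "message": match["message"] or ""}
        )
    return failures
```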
Examine the Code
- Read the failing test
- Read the implementation being tested
- Understand what the test verifies
- Identify expected vs actual behavior
- Form your hypothesis - What's causing the failure?
Phase 3: Gather Code Context (Optional)
When available: If codebase documentation exists (generated by sdd doc generate), use it for faster investigation.
Check availability:
sdd doc stats
Useful commands when debugging:
sdd doc search "authentication"
sdd doc show-function AuthService.login
sdd doc list-dependencies src/services/authService.ts
sdd doc dependencies --reverse src/auth.py
Benefits:
- Faster context gathering
- Better root cause analysis
- Discover similar patterns
- Impact analysis
If not available: Continue with standard file exploration. Run sdd doc generate to create documentation for future use.
Phase 4: Consult External Tools
CRITICAL: This is mandatory for test failures when external tools exist.
Check Tool Availability
sdd test check-tools
Decision:
- Any tool available AND tests failed → Consult (mandatory)
- No tools available → Skip to Phase 5
- Tests passed → Done (no consultation needed)
Consult Tools
All external tools operate in read-only mode. They analyze and suggest; YOU implement all fixes.
sdd test consult assertion --error "Full error message" --hypothesis "Your theory about the cause"
sdd test consult exception --error "AttributeError: ..." --hypothesis "Missing return" --test-code tests/test_file.py --impl-code src/module.py
sdd test consult --list-routing
sdd test consult --tool gemini --prompt "Custom question..."
Tool Selection Guide
| Tool | Best For | Example Use |
|---|---|---|
| Gemini | Hypothesis validation, framework explanations, strategic guidance | "Why is this fixture not found?" |
| Codex | Code-level review, specific fix suggestions | "Review this code and suggest fixes" |
| Cursor | Repo-wide discovery, finding patterns | "Find all call sites" |
When to Use Multiple Tools
Use multi-agent consultation for:
- High-stakes fixes affecting critical functionality
- Complex issues with unclear root cause
- Need validation from multiple perspectives
- Uncertain between multiple approaches
sdd test consult assertion --error "..." --hypothesis "..." --multi-agent
sdd test consult exception --error "..." --hypothesis "..." --multi-agent --agents gemini,codex
Effective Prompting
- Share your hypothesis - Ask "is my theory correct?" not "what's wrong?"
- Provide complete context - Error messages, code, stack traces
- Include what you've tried - Show your investigation work
- Ask for explanations - Understand "why", not just "how to fix"
- Be specific - State exactly what you need
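One way to keep prompts consistent with these points is to assemble them from the same parts every time. The helper below is a hypothetical sketch; its name, parameters, and output layout are assumptions, not the format that `sdd test consult` uses.

```python
def build_consult_prompt(hypothesis, error, tried=(),
                         question="Is my theory correct, and why?"):
    """Assemble a consultation prompt that leads with the hypothesis."""
    lines = [
        f"Hypothesis: {hypothesis}",
        f"Error: {error}",
    ]
    if tried:
        lines.append("Already tried:")
        lines.extend(f"- {item}" for item in tried)
    lines.append(f"Question: {question}")
    return "\n".join(lines)

prompt = build_consult_prompt(
    hypothesis="The fixture returns None because of a missing return",
    error="AttributeError: 'NoneType' object has no attribute 'id'",
    tried=["reran with -vv", "checked conftest.py"],
)
```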
Phase 5: Fix & Verify
Synthesize Findings
Combine insights from:
- Your investigation and hypothesis
- External tool recommendations
- Any additional research
Implement Fix
Verify
sdd test run tests/test_module.py::test_function
sdd test run
pytest tests/ -v
Document
Add comments explaining:
- What was wrong
- Why the fix works
- Any assumptions or limitations
CLI Reference
sdd test check-tools
Check availability of external tools and get routing suggestions.
sdd test check-tools
sdd test check-tools --route assertion
sdd test check-tools --route fixture
sdd test run
Smart pytest runner with presets for common scenarios.
sdd test run --list
sdd test run --quick
sdd test run --preset-debug
sdd test run --coverage
sdd test run --fast
sdd test run --parallel
sdd test run tests/test_file.py::test_name
sdd test consult
External tool consultation with auto-routing.
sdd test consult {assertion|exception|fixture|import|timeout|flaky} --error "..." --hypothesis "..."
sdd test consult exception --error "..." --hypothesis "..." --test-code tests/test.py --impl-code src/module.py
sdd test consult assertion --error "..." --hypothesis "..." --multi-agent
sdd test consult --tool {gemini|codex|cursor} --prompt "..."
sdd test consult --list-routing
sdd test consult fixture --error "..." --hypothesis "..." --dry-run
sdd test discover
Test structure analyzer and discovery.
sdd test discover --summary
sdd test discover --tree
sdd test discover --fixtures
sdd test discover --markers
sdd test discover --detailed
sdd test discover tests/unit --summary
Global Options
Available on all commands:
--no-color - Disable colored output
--verbose, -v - Show detailed output
--quiet, -q - Minimal output (errors only)
Common Patterns
Multiple Failing Tests
- Group by error type
- Fix one group at a time
- Look for common root causes
- Consider whether tests need updating vs code needs fixing
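Grouping by error type can be done mechanically once failures are collected as (test ID, error message) pairs. The sketch below assumes messages of the form `ErrorName: details`; `group_by_error_type` is a hypothetical helper.

```python
from collections import defaultdict

def group_by_error_type(failures):
    """Group (test_id, error_message) pairs by the exception name before the colon."""
    groups = defaultdict(list)
    for test_id, message in failures:
        error_type = message.split(":", 1)[0].strip()
        groups[error_type].append(test_id)
    # Largest groups first: a shared root cause often appears as many similar failures
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)

grouped = group_by_error_type([
    ("tests/test_a.py::test_one", "KeyError: 'id'"),
    ("tests/test_b.py::test_two", "AssertionError: assert 1 == 2"),
    ("tests/test_a.py::test_three", "KeyError: 'name'"),
])
```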
Flaky Tests
pytest tests/test_flaky.py --count=10  # --count requires the pytest-repeat plugin
pytest --random-order  # requires the pytest-random-order plugin
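The idea behind repeated runs can be sketched without any plugin: call the test many times and measure how often it fails. `failure_rate` below is a hypothetical helper for illustration; a rate strictly between 0 and 1 suggests non-determinism rather than a plain bug.

```python
def failure_rate(test_fn, runs=10):
    """Call test_fn repeatedly and return the fraction of runs that raised."""
    failures = 0
    for _ in range(runs):
        try:
            test_fn()
        except Exception:
            failures += 1
    return failures / runs

# Stand-in for a flaky test: it depends on hidden state and fails every third call
calls = {"n": 0}

def sometimes_fails():
    calls["n"] += 1
    assert calls["n"] % 3 != 0

rate = failure_rate(sometimes_fails, runs=9)
```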
Fixture Issues
pytest --setup-show tests/test_module.py
pytest --fixtures
Common fixture problems:
- Fixture not in conftest.py or test file
- Fixture name doesn't match exactly
- conftest.py in wrong directory
- Incorrect fixture scope
Integration Test Failures
Check in order:
- External dependencies
- Test environment setup
- Database state
- Configuration
- Network connectivity
Tool Routing Matrix
Quick reference for which tool to use based on failure type:
| Failure Type | Primary Tool | Secondary (if needed) | Why |
|---|---|---|---|
| Assertion mismatch | Codex | Gemini | Code-level bug analysis |
| Exceptions | Codex | Gemini | Precise code review |
| Import/packaging | Gemini | Cursor | Framework expertise |
| Fixture issues | Gemini | Cursor | Pytest scoping knowledge |
| Timeout/performance | Gemini + Cursor | - | Strategy + pattern discovery |
| Flaky tests | Gemini + Cursor | - | Diagnosis + state dependencies |
| Multi-file issues | Cursor | Gemini | Discovery + synthesis |
| Unclear errors | Gemini | Web search | Explanation first |
Query type routing:
- "Why is this happening?" → Gemini
- "Is this code wrong?" → Codex
- "Where else does this occur?" → Cursor
- "What should I do?" → Gemini + Codex
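The matrix can also be expressed as a lookup table. The sketch below covers only the rows with a single primary tool and uses hypothetical names (`ROUTING`, `pick_tool`); it is not the routing logic that `sdd test consult` implements.

```python
# (primary, secondary) per failure type, copied from the matrix above
ROUTING = {
    "assertion": ("codex", "gemini"),
    "exception": ("codex", "gemini"),
    "import": ("gemini", "cursor"),
    "fixture": ("gemini", "cursor"),
    "multi-file": ("cursor", "gemini"),
}

def pick_tool(failure_type, available):
    """Return the first routed tool that is available, or None."""
    for tool in ROUTING.get(failure_type, ()):
        if tool in available:
            return tool
    return None
```

For example, `pick_tool("fixture", {"cursor"})` falls back to the secondary tool because Gemini is not in the available set.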
Special Scenarios
Verification Runs (Confirming Refactors)
When running tests to verify refactoring:
sdd test run
Key point: Passing verification runs require no consultation. Only investigate failures.
When Tools Disagree
If two tools give different recommendations:
- Compare reasoning - Which explanation is more thorough?
- Check scope - Which considers broader impact?
- Apply critical thinking - Which aligns with your investigation?
- Try simplest first - Implement less invasive fix first
- Document uncertainty - Note in code comments
When to Escalate to Additional Tools
Use additional tools when:
- Answer is unclear or vague
- Answer contradicts your analysis
- Answer raises new questions
- Partial answer (addresses some aspects only)
- High-stakes scenario (critical functionality)
Timeout and Retry Behavior
Consultation timeouts:
- Default: 90 seconds
- Configurable via .claude/ai_config.yaml (run-tests.consultation.timeout_seconds)
When tools time out:
- Simplify prompt (remove large code blocks)
- Try different tool from routing matrix
- Check if tool process is hung: ps aux | grep <tool>
- Increase timeout in config if needed
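The "try a different tool" step can be sketched as a fallback loop. Everything here is an assumption made for illustration: `consult_with_fallback` is a hypothetical helper, and `ask` stands for whatever function invokes a tool and raises `subprocess.TimeoutExpired` when it exceeds the limit.

```python
import subprocess

def consult_with_fallback(tools, ask, timeout_s=90):
    """Try each tool in order and move on when one times out.

    `ask(tool, timeout_s)` is caller-supplied (hypothetical here); it returns
    the tool's answer or raises subprocess.TimeoutExpired.
    """
    for tool in tools:
        try:
            return tool, ask(tool, timeout_s)
        except subprocess.TimeoutExpired:
            continue  # try the next tool in the list
    return None, None
```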
Tool Availability Fallbacks
| Recommended | If Unavailable | How to Compensate |
|---|---|---|
| Gemini | Codex or Cursor | Ask "why" with extra context; use web search |
| Codex | Gemini | Ask for very specific code examples |
| Cursor | Manual Grep + Gemini | Use Grep to find patterns, Gemini to analyze |
Advanced Topics
Multi-Agent Analysis
Multi-agent mode consults two agents in parallel and synthesizes their insights:
sdd test consult fixture --error "..." --hypothesis "..." --multi-agent
Output includes:
- Consensus points (where agents agree)
- Unique insights from each agent
- Synthesis combining both analyses
- High-confidence recommendations
Benefits:
- Higher confidence through multiple perspectives
- Better coverage (each agent contributes unique insights)
- Risk reduction (divergent views expose alternatives)
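To make consensus and unique insights concrete, the sketch below queries two agents concurrently and compares their findings as sets. It is not the sdd implementation: `consult_in_parallel` and `ask` are hypothetical, findings are modeled as plain strings, and at least one agent is assumed.

```python
from concurrent.futures import ThreadPoolExecutor

def consult_in_parallel(agents, ask):
    """Query every agent concurrently; `ask(agent)` returns a set of findings."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map preserves input order, so results line up with agents
        results = dict(zip(agents, pool.map(ask, agents)))
    consensus = set.intersection(*results.values())
    unique = {agent: found - consensus for agent, found in results.items()}
    return consensus, unique

# Canned findings standing in for real tool output
canned = {
    "gemini": {"fixture scope too narrow", "conftest.py in wrong directory"},
    "codex": {"fixture scope too narrow", "missing return in fixture"},
}
consensus, unique = consult_in_parallel(["gemini", "codex"], canned.__getitem__)
```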
Using pytest's Built-in --pdb Flag for Debugging
pytest --pdb  # drop into pdb on every failure
pytest -x --pdb  # stop at the first failure and drop into pdb
Custom Markers for Test Organization
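# conftest.py: register custom markers so pytest does not warn about unknown marks
def pytest_configure(config):
    config.addinivalue_line("markers", "slow: marks tests as slow")
    config.addinivalue_line("markers", "integration: marks integration tests")
    config.addinivalue_line("markers", "unit: marks unit tests")

# In a test module: apply a registered marker
import pytest

@pytest.mark.slow
def test_complex_calculation():
    pass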
Mocking External Services
from unittest.mock import patch

def test_api_call():
    # fetch_data is the function under test; it is expected to call requests.get
    with patch('requests.get') as mock_get:
        mock_get.return_value.json.return_value = {"status": "ok"}
        result = fetch_data()
        assert result["status"] == "ok"
        mock_get.assert_called_once()
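A self-contained variant of the same pattern, using only the standard library, is sketched below. `fetch_status` and the URL are hypothetical stand-ins for real code under test; the point is that a function using `urlopen` as a context manager needs `__enter__` configured on the mock.

```python
import json
import urllib.request
from unittest.mock import patch

def fetch_status(url):
    """Fetch a JSON document and return its "status" field."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)["status"]

def test_fetch_status():
    with patch("urllib.request.urlopen") as mock_urlopen:
        # urlopen() is used as a context manager, so configure __enter__
        response = mock_urlopen.return_value.__enter__.return_value
        response.read.return_value = b'{"status": "ok"}'
        assert fetch_status("https://example.invalid/health") == "ok"
        mock_urlopen.assert_called_once_with("https://example.invalid/health")
```

No network request is made: the patched `urlopen` returns the canned bytes, and `json.load` reads them through the mocked `read()`.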
Troubleshooting
"Fixture not found"
- Check fixture is defined in conftest.py or same file
- Verify fixture name matches exactly
- Check fixture scope is appropriate
- Ensure conftest.py is in correct directory
"Import error"
- Check PYTHONPATH includes src directory
- Verify __init__.py files exist
- Check for circular imports
- Verify package installed in development mode
"Tests pass locally but fail in CI"
- Check for hardcoded paths
- Verify all dependencies in requirements
- Check for timezone issues
- Look for race conditions
- Check file system differences
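Two of these causes, hardcoded paths and timezone assumptions, can often be avoided in the code under test. The sketch below uses hypothetical helpers (`write_report`, `utc_stamp`) and a temporary directory from the standard library; inside pytest, the built-in `tmp_path` fixture serves the same purpose.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def write_report(directory, text):
    """Write a report into `directory` (passed in, not hardcoded) and return its path."""
    path = Path(directory) / "report.txt"
    path.write_text(text, encoding="utf-8")
    return path

def utc_stamp():
    # Timezone-aware, so the value does not depend on the machine's local zone
    return datetime.now(timezone.utc)

with tempfile.TemporaryDirectory() as tmp:
    report = write_report(tmp, "ok")
    contents = report.read_text(encoding="utf-8")
```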
"Test is too slow"
- Use fixtures with appropriate scope
- Mock external services
- Use in-memory databases
- Parallelize: sdd test run --parallel
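For the in-memory database suggestion, SQLite's `:memory:` target is one option when the code under test can run against SQLite. The schema and helper names below are made up for illustration.

```python
import sqlite3

def make_test_db():
    """Create a throwaway in-memory database with a small hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
    conn.commit()
    return conn

def count_users(conn):
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Each call to `make_test_db()` returns an independent database, so tests do not share state and nothing touches disk.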
Best Practices
Running Tests
- Start with verbose mode (-v) for better visibility
- Use -x to stop on first failure when debugging
- Run specific tests to iterate faster
- Use markers to organize test runs
Debugging Strategy
- Read error messages carefully
- Check last line of stack trace first
- Use -l flag to see local variables
- Add temporary print statements for quick debugging
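Reading the last line of a stack trace first can be shown with the standard `traceback` module: the final frame is where the exception was raised, and the final formatted line carries the error type and message. `innermost_frame` and `load_user` are hypothetical names used only for this sketch.

```python
import traceback

def innermost_frame(exc):
    """Return (function name, line number) of the frame where exc was raised."""
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    return frame.name, frame.lineno

def load_user(data):
    return data["user"]["name"]

try:
    load_user({"user": {}})
except KeyError as exc:
    where, lineno = innermost_frame(exc)
    # The last formatted line is the "ErrorType: message" summary
    summary = traceback.format_exception_only(type(exc), exc)[-1].strip()
```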
Consultation Workflow
For test failures:
- Do initial investigation first
- Check tool availability: sdd test check-tools
- Consult available tools (mandatory if tests failed)
- Share your hypothesis - don't ask blind questions
- Synthesize insights from tools + your analysis
- YOU implement using Edit/Write tools
- Test thoroughly
Skip consultation when:
- Tests all passed
- Verification/smoke tests succeeded
- Post-fix confirmation (tests already passed once)
- No tools available
Success Criteria
A test debugging session is successful when:
- ✓ All tests pass
- ✓ No new tests are broken
- ✓ Root cause is understood
- ✓ Fix is documented
- ✓ Code is cleaner/clearer than before (when appropriate)