Ejecuta cualquier Skill en Manus
con un clic

Ejecuta cualquier Skill en Manus con un clic

backtest

This skill should be used when the user asks to "backtest review skills", "test detection rate", "バックテスト", "レビュースキルをテスト", "検出率を測定", "過去のバグで検証", or wants to verify that generated review skills can detect known bugs by replaying historical states.

Ejecutar en Manus

Resumen

Comando de instalación

npx skills add https://github.com/suzuki0keiichi/claude-plugins-suzuki0keiichi --skill backtest

Copia y pega este comando en Claude Code para instalar la habilidad

Fuente

suzuki0keiichi/claude-plugins-suzuki0keiichi

Estrellas0

Forks0

Actualizado22 de marzo de 2026, 16:18

SKILL.md

readonly

name	backtest
description	This skill should be used when the user asks to "backtest review skills", "test detection rate", "バックテスト", "レビュースキルをテスト", "検出率を測定", "過去のバグで検証", or wants to verify that generated review skills can detect known bugs by replaying historical states.
argument-hint	["--add-case to add a test case instead of running"]

Backtest: Review Skill Detection Testing

Test generated review skills against historical bugs by replaying the codebase state at the time each bug was introduced. Measures both recall (did we catch known bugs?) and precision (were our findings validated by subsequent fixes?).

Prerequisites:

Generated skills exist in .claude/skills/
workspace/ contains the project clone

Test Case Management

Adding Test Cases

When invoked with --add-case or when the user wants to add a test case:

Prompt for:

Source: Where was this bug found? (PR #, JIRA ticket, Sentry event, postmortem, Slack thread)
Commit: The commit that introduced the bug (or the fix commit to reverse-engineer from)
Description: What was the bug?
Expected detection: Which perspective(s) should catch this?

Append to backtest/test-cases.md:

### Case [N]: [brief title]
- **Source**: [PR/JIRA/Sentry/postmortem reference]
- **Bug commit**: [hash]
- **Fix commit**: [hash] (if available)
- **Description**: [what the bug was]
- **Expected perspective**: [which perspective should detect it]
- **Added**: [date]

Bulk Import

When importing from interview data:

Scan bug-patterns.md for commits with both bug-introducing and fix commits identified
Auto-generate test cases for each

Running Backtest

Execution Flow

For each test case in backtest/test-cases.md:

Checkout: cd workspace && git checkout {bug_commit} (the state WITH the bug)
Generate diff: git diff {bug_commit}~1 {bug_commit} (the buggy change)
Execute review: Run the review entry point (review-* skill in .claude/skills/) against this diff, with the --backtest context flag. The consolidation step will save the review to reviews/ as usual — backtest does NOT change the review output location.
Evaluate detection (recall): Did any finding match the known bug?
- Match criteria: same file, related description, severity >= Important
- Partial match: right area but wrong specific issue
- Miss: no finding related to the known bug
Evaluate precision (forward validation): For findings that DON'T match the known bug:
- Check if the cited code was modified in subsequent commits: git log {bug_commit}..{default_branch} -- {file_path}
- If modified: read the fix commit diff to determine if the finding's concern was addressed
- Validated: finding pointed to real code that was later changed to address the same concern
- Unvalidated: finding pointed to code that was never subsequently changed (may be false positive, or unfixed issue)
Restore: cd workspace && git checkout {default_branch}

IMPORTANT: In step 1, we checkout {bug_commit} (NOT {bug_commit}~1). The workspace must contain the buggy code so that the orchestrator's Phase 1.5 fact-check (workspace verification) can confirm the bug exists. If the workspace were at {bug_commit}~1, the buggy code wouldn't exist in workspace and all findings would be falsely dropped.

Results

The review itself is saved to reviews/ by the orchestrator (same as any normal review). The backtest evaluation (recall/precision analysis) is written separately to backtest/results/YYYY-MM-DD-{target}.md:

# Backtest Results: [date]

## Summary
- Test cases: N
- Detected (recall): N/N (X%)
- Partial: N (X%)
- Missed: N (X%)
- Additional findings: N
  - Validated by subsequent fixes: N (X%)
  - Unvalidated: N

## Per-Case Results

### Case [N]: [title]
- **Known bug result**: detected / partial / missed
- **Detecting perspective**: [which perspective found it, if any]
- **Finding**: [the relevant finding, if any]
- **Notes**: [why it was missed, if applicable]
- **Additional findings**: N
  - Validated: [list findings that were later fixed, with fix commit hash]
  - Unvalidated: [list findings with no subsequent fix]

## Analysis

### Recall (known bug detection)
[Perspectives or bug types with low detection rate]

### Precision (forward validation)
[Rate of findings validated by subsequent fixes]
[High validation rate = review is finding real issues]
[Low validation rate = review may be producing noise]

### Recommendations
[Specific suggestions for skill improvement based on misses and validation rates]

Learning Extraction (backtest後に自動実行)

backtestの結果からMISS/Partialを分析し、backtest/learnings.md に構造化して追記する。このファイルは build-skills と update-skills が読み込み、スキル生成に反映する。

抽出プロセス

各 MISS または Partial match について：

根本原因分析: なぜ検出できなかったか？
- どのパースペクティブが担当すべきだったか
- 既存のチェック項目の何が不足していたか
- どういうチェックがあれば検出できたか
パターン抽出: 再利用可能な検出ルールに変換
- 具体的なバグ→汎用的なチェックパターンに抽象化
- 例: 「locked issueガード欠如」→「同一データセットを処理する並列関数間で防御的チェックが非対称」
追記: backtest/learnings.md に以下の形式で追記

### Learning [N]: [パターン名]
- **Source**: backtest [date], Case [N] (MISS/Partial)
- **Bug**: [何が起きたか]
- **Root cause**: [なぜ検出できなかったか]
- **Check to add**: [具体的に何をチェックすべきか]
- **Target perspective**: [どのパースペクティブに追加すべきか]
- **Pattern type**: code-symmetry / state-transition / boundary-check / ...
- **Added**: [date]

既存の learning と重複する場合は追記しない。

Interpretation

Recall (known bug detection)

Detection rate > 70%: good for deployment
Detection rate 40-70%: usable but needs skill refinement
Detection rate < 40%: skills need significant rework, feed results to update-skills

Precision (forward validation)

Validation rate > 50%: excellent — review is finding real issues beyond the known bug
Validation rate 20-50%: good — some noise but meaningful signal
Validation rate < 20%: review may be producing too much noise

A review system with high recall AND high precision is genuinely useful — it catches known bugs and also surfaces issues that developers independently recognized and fixed.

Compare with previous backtest results to track improvement over time.

Más de este repositorio

mismo repositorio

build-skills

suzuki0keiichi/claude-plugins-suzuki0keiichi

This skill should be used when the user asks to "build review skills", "generate review skills", "create review perspectives", "build review setup", "レビュースキルを生成", "レビュースキルをビルド", "レビューを作って", or after interview skill has populated the knowledge-base. This is the core of tailored-reviewer: it builds project-specific SKILL.md files that form a complete review system.

2026-03-240

interview

suzuki0keiichi/claude-plugins-suzuki0keiichi

This skill should be used when the user asks to "set up tailored review", "interview for review", "collect project information", "initialize review project", "プロジェクト情報を収集", "レビュー用の情報を集めて", "インタビューして", or needs to collect project-specific knowledge for review skill generation. Also triggered when existing knowledge-base files have stale last_verified dates, or when tailored-reviewer version has changed since last interview.

2026-03-240

update-skills

suzuki0keiichi/claude-plugins-suzuki0keiichi

This skill should be used when the user asks to "update skills", "update review skills", "refresh skills", "apply feedback", "スキルを更新", "レビュースキルを更新", "フィードバックを反映", or when triggered by knowledge-base changes, new feedback entries, or plugin version updates.

2026-03-220

debug-review

suzuki0keiichi/claude-plugins-suzuki0keiichi

This skill should be used when the user asks to "debug review skills", "validate generated skills", "check skill quality", "スキルをデバッグ", "生成スキルを検証", or automatically after build-skills completes. Validates that generated review skills are structurally correct, comprehensive, project-specific, and behaviorally sound.

2026-03-220

submit-feedback

suzuki0keiichi/claude-plugins-suzuki0keiichi

This skill should be used when the user asks to "submit feedback", "report improvement", "create plugin issue", "フィードバックを送信", "プラグインの改善提案", or when veteran edits contain patterns that would benefit all tailored-reviewer users, not just this project.

2026-03-220

health-score

suzuki0keiichi/claude-plugins-suzuki0keiichi

This skill should be used when the user asks to "check project health", "run health score", "health check", "プロジェクトの健康状態", "ヘルススコア", "健全性チェック", or on a schedule via cowork. Operates independently from review skills.

2026-03-200

Fuente

suzuki0keiichi

suzuki0keiichi/claude-plugins-suzuki0keiichi

Abrir repositorio de GitHub Ver repositorios del creador

Comando de instalación

Descarga

Ejecutar en Manus

Útil paraSOC

Analistas de garantía de calidad de software y probadoresOcupaciones informáticas y matemáticas15-1253L4

name	backtest
description	This skill should be used when the user asks to "backtest review skills", "test detection rate", "バックテスト", "レビュースキルをテスト", "検出率を測定", "過去のバグで検証", or wants to verify that generated review skills can detect known bugs by replaying historical states.
argument-hint	["--add-case to add a test case instead of running"]

Backtest: Review Skill Detection Testing

Prerequisites:

Generated skills exist in .claude/skills/
workspace/ contains the project clone

Test Case Management

Adding Test Cases

When invoked with --add-case or when the user wants to add a test case:

Prompt for:

Source: Where was this bug found? (PR #, JIRA ticket, Sentry event, postmortem, Slack thread)
Commit: The commit that introduced the bug (or the fix commit to reverse-engineer from)
Description: What was the bug?
Expected detection: Which perspective(s) should catch this?

Append to backtest/test-cases.md:

### Case [N]: [brief title]
- **Source**: [PR/JIRA/Sentry/postmortem reference]
- **Bug commit**: [hash]
- **Fix commit**: [hash] (if available)
- **Description**: [what the bug was]
- **Expected perspective**: [which perspective should detect it]
- **Added**: [date]

Bulk Import

When importing from interview data:

Scan bug-patterns.md for commits with both bug-introducing and fix commits identified
Auto-generate test cases for each

Running Backtest

Execution Flow

For each test case in backtest/test-cases.md:

Checkout: cd workspace && git checkout {bug_commit} (the state WITH the bug)
Generate diff: git diff {bug_commit}~1 {bug_commit} (the buggy change)
Execute review: Run the review entry point (review-* skill in .claude/skills/) against this diff, with the --backtest context flag. The consolidation step will save the review to reviews/ as usual — backtest does NOT change the review output location.
Evaluate detection (recall): Did any finding match the known bug?
- Match criteria: same file, related description, severity >= Important
- Partial match: right area but wrong specific issue
- Miss: no finding related to the known bug
Evaluate precision (forward validation): For findings that DON'T match the known bug:
- Check if the cited code was modified in subsequent commits: git log {bug_commit}..{default_branch} -- {file_path}
- If modified: read the fix commit diff to determine if the finding's concern was addressed
- Validated: finding pointed to real code that was later changed to address the same concern
- Unvalidated: finding pointed to code that was never subsequently changed (may be false positive, or unfixed issue)
Restore: cd workspace && git checkout {default_branch}

Results

# Backtest Results: [date]

## Summary
- Test cases: N
- Detected (recall): N/N (X%)
- Partial: N (X%)
- Missed: N (X%)
- Additional findings: N
  - Validated by subsequent fixes: N (X%)
  - Unvalidated: N

## Per-Case Results

### Case [N]: [title]
- **Known bug result**: detected / partial / missed
- **Detecting perspective**: [which perspective found it, if any]
- **Finding**: [the relevant finding, if any]
- **Notes**: [why it was missed, if applicable]
- **Additional findings**: N
  - Validated: [list findings that were later fixed, with fix commit hash]
  - Unvalidated: [list findings with no subsequent fix]

## Analysis

### Recall (known bug detection)
[Perspectives or bug types with low detection rate]

### Precision (forward validation)
[Rate of findings validated by subsequent fixes]
[High validation rate = review is finding real issues]
[Low validation rate = review may be producing noise]

### Recommendations
[Specific suggestions for skill improvement based on misses and validation rates]

Learning Extraction (backtest後に自動実行)

抽出プロセス

各 MISS または Partial match について：

根本原因分析: なぜ検出できなかったか？
- どのパースペクティブが担当すべきだったか
- 既存のチェック項目の何が不足していたか
- どういうチェックがあれば検出できたか
パターン抽出: 再利用可能な検出ルールに変換
- 具体的なバグ→汎用的なチェックパターンに抽象化
- 例: 「locked issueガード欠如」→「同一データセットを処理する並列関数間で防御的チェックが非対称」
追記: backtest/learnings.md に以下の形式で追記

### Learning [N]: [パターン名]
- **Source**: backtest [date], Case [N] (MISS/Partial)
- **Bug**: [何が起きたか]
- **Root cause**: [なぜ検出できなかったか]
- **Check to add**: [具体的に何をチェックすべきか]
- **Target perspective**: [どのパースペクティブに追加すべきか]
- **Pattern type**: code-symmetry / state-transition / boundary-check / ...
- **Added**: [date]

既存の learning と重複する場合は追記しない。

Interpretation

Recall (known bug detection)

Detection rate > 70%: good for deployment
Detection rate 40-70%: usable but needs skill refinement
Detection rate < 40%: skills need significant rework, feed results to update-skills

Precision (forward validation)

Validation rate > 50%: excellent — review is finding real issues beyond the known bug
Validation rate 20-50%: good — some noise but meaningful signal
Validation rate < 20%: review may be producing too much noise

A review system with high recall AND high precision is genuinely useful — it catches known bugs and also surfaces issues that developers independently recognized and fixed.

Compare with previous backtest results to track improvement over time.