
audit-temporal-walk

Name: Audit Temporal Walk
Author: ben-manes

Heavyweight history-mining bug audit. Walks the caffeine module's git history chronologically (oldest to HEAD), maintains a forward-tracked issue database, and surfaces concerns introduced by past commits that were never resolved. Catches bugs that snapshot mining cannot: half-fixes invisible from current state, latent+trigger pairs across multi-commit interactions, and partial refactors. Slow (~8-14 hours) and rare-run (every several months or before a major release).


stars: 17,676

forks: 1,688

updated: May 9, 2026, 22:27


SKILL.md


name	audit-temporal-walk
description	Heavyweight history-mining bug audit. Walks the caffeine module's git history chronologically (oldest to HEAD), maintains a forward-tracked issue database, and surfaces concerns introduced by past commits that were never resolved. Catches bugs that snapshot mining cannot — half-fixes invisible from current state, latent+trigger pairs across multi-commit interactions, and partial refactors. Slow (~8-14 hours) and rare-run (every several months or before a major release).
disable-model-invocation	true

Audit: Temporal Walk

This is a long-running CLI tool, not an interactive workflow. It walks every commit affecting the caffeine module from project inception to HEAD and, for each commit, asks Claude to flag new issues and to resolve or modify existing ones in a forward-tracked issue database. Issues that survive to HEAD are verified against current code and emitted as detail.dev-format findings.

When to run

Before a major release, as a final-pass audit
After a long sequence of refactors, to catch half-fixes
Once every several months, as a baseline audit
Not for routine pre-commit review (use /review-change for that)

How to run

The walker uses the claude CLI's default model (the session's current model) unless --model is passed. For a heavyweight rare-run audit, prefer running it in a session on the strongest model available.

# Walk (long-running; safe to interrupt — resumable):
python3 .claude/skills/audit-temporal-walk/walker.py

# In tmux/nohup for multi-hour reliability:
nohup python3 .claude/skills/audit-temporal-walk/walker.py \
  > .claude/reports/audit-temporal-walk-<module>/walk.log 2>&1 &

# Process N commits then stop cleanly (useful for chunked runs):
python3 .claude/skills/audit-temporal-walk/walker.py --max-commits 200

# Disable inner-model tool access (faster, less accurate — see Design notes):
python3 .claude/skills/audit-temporal-walk/walker.py --no-tools

# Inspect state without running:
python3 .claude/skills/audit-temporal-walk/walker.py --summary

# After the walk completes, verify surviving issues against HEAD:
python3 .claude/skills/audit-temporal-walk/verify.py --min-confidence med

# Read the verified findings:
cat .claude/reports/audit-temporal-walk-<module>/findings.md

Wall-clock time is roughly 8-14 hours for the full caffeine module (~750 commits) and is dominated by model latency. Tool-enabled mode (the default) adds modest overhead from per-commit Read/Grep round-trips. The walk resumes from its checkpoint after quota exhaustion or interruption.
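The resumable behaviour can be pictured with a small checkpoint loop. This is only a sketch under stated assumptions, not the actual walker.py: the `next_index` field, the state layout, and the `analyze` callback are hypothetical names, and only the ideas of persisting state after every commit and of a `--max-commits` style cap come from this document.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the saved checkpoint, or a fresh state if none exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_index": 0, "issues": []}  # hypothetical layout


def save_state(path: Path, state: dict) -> None:
    """Write via a temp file and rename, so an interruption cannot leave a torn file."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp, path)


def walk(commits: list[str], path: Path, analyze, max_commits=None) -> int:
    """Process commits from the checkpoint onward; return how many were handled."""
    state = load_state(path)
    done = 0
    for index in range(state["next_index"], len(commits)):
        if max_commits is not None and done >= max_commits:
            break
        # If analyze raises (quota, Ctrl-C), the pointer is not advanced,
        # so the next run retries this same commit.
        analyze(commits[index], state)
        state["next_index"] = index + 1
        save_state(path, state)
        done += 1
    return done
```

Persisting after each commit rather than at the end is what makes a multi-hour run cheap to interrupt: at most one commit's worth of model work is repeated.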

What the walker does

For each substantive commit (commits that touch only docs, style, or dependency bumps are skipped), the walker:

Checks out the commit into a managed detached worktree under .claude/reports/audit-temporal-walk-<module>/worktree/
Invokes claude -p with cwd=worktree and --tools "Read,Glob,Grep", so the inner model can verify hypotheses against the codebase at that commit's state, not HEAD
Shows the commit's diff (scoped to the configured module) and the currently-open tracked issues whose files this commit touches
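The first two steps above might be sketched as follows. The `claude` flags (`-p`, `--tools "Read,Glob,Grep"`, `--disable-slash-commands`) are the ones named in this document; everything else is an assumption for illustration, including the function names, the choice of git subcommands, and passing the prompt on stdin. How `--no-tools` maps onto `claude` flags is not stated here, so the sketch covers only the tool-enabled path.

```python
import subprocess
from pathlib import Path


def worktree_commands(worktree: Path, sha: str) -> list[list[str]]:
    """Git commands that point a detached worktree at `sha` (assumed approach)."""
    if worktree.exists():
        # Reuse the managed worktree: move its detached HEAD to the commit.
        return [["git", "-C", str(worktree), "checkout", "--detach", "--force", sha]]
    # First use: create the worktree already detached at the commit.
    return [["git", "worktree", "add", "--detach", str(worktree), sha]]


def claude_command() -> list[str]:
    """Inner-model invocation restricted to read-only tools."""
    return ["claude", "-p", "--disable-slash-commands", "--tools", "Read,Glob,Grep"]


def analyze_commit(worktree: Path, sha: str, prompt: str) -> str:
    """Check out `sha`, then run the inner model with cwd set to that snapshot."""
    for argv in worktree_commands(worktree, sha):
        subprocess.run(argv, check=True)
    result = subprocess.run(
        claude_command(),
        cwd=worktree,          # tools see the code as of this commit, not HEAD
        input=prompt,          # assumption: prompt is supplied on stdin
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Setting `cwd` to the worktree is the important part: the inner model's Read/Glob/Grep calls then resolve against the historical snapshot rather than the main checkout.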

Claude returns deltas: which open issues this commit resolves, which it modifies (e.g., a contract change makes the issue more dangerous), and any new concerns the commit introduces. Each new concern requires a concrete bug witness — the input or scenario that exposes the failure, expressed strongly enough that a developer could write a failing unit test directly from it.
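One way to picture the delta handling is a small in-memory issue database. The schema below is hypothetical (the real state.json layout is not shown in this document): the field names, the `resolves` / `modifies` / `new` keys, and the id format are assumptions. What it does take from the text is that each commit can resolve, modify, or introduce issues, that a new concern without a concrete witness is not accepted, and that only open issues whose files the commit touches are shown to the model.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    issue_id: str
    files: list[str]
    witness: str        # concrete input or scenario that exposes the failure
    introduced_by: str  # sha of the commit that introduced the concern
    status: str = "open"
    lineage: list[str] = field(default_factory=list)


def open_issues_for(issues: dict[str, Issue], touched: set[str]) -> list[Issue]:
    """Open issues whose files overlap the files touched by the current commit."""
    return [i for i in issues.values()
            if i.status == "open" and touched.intersection(i.files)]


def apply_deltas(issues: dict[str, Issue], sha: str, deltas: dict) -> None:
    """Fold one commit's deltas into the forward-tracked database."""
    for issue_id in deltas.get("resolves", []):
        issue = issues.get(issue_id)
        if issue is not None and issue.status == "open":
            issue.status = "resolved"
            issue.lineage.append(f"{sha}: resolved")
    for change in deltas.get("modifies", []):
        issue = issues.get(change["id"])
        if issue is not None:
            issue.lineage.append(f"{sha}: {change['note']}")
    for number, concern in enumerate(deltas.get("new", []), start=1):
        if not concern.get("witness", "").strip():
            continue  # no concrete bug witness, so the concern is dropped
        issue_id = f"{sha}-{number}"
        issues[issue_id] = Issue(issue_id, list(concern["files"]),
                                 concern["witness"], sha,
                                 lineage=[f"{sha}: introduced"])
```

Recording a lineage entry on every delta is what lets a surviving issue carry its full commit history into the final report.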

The pattern catalog and design-priors in per-commit.txt are tuned to caffeine's bug history (operator-order in halving formulas, sibling divergence between sync/async paths, missing lifecycle guards, etc.) and caffeine's documented intentional patterns (lossy buffers, best-effort refresh, async-listener semantics).

What the verifier does

After the walk, verify.py reads each surviving open issue, grounds it against current HEAD code (file-grep ranks files by symbol-match-count to find code that has moved/renamed since introduction), and asks Claude whether the bug witness still applies. Verdicts: still_exists, implicitly_resolved, false_positive. The verifier prompt includes .claude/docs/design-decisions.md and cross_model_audit_results.md as filter sources.
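The file-grep ranking could look roughly like this. It is a sketch, not verify.py: the function name is hypothetical, files are passed as an in-memory mapping to keep the example self-contained, and "symbol-match-count" is interpreted here as the number of distinct issue symbols found in a file, which is an assumption.

```python
import re


def rank_files(files: dict[str, str], symbols: list[str], top: int = 3) -> list[tuple[str, int]]:
    """Rank files by how many of the issue's symbols they mention.

    `files` maps a path to its HEAD contents. Files matching no symbol are
    dropped; ties are broken by path so the order is deterministic.
    """
    scores = []
    for path, text in files.items():
        matched = sum(
            1 for symbol in symbols
            if re.search(rf"\b{re.escape(symbol)}\b", text)
        )
        if matched:
            scores.append((path, matched))
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top]
```

Ranking by symbols rather than by the original path is what allows an issue to follow code that has since been moved or renamed.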

Output is a detail.dev-format markdown report with full commit lineage already attached to each finding.

Output

The reports directory is auto-derived from the first segment of WALKER_SCOPE (the module name): caffeine runs write to .claude/reports/audit-temporal-walk-caffeine/, jcache runs to audit-temporal-walk-jcache/, etc. All outputs are gitignored via .claude/reports/:

state.json — walker's issue database
verified.json — per-issue verdicts
findings.md — detail.dev-format report
worktree/ — managed detached worktree used for per-commit snapshots (deleting it is safe; the next walk re-creates it)
log/<sha>.raw.json — per-commit raw responses
verify-log/<id>.raw.json — per-issue verifier responses
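The directory derivation described above can be illustrated in a few lines. This assumes WALKER_SCOPE is a slash-separated path whose first segment is the module name (for example `caffeine/src/main/java`); the example scope values and the function name are hypothetical.

```python
from pathlib import PurePosixPath


def reports_dir(scope: str) -> PurePosixPath:
    """Derive the reports directory from the first segment of the scope."""
    segments = [part for part in scope.split("/") if part]
    if not segments:
        raise ValueError("scope must name a module")
    module = segments[0]
    return PurePosixPath(".claude/reports") / f"audit-temporal-walk-{module}"


print(reports_dir("caffeine/src/main/java"))
print(reports_dir("jcache"))
```

Keeping each module's state in its own directory means a jcache walk never overwrites the checkpoint of a caffeine walk.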

After running, the walker's findings should still be reviewed by hand — expect ~30-40% true-positive rate among surviving findings, with the rest being subtle design-intent matches that the priors don't quite cover.

What to do with a finding

For each still_exists finding in findings.md:

Read the lineage to understand why the bug exists
Cross-check against .claude/docs/design-decisions.md and ~/.claude/projects/-Users-ben-projects-caffeine/memory/cross_model_audit_results.md
Write a failing test that exposes the bug witness
If the test confirms, fix and commit. If the test passes (false positive), record the case in cross_model_audit_results.md so future audits don't re-raise it.

Design notes

Forward-tracked, not snapshot-mined. Catches half-fixes and latent+trigger pairs invisible from current state. See README.md for the design rationale and how this differs from /audit-* snapshot-style audits.
Resumable. State is persisted after every commit. Quota exhaustion or interruption leaves the next-commit pointer at the last successful commit; re-running picks up from there.
Tools scoped to the commit snapshot. The inner claude -p runs with cwd set to a detached worktree checked out at the commit being analyzed, and tools restricted to Read,Glob,Grep. This lets the model verify hypotheses against surrounding code (callers, sibling implementations, full method bodies outside the diff hunk) without seeing HEAD code from future commits — which would collapse the forward-tracking premise (every "issue" would look already fixed by some later commit). --no-tools falls back to diff-only analysis. --disable-slash-commands is always on.
Self-grounding. The verifier prompt requires that quoted code be copied verbatim from the shown HEAD code; verdicts that reference symbols not present in HEAD must return implicitly_resolved. This was load-bearing in early validation: the first verifier run hallucinated a finding citing a nonexistent file, fixed by hardening the grounding rules.
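The self-grounding rules are described above as prompt requirements. A mechanical post-check in the same spirit might look like the sketch below; it is hypothetical and not something this document says verify.py does. The `verdict`, `symbols`, `quoted_code`, and `grounded` keys are assumed names for illustration.

```python
def enforce_grounding(verdict: dict, head_code: str) -> dict:
    """Re-check a verifier response against the HEAD code it was shown.

    - Any referenced symbol missing from HEAD forces implicitly_resolved.
    - Any quoted snippet that is not a verbatim substring marks the
      response as ungrounded so a human can discard or re-run it.
    """
    checked = dict(verdict)
    missing = [s for s in verdict.get("symbols", []) if s not in head_code]
    if missing:
        checked["verdict"] = "implicitly_resolved"
        checked["grounding_note"] = "symbols absent from HEAD: " + ", ".join(missing)
    not_verbatim = [q for q in verdict.get("quoted_code", []) if q not in head_code]
    checked["grounded"] = not not_verbatim
    return checked
```

A plain substring test is deliberately strict: a quote that differs from HEAD by even one character is treated as not verbatim.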


package.json

"author": "ben-manes"

"repository": "ben-manes/caffeine"
