---
name: build-repo-context
description: Crawl repository PRs, issues, and review comments to distill institutional knowledge into a shared knowledge base. Run periodically by "context agents" to maintain agent_artefacts/repo_context/REPO_CONTEXT.md. Trigger only on specific request.
---
# Build Repo Context

Crawl GitHub history (PRs, issues, review comments) and distill institutional knowledge into `agent_artefacts/repo_context/REPO_CONTEXT.md`. This document helps worker agents understand repo conventions, common mistakes, and known tech debt before making changes.
## Workflow

### 1. Setup

- Create `agent_artefacts/repo_context/` if it doesn't exist
- Read the existing `agent_artefacts/repo_context/REPO_CONTEXT.md` if present (it will be updated, not replaced)
### 2. Identify What's New

Use the header of REPO_CONTEXT.md to determine what to process. The header contains the last-updated date and the PR range (e.g., `PRs processed: #965-#1050`).

- First run (no REPO_CONTEXT.md): fetch the 50 most recently merged PRs plus all open issues
- Incremental runs: fetch PRs merged after the highest PR number in the header, and issues updated since the last-updated date

Use the `gh` CLI to list candidates:

```shell
gh pr list --state merged --limit 50 --json number,title,labels,additions,deletions,reviewDecision,mergedAt
gh pr list --state merged --search "merged:>YYYY-MM-DD" --limit 50 --json number,title,labels,additions,deletions,reviewDecision,mergedAt
gh issue list --state open --limit 100 --json number,title,labels,createdAt,updatedAt
```
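The watermark logic above can be sketched in Python. The header format used here is an assumption based only on the example in this document (`Last updated: YYYY-MM-DD` and `PRs processed: #965-#1050`), so adjust the patterns to the real header:

```python
import re

def parse_header(text):
    """Extract the last-updated date and the highest processed PR number
    from a REPO_CONTEXT.md header. Returns (date, high_pr); both are None
    on a first run (no header found)."""
    date_m = re.search(r"Last updated:\s*(\d{4}-\d{2}-\d{2})", text)
    range_m = re.search(r"PRs processed:\s*#(\d+)-#(\d+)", text)
    date = date_m.group(1) if date_m else None
    high = int(range_m.group(2)) if range_m else None
    return date, high
```

The returned date feeds the `merged:>YYYY-MM-DD` search, and the high PR number is the cutoff for skipping already-processed PRs.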
### 3. Triage

Make a fast pass over PR titles and metadata. Skip these categories (they rarely contain design insights):

- Dependency bumps (titles matching `bump`, `update dependencies`, `renovate`, `dependabot`)
- Changelog-only updates (titles matching `changelog`, `scriv`)
- Bot-generated PRs with no review comments
- PRs with fewer than 5 lines changed and no review comments

Prioritize PRs that have:

- Review comments (especially multiple rounds; that's where design discussion lives)
- Changes touching shared utilities (`src/inspect_evals/utils/`, `CONTRIBUTING.md`, `BEST_PRACTICES.md`, `AGENTS.md`)

Cap at 50 PRs per run to keep execution time reasonable.
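The triage rules above can be expressed as a small filter. The input shape is an assumption loosely modelled on `gh pr list --json` output; field names like `comment_count` and `is_bot` are hypothetical stand-ins for however the agent tallies comments and detects bot authors:

```python
import re

SKIP_TITLE = re.compile(
    r"\b(bump|update dependencies|renovate|dependabot|changelog|scriv)\b", re.I
)
SHARED_PATHS = (
    "src/inspect_evals/utils/", "CONTRIBUTING.md", "BEST_PRACTICES.md", "AGENTS.md"
)

def triage(pr):
    """Classify a PR dict as 'skip', 'high', or 'normal' per the triage rules."""
    comments = pr.get("comment_count", 0)
    # Skip: dependency bumps, changelog-only, silent bot PRs, tiny silent PRs
    if SKIP_TITLE.search(pr["title"]):
        return "skip"
    if pr.get("is_bot") and comments == 0:
        return "skip"
    if pr.get("additions", 0) + pr.get("deletions", 0) < 5 and comments == 0:
        return "skip"
    # Prioritize: review discussion, or changes to shared utilities/docs
    if comments > 0 or any(f.startswith(SHARED_PATHS) for f in pr.get("files", [])):
        return "high"
    return "normal"
```

Process "high" PRs first, then "normal" ones until the 50-PR cap is hit.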
### 4. Extract

For each selected PR, fetch:

```shell
gh pr view <N> --json body,title,labels,files,reviewDecision,comments,reviews
gh api repos/{owner}/{repo}/pulls/<N>/comments --paginate
gh api repos/{owner}/{repo}/issues/<N>/comments --paginate
```

For open issues, fetch the body and comments similarly.
Link traversal: If a comment references another PR/issue (e.g., "see #123" or "fixed in #456"), crawl the referenced item recursively, up to 3 hops in total. Never revisit a PR/issue already seen in the chain, so reference cycles cannot cause loops.
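A minimal sketch of the bounded traversal, assuming a `fetch_comments(number)` callable that returns the combined comment text for an item (in practice gathered via `gh api ... --paginate`):

```python
import re

REF = re.compile(r"#(\d+)")

def crawl(start, fetch_comments, max_hops=3):
    """Breadth-first traversal of PR/issue cross-references.
    Each frontier expansion is one hop; visited numbers are never
    revisited, which prevents loops on circular references."""
    visited = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for number in frontier:
            text = fetch_comments(number)
            for ref in (int(n) for n in REF.findall(text)):
                if ref not in visited:
                    visited.add(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return visited
```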
### 5. Distill

This is the core intellectual work. For each PR/issue, extract actionable insights in these categories:
- Design decisions: What architectural choice was made and why? What alternatives were rejected?
- Reviewer corrections: What mistakes did reviewers catch? These reveal common pitfalls.
- Established conventions: What patterns were deliberately chosen that future contributors should follow?
- Tech debt acknowledged: What shortcuts were taken intentionally? What should NOT be "fixed" without discussion?
- Common agent mistakes: If review comments mention agent-generated code issues, capture the pattern.
Quality requirements for each insight:
- Must cite source PR/issue number (e.g., "Per PR #973...")
- Must be actionable ("Do X" / "Don't do Y"), not descriptive ("PR #123 added X")
- Must add nuance beyond what CONTRIBUTING.md and BEST_PRACTICES.md already state
- Must be relevant to future contributors, not just historically interesting
- Must be broadly applicable beyond a single issue or evaluation. If the context is excessively narrow, leave it out.
- Must reflect team convention, not a single maintainer's code style or proposal. If in doubt, leave it out.
Skip:
- Bot comments (dependabot, renovate, CI status checks)
- Feature announcements without design implications
- Trivial PRs (typo fixes, version bumps) unless they reveal a convention
- Duplicate insights already captured in REPO_CONTEXT.md
### 6. Merge Into REPO_CONTEXT.md

Integrate new insights into the existing document structure. Do not simply append; place each insight in the appropriate section and deduplicate:

- If a new insight updates or supersedes an existing one, replace it
- If a section is getting too long, distill further (combine related insights)
- Update the header metadata (last-updated date, PR watermark)
- Keep the total document size between 500 and 1000 lines (distill aggressively if over)

Each insight appears in exactly one section; do not repeat the same rule across multiple sections with different framing (see step 7).
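The header-metadata update can be sketched as follows, again assuming the header format shown in step 2 (`Last updated: YYYY-MM-DD` / `PRs processed: #LOW-#HIGH`) rather than a confirmed specification:

```python
import re

def update_header(doc, today, new_high):
    """Rewrite the watermark line, keeping the low end of the PR range
    and raising the high end, and refresh the last-updated date."""
    doc = re.sub(
        r"PRs processed:\s*#(\d+)-#\d+",
        lambda m: f"PRs processed: #{m.group(1)}-#{new_high}",
        doc, count=1,
    )
    doc = re.sub(r"Last updated:\s*\S+", f"Last updated: {today}", doc, count=1)
    return doc
```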
### 7. Deduplicate & Consolidate

After merging, review the full document for cross-section duplication. This is critical: incremental runs naturally introduce duplication, because the same convention surfaces in multiple PR reviews (e.g., "use `@pytest.mark.docker`" might appear as a reviewer correction, an established convention, AND a testing recipe).
Process:

- For each insight, search the entire document for overlapping content. Look for insights that cover the same topic even if phrased differently.
- Keep each insight in exactly one location: the most specific section that fits. Prefer this priority:
  - "Rules & Conventions" for mandatory practices ("always do X", "never do Y")
  - "Testing Recipes" for detailed how-to patterns (mock setup, test structure)
  - "Known Tech Debt" for acknowledged issues that should not be fixed without discussion
  - "CI/Tooling" for build/CI/tooling specifics
  - "Open Issues" for bugs and design direction
- Remove the duplicate occurrences, keeping the most complete/specific version.
- Combine related insights that are split across bullets into a single, richer bullet.
Common duplication patterns to watch for:
- The same pytest marker rule appearing in both "Rules" and "Testing Recipes"
- Reviewer corrections that duplicate established conventions (merge into the convention)
- Agent mistakes that are just the inverse of an established convention (keep only the convention)
- API usage patterns appearing in both rules and recipes (keep the rule brief, detail in recipes)
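One way to surface candidate duplicates mechanically is a token-overlap pass. This is a heuristic sketch (the 0.6 Jaccard threshold is an arbitrary assumption), and flagged pairs still need judgment before anything is removed:

```python
def find_cross_section_dupes(sections, min_overlap=0.6):
    """Flag insight pairs in *different* sections whose word overlap
    (Jaccard similarity on lowercased tokens) meets `min_overlap`.
    `sections` maps section name -> list of insight strings."""
    items = [(sec, text, set(text.lower().split()))
             for sec, texts in sections.items() for text in texts]
    dupes = []
    for i, (sec_a, a, words_a) in enumerate(items):
        for sec_b, b, words_b in items[i + 1:]:
            if sec_a == sec_b:
                continue  # same-section overlap is handled by step 6
            jaccard = len(words_a & words_b) / len(words_a | words_b)
            if jaccard >= min_overlap:
                dupes.append((sec_a, a, sec_b, b))
    return dupes
```

Phrasing differences lower the score, so treat this as a first pass alongside the manual key-term search, not a replacement for it.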
## Bounding Rules

| Rule | Limit |
|---|---|
| First run scope | Most recent 50 merged PRs + all open issues |
| Incremental run scope | New items since last crawl |
| Max PRs per run | 50 |
| Link traversal depth | 3 hops |
| Target REPO_CONTEXT.md size | 500-1000 lines |
| Max issues per run | 100 |
## Insight Quality Guidelines

These are critical; the value of REPO_CONTEXT.md depends on insight quality:
- Every insight must cite its source PR or issue number. It is acceptable to cite multiple sources for the same insight.
- Insights must be actionable: "Do X" / "Don't do Y", not "PR #123 added X"
- Don't duplicate existing docs: Only add nuance that CONTRIBUTING.md and BEST_PRACTICES.md miss
- Skip noise: Bot comments, feature announcements without design implications, trivial PRs
- Focus on: Reviewer corrections, design trade-offs, rejected alternatives, acknowledged tech debt, common agent mistakes
- Be specific: "Use `hf_dataset()` wrapper instead of raw `load_dataset()` for HuggingFace datasets (PR #842)" is better than "Use the right dataset loading function"
- Date-stamp volatile insights: If an insight might become stale (e.g., "Currently X is broken"), include the date so agents can verify
## Expected Output

After running this workflow:

```
agent_artefacts/repo_context/
└── REPO_CONTEXT.md   # Distilled institutional knowledge (committed)
```
## Verification Checklist

After each run, verify:

- REPO_CONTEXT.md exists and has well-structured content
- Insights cite source PR/issue numbers
- Insights are actionable, not merely descriptive
- No duplicate insights across sections: search for key terms (e.g., `sample ID`, `get_model`, `@pytest.mark`) and confirm each appears in exactly one place
- Document stays under ~1000 lines
- Header metadata (date, PR range) is updated
- Incremental runs don't reprocess already-crawled PRs
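The mechanical parts of this checklist can be automated. A sketch, assuming the same header format as in step 2; the single-occurrence check is a rough proxy for "appears in exactly one section", since an insight may legitimately mention a term more than once:

```python
import re

def verify(doc, key_terms=("sample ID", "get_model", "@pytest.mark")):
    """Run the mechanical checklist items over REPO_CONTEXT.md text.
    Returns a list of problem descriptions (empty means checks pass)."""
    problems = []
    if len(doc.splitlines()) > 1000:
        problems.append("document exceeds 1000 lines")
    if not re.search(r"PRs processed:\s*#\d+-#\d+", doc):
        problems.append("header PR watermark missing")
    for term in key_terms:
        if doc.count(term) > 1:
            problems.append(f"'{term}' appears in more than one place")
    return problems
```

The structural checks (well-organized content, actionable phrasing, no reprocessing) remain judgment calls for the agent.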