repo-xray
AST-based Python codebase analysis. Use for exploring architecture, extracting interfaces, mapping dependencies, or generating onboarding documentation.
| name | repo-xray |
| description | AST-based Python codebase analysis. Use for exploring architecture, extracting interfaces, mapping dependencies, or generating onboarding documentation. |
X-Ray scanner for Python codebases. Extracts 37+ signals across structure, behavior, and history to solve the cold start problem for AI coding assistants.
Codebase: 2,000,000 tokens
Context Window: 200,000 tokens
Gap: 10x
AI cannot read the whole codebase. It needs an intelligent map.
X-Ray produces two outputs:
| Output | Size | Purpose |
|---|---|---|
| Markdown | ~8-15K | Curated summary — read this first |
| JSON | ~30-50K | Complete reference — query as needed |
Together, they compress a multi-million token codebase into actionable intelligence.
# Full analysis with both outputs
python xray.py /path/to/project --output both
# This creates:
# output/<repo-name>/xray.md — Curated summary for orientation
# output/<repo-name>/data/xray.json — Complete data for reference
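The default layout above can be located from a script. The following is a minimal sketch, assuming that `<repo-name>` is the final component of the scanned path; `expected_outputs` is a hypothetical helper, not part of xray.py.

```python
from pathlib import Path


def expected_outputs(repo_path: str, base_dir: str = "output") -> dict:
    """Return the default output paths described above for a scanned repo."""
    # Assumption: <repo-name> is the last component of the resolved path.
    repo_name = Path(repo_path).resolve().name
    root = Path(base_dir) / repo_name
    return {
        "markdown": root / "xray.md",          # curated summary
        "json": root / "data" / "xray.json",   # complete reference data
    }


paths = expected_outputs("/path/to/project")
print(paths["markdown"].as_posix())
print(paths["json"].as_posix())
```

An agent wrapper could check `paths["markdown"].exists()` after the scan before moving on to orientation.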
python xray.py . --preset minimal # ~2K tokens — quick survey
python xray.py . --preset standard # ~8K tokens — balanced
python xray.py . --preset full # ~15K tokens — comprehensive (default)
# Output format
--output markdown # Markdown only
--output json # JSON only
--output both # Both formats (recommended for agent)
# Output location
--out ./analysis # Write to analysis.md and/or analysis.json (overrides default output/<repo>/ layout)
# Disable sections
--no-logic-maps # Skip complex function analysis
--no-hazards # Skip large file warnings
--no-git # Skip git history analysis
# Other
--verbose # Show progress
--help # All options
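When X-Ray is driven from another script, the flags listed above can be assembled programmatically. This is a sketch only: `build_xray_command` is a hypothetical wrapper, it covers just the flags documented in this section, and it assumes `python` and `xray.py` are reachable from the working directory.

```python
from typing import List, Optional, Sequence

PRESETS = {"minimal", "standard", "full"}
OUTPUTS = {"markdown", "json", "both"}
# Sections with a --no-<name> switch in the option list above.
SKIPPABLE = {"logic-maps", "hazards", "git"}


def build_xray_command(
    target: str,
    preset: Optional[str] = None,
    output: Optional[str] = None,
    out: Optional[str] = None,
    skip: Sequence[str] = (),
    verbose: bool = False,
) -> List[str]:
    """Assemble an argv list for xray.py from the documented flags."""
    cmd = ["python", "xray.py", target]
    if preset is not None:
        if preset not in PRESETS:
            raise ValueError(f"unknown preset: {preset}")
        cmd += ["--preset", preset]
    if output is not None:
        if output not in OUTPUTS:
            raise ValueError(f"unknown output format: {output}")
        cmd += ["--output", output]
    if out is not None:
        cmd += ["--out", out]
    for name in skip:
        if name not in SKIPPABLE:
            raise ValueError(f"no documented --no-{name} flag")
        cmd.append(f"--no-{name}")
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_xray_command(".", preset="standard", output="both", skip=["git"])
print(" ".join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Building a list rather than a shell string avoids quoting problems when the target path contains spaces.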
The repo_xray agent uses this skill in four phases:
Phase 1: ORIENT
python xray.py . --output both
The agent reads the markdown summary for quick orientation and builds a mental model of the codebase's shape.
Phase 2: INVESTIGATE
Pass 1: Signal Verification
Pass 2: Gap Discovery
Phase 3: SYNTHESIZE
The agent produces a curated onboarding document from the template. This is not a dump; it is intelligent analysis with judgment.
Required sections:
Phase 4: VALIDATE
The agent tests its own output before delivering.
| Preset | Pillars | Hotspots | Side Effects | Target Output |
|---|---|---|---|---|
| quick | Top 3 (skeleton only) | None | None | ~5K tokens |
| standard | Top 5 (100 lines each) | Top 3 | Critical only | ~15K tokens |
| thorough | Top 10 (full read small, 200 lines large) | Top 5 | All | ~25K tokens |
Usage:
@repo_xray analyze # standard depth (default)
@repo_xray analyze --depth thorough # deep investigation
@repo_xray survey # quick depth
All agent insights must be marked with confidence:
| Level | Meaning | Example |
|---|---|---|
| [VERIFIED] | Read actual code and confirmed | "[VERIFIED] Retry uses exponential backoff (2s, 4s, 8s)" |
| [INFERRED] | Logical deduction from related code | "[INFERRED] Cache invalidates on config reload" |
| [X-RAY SIGNAL] | Directly from X-Ray, not independently verified | "[X-RAY SIGNAL] CC=67 for main()" |
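The marking rule can be checked mechanically. The sketch below counts the three markers from the table and flags lines that carry none. It assumes one insight per line, which is a simplification, and the last sample line is invented for illustration.

```python
import re
from collections import Counter

# The three confidence markers defined in the table above.
MARKER_RE = re.compile(r"\[(VERIFIED|INFERRED|X-RAY SIGNAL)\]")


def count_markers(text: str) -> Counter:
    """Count how often each confidence marker appears in the text."""
    return Counter(MARKER_RE.findall(text))


def unmarked_insights(lines):
    """Return non-empty lines that carry no confidence marker."""
    return [ln for ln in lines if ln.strip() and not MARKER_RE.search(ln)]


sample = [
    "[VERIFIED] Retry uses exponential backoff (2s, 4s, 8s)",
    "[INFERRED] Cache invalidates on config reload",
    "[X-RAY SIGNAL] CC=67 for main()",
    "Logging is configured at import time",  # hypothetical unmarked insight
]
counts = count_markers("\n".join(sample))
missing = unmarked_insights(sample)
print(dict(counts))
print(missing)
```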
Agent must report these metrics with every onboarding document:
| Metric | Target |
|---|---|
| Pillars investigated | ≥50% of top 10 |
| Hotspots with verdicts | ≥3 |
| [VERIFIED] insights | ≥10 |
| Gotchas documented | ≥3 |
| Error paths documented | ≥2 |
| Compression ratio | ≥50:1 |
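The targets in this table translate directly into a pass/fail check. The following is a sketch under two assumptions: the compression ratio is taken to mean codebase tokens divided by onboarding-document tokens, and the dictionary keys are hypothetical names chosen for this example. The sample numbers are illustrative, not measured.

```python
def check_quality_metrics(m: dict) -> dict:
    """Compare reported metrics against the targets in the table above."""
    # Assumption: compression ratio = codebase tokens / document tokens.
    ratio = m["codebase_tokens"] / m["document_tokens"]
    return {
        "pillars_investigated": m["pillars_investigated"] / 10 >= 0.5,
        "hotspots_with_verdicts": m["hotspots_with_verdicts"] >= 3,
        "verified_insights": m["verified_insights"] >= 10,
        "gotchas": m["gotchas"] >= 3,
        "error_paths": m["error_paths"] >= 2,
        "compression_ratio": ratio >= 50,
    }


report = {
    "pillars_investigated": 6,   # out of the top 10
    "hotspots_with_verdicts": 3,
    "verified_insights": 12,
    "gotchas": 2,                # below the target of 3
    "error_paths": 2,
    "codebase_tokens": 890_000,
    "document_tokens": 15_000,
}
results = check_quality_metrics(report)
failed = [name for name, ok in results.items() if not ok]
print(failed)
```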
Agent auto-detects domain and adjusts investigation:
| Domain | Indicators | Extra Investigation |
|---|---|---|
| Web API | FastAPI, Flask, Django, routes/ | Auth, rate limiting, validation |
| ML/AI | torch, tensorflow, models/, training | Training loop, inference pipeline |
| Scientific | hypothesis, experiment, research | Validation logic, rigor scoring |
| CLI Tool | argparse, click, typer, commands/ | Command structure, config loading |
| Data Pipeline | airflow, dagster, etl, pipeline | DAG structure, idempotency |
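How the agent actually detects the domain is not specified here. One plausible approach, sketched below, is to score each domain by how many of its indicators appear among a project's imports and directory names. The scoring rule and the tie-breaking by table order are assumptions of this example.

```python
from typing import Iterable, Optional

# Indicators copied from the table above, lowercased, without trailing slashes.
DOMAIN_INDICATORS = {
    "Web API": {"fastapi", "flask", "django", "routes"},
    "ML/AI": {"torch", "tensorflow", "models", "training"},
    "Scientific": {"hypothesis", "experiment", "research"},
    "CLI Tool": {"argparse", "click", "typer", "commands"},
    "Data Pipeline": {"airflow", "dagster", "etl", "pipeline"},
}


def detect_domain(names: Iterable[str]) -> Optional[str]:
    """Pick the domain with the most matching indicators, or None."""
    seen = {n.lower().strip("/") for n in names}
    scores = {d: len(ind & seen) for d, ind in DOMAIN_INDICATORS.items()}
    best = max(scores, key=scores.get)  # ties resolve to the earlier table row
    return best if scores[best] > 0 else None


domain = detect_domain(["fastapi", "routes/", "click"])
print(domain)
```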
# Codebase Analysis: project-name
## Summary
[File counts, lines, tokens, type coverage]
## Architecture
[Mermaid diagram showing layers and connections]
## Architectural Pillars
[Top 10 most important files, ranked]
## Complexity Hotspots
[Functions with highest cyclomatic complexity]
## Critical Classes
[Key class skeletons with signatures]
## Logic Maps
[Control flow analysis for complex functions]
## Side Effects
[I/O operations by category]
## Hazards
[Large files to avoid reading]
## Entry Points
[CLI commands, main functions]
## Environment Variables
[Required and optional env vars]
[...additional sections based on preset...]
{
"metadata": {
"generated_at": "...",
"preset": "standard",
"file_count": 247,
"total_tokens": 890000
},
"summary": { ... },
"structure": {
"files": { "path": { "lines": N, "tokens": N, "classes": [], "functions": [] } }
},
"imports": {
"graph": { ... },
"layers": { "orchestration": [], "core": [], "foundation": [] },
"circular": [],
"distance": { ... }
},
"complexity": {
"hotspots": [ { "function": "", "file": "", "cc": N } ]
},
"git": {
"risk": [],
"coupling": [],
"freshness": { "active": [], "aging": [], "stale": [], "dormant": [] }
},
"side_effects": {
"by_type": { "db": [], "api": [], "file": [], "subprocess": [] }
},
"hazards": [],
"entry_points": [],
"environment_variables": [],
...
}
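Since the JSON is meant to be queried rather than read in full, a few small helpers go a long way. This sketch follows the field names in the schema above and runs against an in-memory sample; the file names and all functions other than `main` (CC=67) are made up for illustration.

```python
import json


def hotspots_above(xray: dict, min_cc: int) -> list:
    """Return hotspots whose cyclomatic complexity exceeds min_cc, highest first."""
    hotspots = xray.get("complexity", {}).get("hotspots", [])
    selected = [h for h in hotspots if h.get("cc", 0) > min_cc]
    return sorted(selected, key=lambda h: h["cc"], reverse=True)


def side_effect_counts(xray: dict) -> dict:
    """Count recorded side effects per category (db, api, file, subprocess)."""
    by_type = xray.get("side_effects", {}).get("by_type", {})
    return {kind: len(items) for kind, items in by_type.items()}


# Hypothetical sample shaped like the schema above.
sample = json.loads("""
{
  "complexity": {"hotspots": [
    {"function": "main", "file": "cli.py", "cc": 67},
    {"function": "parse_args", "file": "cli.py", "cc": 12},
    {"function": "run_pipeline", "file": "core/pipeline.py", "cc": 24}
  ]},
  "side_effects": {"by_type": {"db": ["save"], "api": [], "file": ["a", "b"], "subprocess": []}}
}
""")
top = [h["function"] for h in hotspots_above(sample, 20)]
effects = side_effect_counts(sample)
print(top)
print(effects)
```

With a real scan, `sample` would instead come from `json.load` on the generated `xray.json`.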
| Preset | Markdown | JSON | Total |
|---|---|---|---|
| minimal | ~2K | ~10K | ~12K |
| standard | ~8K | ~30K | ~38K |
| full | ~15K | ~50K | ~65K |
| Phase | Tokens | Purpose |
|---|---|---|
| ORIENT | ~15K | Read X-Ray markdown |
| INVESTIGATE | ~15-25K | Selective file reads, JSON queries |
| SYNTHESIZE | ~15K | Output document |
| VALIDATE | ~1K | Self-test |
| Total | ~45-55K | Full analysis workflow |
Note: The agent reads the markdown (~8-15K) and queries the JSON selectively (~5K), so its total consumption of X-Ray output is typically 15-25K, not the full JSON size.
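As a quick arithmetic check of the phase table, the per-phase figures can be summed and compared with the 200,000-token context window quoted earlier. The numbers are taken from the table; the exact sum lands within about 1K of the rounded total shown there.

```python
# (low, high) token budgets per phase, in thousands, from the table above.
PHASE_BUDGET_K = {
    "ORIENT": (15, 15),
    "INVESTIGATE": (15, 25),
    "SYNTHESIZE": (15, 15),
    "VALIDATE": (1, 1),
}
CONTEXT_WINDOW_K = 200

low = sum(lo for lo, _ in PHASE_BUDGET_K.values())
high = sum(hi for _, hi in PHASE_BUDGET_K.values())
share = high / CONTEXT_WINDOW_K

print(f"workflow total: {low}-{high}K tokens")
print(f"worst-case share of context window: {share:.0%}")
```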
For codebases >500 files:
# 1. Quick survey first
python xray.py . --preset minimal
# 2. Focus on specific areas
python xray.py ./src/core --preset full
python xray.py ./src/api --preset full
# 3. Or use full scan with selective --no-X flags
python xray.py . --no-logic-maps --no-test-example
1. Full X-Ray scan (get all signals)
2. Investigate ONLY:
- Top 10 pillars (not all)
- Hotspots with CC > 20 (not CC > 10)
- Critical path side effects only
3. Document gaps explicitly
| Codebase Size | Files | Strategy | Investigation Depth |
|---|---|---|---|
| Small | <100 | Full | Everything |
| Medium | 100-500 | Full | Top signals |
| Large | 500-2000 | Prioritized | Top 10 each category |
| Very Large | >2000 | Divide & Conquer | Subsystem focus |
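The table can be read as a simple lookup on file count. In the sketch below the boundary handling is an assumption: exactly 500 files is treated as Medium and exactly 2000 as Large, because the table's ranges meet at those values and the "Very Large" row starts above 2000.

```python
def pick_strategy(file_count: int) -> tuple:
    """Map a file count to (size, strategy, investigation depth)."""
    if file_count < 100:
        return ("Small", "Full", "Everything")
    if file_count <= 500:   # assumption: 500 itself counts as Medium
        return ("Medium", "Full", "Top signals")
    if file_count <= 2000:  # assumption: 2000 itself counts as Large
        return ("Large", "Prioritized", "Top 10 each category")
    return ("Very Large", "Divide & Conquer", "Subsystem focus")


size, strategy, depth = pick_strategy(247)
print(size, strategy, depth)
```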
Place .xray.json in project root for automatic detection:
{
"sections": {
"logic_maps": { "enabled": true, "count": 5 },
"critical_classes": { "enabled": true, "count": 10 },
"hazards": true,
"git": true
}
}
python xray.py --init-config > .xray.json
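In the config shown above, a section is either a plain boolean or an object with `enabled` and `count`. A reader for that shape might look like the sketch below. It is not xray.py's own loader, and treating a missing section as enabled is an assumption.

```python
import json


def section_enabled(config: dict, name: str, default: bool = True) -> bool:
    """Return whether a section is switched on, for either config shape."""
    value = config.get("sections", {}).get(name, default)
    if isinstance(value, dict):
        return bool(value.get("enabled", default))
    return bool(value)


def section_count(config: dict, name: str):
    """Return the section's 'count' setting, or None if it has none."""
    value = config.get("sections", {}).get(name)
    return value.get("count") if isinstance(value, dict) else None


config = json.loads("""
{
  "sections": {
    "logic_maps": { "enabled": true, "count": 5 },
    "critical_classes": { "enabled": true, "count": 10 },
    "hazards": true,
    "git": false
  }
}
""")
print(section_enabled(config, "logic_maps"), section_count(config, "logic_maps"))
print(section_enabled(config, "git"), section_count(config, "git"))
```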
repo-xray/
├── xray.py # Main entry point
├── lib/
│ ├── ast_analysis.py # Skeleton, complexity, types
│ ├── import_analysis.py # Dependencies, layers, distance
│ ├── call_analysis.py # Cross-module calls
│ ├── git_analysis.py # Risk, coupling, freshness
│ ├── gap_features.py # Logic maps, hazards, models
│ └── ...
├── formatters/
│ ├── markdown_formatter.py
│ └── json_formatter.py
├── configs/
│ └── presets.json
└── .claude/
├── agents/
│ └── repo_xray.md # Agent definition
└── skills/
└── repo-xray/
├── SKILL.md # This file
├── COMMANDS.md # Quick reference
└── templates/
└── ONBOARD.md.template