| name | document-hub-analyze |
| description | Deep analysis of codebase vs documentation alignment. Detects drift, identifies undocumented code, extracts missing glossary terms, and provides actionable recommendations without making changes. Use when Codex should run the converted document-hub-analyze workflow. |
Document Hub Analyze
Converted Claude skill workflow for Codex/OpenAI use.
Source
Converted from skills/document-hub-analyze/SKILL.md.
Converted Instructions
The content below was adapted from the Claude source. Rewrite tool and runtime assumptions as needed when they refer to Claude-only features.
Document Hub: Analyze
Analyze documentation quality and detect drift from the actual codebase.
Helper Scripts Available:
scripts/detect_drift.py - Comprehensive drift detection
scripts/extract_glossary.py - Find undocumented domain terms
scripts/validate_hub.py - Structure validation
Use this skill to diagnose documentation issues without making changes. This is a read-only analysis that reports problems and recommendations.
What This Skill Does
Performs deep analysis to answer:
- What code exists but isn't documented?
- What documentation exists for removed code?
- What domain terms are missing from glossary?
- How healthy is the documentation overall?
Decision Tree: When to Use This Skill
User wants to check documentation → Run analysis:
1. Validate structure (errors block analysis)
2. Detect module drift (src/ vs keyPairResponsibility.md)
3. Detect technology drift (package.json vs techStack.md)
4. Extract missing glossary terms
5. Calculate overall health score
6. Generate recommendations
→ Present findings without making changes
→ Suggest /document-hub update if drift detected
Analysis Workflow
Phase 1: Validation Check
Start with structure validation:
python scripts/validate_hub.py /path/to/project
If validation fails:
- Report structural errors
- Recommend fixing before analyzing content
- Exit early if hub doesn't exist
Phase 2: Module Drift Analysis
Detect undocumented modules:
python scripts/detect_drift.py /path/to/project
This returns:
{
"drift_score": 0.23,
"module_drift": {
"undocumented": ["analytics", "webhooks"],
"documented_but_missing": ["legacy"]
}
}
Interpret results:
undocumented → Code exists but not in docs
documented_but_missing → Docs reference non-existent code
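The interpretation above can be applied mechanically. The following is a minimal sketch, assuming the script's output has exactly the shape shown in the sample; `summarize_module_drift` is a hypothetical helper and is not one of the provided scripts.

```python
import json


def summarize_module_drift(drift: dict) -> list[str]:
    """Turn the module_drift section into one readable finding per module."""
    module_drift = drift.get("module_drift", {})
    findings = []
    for name in module_drift.get("undocumented", []):
        findings.append(f"{name}: code exists but is not documented")
    for name in module_drift.get("documented_but_missing", []):
        findings.append(f"{name}: documented but no matching code found")
    return findings


# Sample payload copied from the output shown above.
sample = json.loads("""
{
  "drift_score": 0.23,
  "module_drift": {
    "undocumented": ["analytics", "webhooks"],
    "documented_but_missing": ["legacy"]
  }
}
""")

for finding in summarize_module_drift(sample):
    print(finding)
```

Using `.get()` with empty defaults keeps the helper from failing if a section is absent from the output.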
Phase 3: Technology Drift Analysis
From the same detect_drift.py output:
{
"technology_drift": {
"undocumented": ["Redis", "BullMQ"],
"documented_but_missing": ["MongoDB"]
}
}
Interpret results:
- Check package.json/requirements.txt for discrepancies
- Identify when tech was added (git log)
- Determine if tech is actually in use
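One way to check a reported technology against package.json is sketched below. It assumes the reported name matches the npm package name case-insensitively (for example "Redis" and `redis`), which will not hold for every library. The function names and the sample manifest are hypothetical.

```python
def declared_dependencies(manifest: dict) -> set[str]:
    """Collect lowercased package names from a parsed package.json."""
    names = set()
    for section in ("dependencies", "devDependencies"):
        names.update(name.lower() for name in manifest.get(section, {}))
    return names


def confirm_in_manifest(tech_names: list[str], manifest: dict) -> dict[str, bool]:
    """Map each reported technology to whether the manifest declares it."""
    declared = declared_dependencies(manifest)
    return {tech: tech.lower() in declared for tech in tech_names}


# Hypothetical manifest contents, for illustration only.
sample_manifest = {
    "dependencies": {"redis": "^4.0.0", "bullmq": "^5.0.0"},
    "devDependencies": {"jest": "^29.0.0"},
}

print(confirm_in_manifest(["Redis", "BullMQ", "MongoDB"], sample_manifest))
```

For a real project, load the manifest with `json.loads(Path("package.json").read_text())`. A name that is declared but never imported still needs a manual check to decide whether the technology is actually in use.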
Phase 4: Glossary Gap Analysis
Find missing domain terms:
python scripts/extract_glossary.py /path/to/project
Returns ranked terms from codebase. Compare with current glossary.md:
- Read existing glossary
- Extract current terms
- Identify terms in code but not in glossary
- Rank by importance (score from script)
- Recommend top 10-20 additions
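The comparison steps above can be sketched as a small filter. This assumes each ranked entry is a dict with `term` and `score` keys, as in the complete example later in this document. `missing_glossary_terms` and the case-insensitive matching are illustrative choices, not behavior of the script.

```python
def missing_glossary_terms(ranked_terms, existing_terms, limit=20):
    """Return the highest-scoring terms that are not yet in the glossary."""
    known = {term.casefold() for term in existing_terms}
    missing = [t for t in ranked_terms if t["term"].casefold() not in known]
    # Highest score first, so the most relevant gaps are recommended first.
    missing.sort(key=lambda t: t["score"], reverse=True)
    return missing[:limit]


# Illustrative input; terms and scores are taken from the sample report.
ranked = [
    {"term": "FulfillmentQueue", "score": 42},
    {"term": "BatchProcessor", "score": 45},
    {"term": "CIPIntegration", "score": 38},
]
existing = {"fulfillmentqueue"}

for entry in missing_glossary_terms(ranked, existing, limit=10):
    print(f"{entry['term']} (score: {entry['score']})")
```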
Phase 5: Health Scoring
Calculate overall documentation health:
Scoring Formula:
Health Score = 100 - (drift_score * 100)
Interpretation:
- 90-100: Excellent (drift < 0.10)
- 75-89: Good (drift 0.10-0.25)
- 60-74: Needs Attention (drift 0.25-0.40)
- < 60: Poor (drift > 0.40)
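The formula and bands above translate directly into code. This is a minimal sketch: the function names are hypothetical, and rounding to a whole number is an added assumption so that floating-point noise does not push a score across a band boundary.

```python
def health_score(drift_score: float) -> int:
    """Health Score = 100 - (drift_score * 100), rounded to a whole number."""
    return round(100 - drift_score * 100)


def health_label(score: float) -> str:
    """Map a health score to the interpretation bands listed above."""
    if score >= 90:
        return "Excellent"
    if score >= 75:
        return "Good"
    if score >= 60:
        return "Needs Attention"
    return "Poor"


score = health_score(0.23)
print(f"Overall Health: {score}/100 ({health_label(score)})")
```

For a drift score of 0.23 this prints `Overall Health: 77/100 (Good)`.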
Phase 6: Generate Recommendations
Produce actionable recommendations prioritized by impact:
HIGH PRIORITY:
- Undocumented modules with many files
- Missing critical technologies (databases, frameworks)
- Broken cross-references
MEDIUM PRIORITY:
- Documented-but-missing code
- Complex diagrams needing split
- Missing glossary terms for core concepts
LOW PRIORITY:
- Minor tech stack updates
- Formatting inconsistencies
- Additional glossary terms
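A rough sketch of how drift output might be turned into a prioritized list follows. The mapping is an assumption based on the tiers above: which technologies count as critical cannot be inferred from the drift data, so the caller supplies a `critical_tech` set. All names here are hypothetical.

```python
from dataclasses import dataclass

PRIORITY_ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


@dataclass
class Recommendation:
    priority: str
    action: str


def build_recommendations(drift: dict, critical_tech: set[str]) -> list[Recommendation]:
    """Derive recommendations from drift data, highest priority first."""
    modules = drift.get("module_drift", {})
    tech = drift.get("technology_drift", {})
    recs = []
    for name in modules.get("undocumented", []):
        recs.append(Recommendation("HIGH", f"Document {name} module"))
    for name in modules.get("documented_but_missing", []):
        recs.append(Recommendation("MEDIUM", f"Remove {name} module reference from docs"))
    for name in tech.get("undocumented", []):
        # Critical technologies (databases, frameworks) rank above minor updates.
        priority = "HIGH" if name in critical_tech else "LOW"
        recs.append(Recommendation(priority, f"Add {name} to techStack.md"))
    # sorted() is stable, so items keep their relative order within a tier.
    return sorted(recs, key=lambda r: PRIORITY_ORDER[r.priority])


drift_sample = {
    "module_drift": {"undocumented": ["analytics"], "documented_but_missing": ["legacy"]},
    "technology_drift": {"undocumented": ["Redis", "BullMQ"]},
}
for rec in build_recommendations(drift_sample, critical_tech={"Redis"}):
    print(f"{rec.priority}: {rec.action}")
```

Ranking undocumented modules by file count, as the first HIGH PRIORITY item suggests, would need a directory walk that this sketch leaves out.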
Analysis Report Format
Present findings in a structured report:
Documentation Hub Analysis Report
==================================
Overall Health: 77/100 (Good)
Drift Score: 0.23
STRUCTURE VALIDATION
--------------------
✓ All required files present
✓ Mermaid syntax valid
⚠ 1 warning: systemArchitecture.md diagram complex (22 nodes)
MODULE DRIFT ANALYSIS
---------------------
Drift Score: 0.30 (Medium)
Undocumented Modules (2):
• src/analytics - 8 files, appears to be user analytics tracking
• src/webhooks - 4 files, webhook event handling
Documented but Missing (1):
• src/legacy - Referenced in docs but directory doesn't exist
Last seen: commit abc123 (3 months ago)
TECHNOLOGY DRIFT ANALYSIS
--------------------------
Drift Score: 0.15 (Low)
Missing from techStack.md (2):
• Redis - Found in package.json, added 2 weeks ago
• BullMQ - Found in package.json, used for job queues
Documented but Not Found (1):
• MongoDB - Still in techStack.md but removed from dependencies
GLOSSARY GAPS
-------------
Found 18 potential domain-specific terms not in glossary:
High Relevance:
1. BatchProcessor (score: 45) - "Processes items in configurable batch sizes"
2. FulfillmentQueue (score: 42) - "Queue for order fulfillment jobs"
3. CIPIntegration (score: 38) - "Customer Information Portal integration"
Medium Relevance:
[Additional terms...]
RECOMMENDATIONS
---------------
HIGH PRIORITY (Do these first):
1. Document analytics module in keyPairResponsibility.md
2. Document webhooks module in keyPairResponsibility.md
3. Add Redis to techStack.md
4. Remove legacy module reference from docs
MEDIUM PRIORITY (Do when possible):
5. Add BullMQ to techStack.md
6. Add top 10 glossary terms
7. Consider splitting systemArchitecture.md diagram
LOW PRIORITY (Nice to have):
8. Add remaining glossary terms
9. Update MongoDB reference or re-add dependency
NEXT STEPS
----------
Run /document-hub update to apply these recommendations automatically.
Example: Complete Analysis
import json
import re
import subprocess
from pathlib import Path

project_path = Path("/path/to/project")

# Phase 1: structure validation (the script is expected to print JSON to stdout)
validate = subprocess.run(
    ["python", "scripts/validate_hub.py", str(project_path)],
    capture_output=True, text=True
)
validation = json.loads(validate.stdout)

# Phases 2-3: module and technology drift
drift = subprocess.run(
    ["python", "scripts/detect_drift.py", str(project_path)],
    capture_output=True, text=True
)
drift_data = json.loads(drift.stdout)

# Phase 4: ranked glossary candidates from the codebase
glossary = subprocess.run(
    ["python", "scripts/extract_glossary.py", str(project_path)],
    capture_output=True, text=True
)
glossary_data = json.loads(glossary.stdout)

# Collect terms already defined in the glossary.
# "documentation hub directory" is a placeholder for the hub's directory name.
glossary_file = project_path / "documentation hub directory" / "glossary.md"
existing_terms = set()
if glossary_file.exists():
    with open(glossary_file, encoding="utf-8") as f:
        for line in f:
            # Assumption: each term is bolded, e.g. "**BatchProcessor**: ...".
            # Adjust the pattern to match the glossary's actual format.
            match = re.match(r"\s*(?:[-*]\s+)?\*\*(.+?)\*\*", line)
            if match:
                existing_terms.add(match.group(1).strip())

missing_terms = [
    t for t in glossary_data["terms"]
    if t["term"] not in existing_terms
]

# Phase 5: health score
health_score = 100 - (drift_data["drift_score"] * 100)

# Report
print("Documentation Hub Analysis Report")
print("=" * 50)
print(f"Overall Health: {health_score:.0f}/100")
print(f"Drift Score: {drift_data['drift_score']:.2f}")
print()
print("UNDOCUMENTED MODULES:")
for module in drift_data["module_drift"]["undocumented"]:
    print(f"  • {module}")
print()
print("GLOSSARY GAPS:")
for term in missing_terms[:10]:
    print(f"  • {term['term']} (score: {term['score']})")
Use Cases
Pre-Update Health Check
Before running /document-hub update:
→ Run /document-hub analyze
→ Understand scope of needed changes
→ Prioritize what to update
Periodic Documentation Audit
Monthly or quarterly:
→ Run /document-hub analyze
→ Track health score trend
→ Address high-priority items
Onboarding Documentation Validation
When taking over a project:
→ Run /document-hub analyze
→ Understand documentation completeness
→ Identify knowledge gaps
Best Practices
- Run before updating - Understand scope before making changes
- Track drift score - Monitor documentation health over time
- Prioritize recommendations - Don't try to fix everything at once
- Use as diagnostic - This is read-only analysis, not updates
Common Pitfalls
❌ Don't make changes during analysis - this is read-only
❌ Don't ignore high-priority items - they indicate real gaps
❌ Don't run too frequently - analysis is for planning, not continuous monitoring
✅ Do use findings to guide updates
✅ Do prioritize high-impact recommendations
✅ Do track health score trends
✅ Do run before major updates
Interpretation Guide
Drift Scores
- < 0.10 - Excellent: Minor gaps only
- 0.10-0.25 - Good: Some undocumented items, easy to fix
- 0.25-0.40 - Needs Attention: Significant gaps exist
- > 0.40 - Poor: Major documentation debt
When to Act
Immediate action needed:
- Drift score > 0.35
- Multiple undocumented core modules
- Broken cross-references
Plan for next sprint:
- Drift score 0.20-0.35
- Some missing technologies
- Glossary gaps
Low priority:
- Drift score < 0.20
- Minor formatting issues
- Optional glossary terms
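The drift-score thresholds above can be expressed as a small helper. This is a sketch only: `action_urgency` is a hypothetical name, and it covers just the numeric criterion. The other signals (undocumented core modules, broken cross-references) still need a human judgment.

```python
def action_urgency(drift_score: float) -> str:
    """Classify urgency from the drift score alone, using the thresholds above."""
    if drift_score > 0.35:
        return "immediate"
    if drift_score >= 0.20:
        return "next sprint"
    return "low priority"


for value in (0.40, 0.23, 0.10):
    print(f"{value:.2f} -> {action_urgency(value)}")
```

A score of exactly 0.35 falls in the "next sprint" band, since immediate action is listed only for scores above 0.35.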
What Comes Next
After analysis:
- If high drift → Run /document-hub update immediately
- If medium drift → Schedule update for next sprint
- If low drift → Continue monitoring, update as needed
Helper Script Reference
detect_drift.py - Module and technology drift
python scripts/detect_drift.py /path/to/project
extract_glossary.py - Find missing terms
python scripts/extract_glossary.py /path/to/project
validate_hub.py - Structure validation
python scripts/validate_hub.py /path/to/project
See scripts/README.md for complete documentation.