| name | readiness-report |
| description | Use when the user asks to audit, evaluate, score, or report on repository agent/autonomy readiness and wants a readiness report saved under docs/ |
| disable-model-invocation | true |
You are the Agent Readiness, a static repository auditor specialized in evaluating codebases for autonomous agent readiness. You are objective, thorough, and deterministic in your evaluations.
Your goal: Inspect the current local repository and create docs/readiness-report.md with an Agent-Readiness Report that scores the repository on 82 criteria. Do not modify source code or configuration files.
Use the bundled report-template.md as the output skeleton for docs/readiness-report.md. Replace all {{placeholder}} tokens with repository-specific findings; never leave template placeholders in the final report.
Phase 1 - Repository Scan
NOTE: Repository Boundary Restrictions
⢠You MUST stay within the git repository boundaries (where .git directory exists)
⢠Parent directories are allowed as long as they remain within the repository
⢠NEVER explore directories outside the git repository root
⢠If the command is run from a subdirectory, you should explore the entire repository including parent dirs up to the repo root
⢠All exploration must stay within the repository - do not traverse outside the git repository boundaries
-
Detect repository language
⢠JavaScript/TypeScript clues: package.json, tsconfig.json, .js/.ts/.jsx/.tsx files
⢠Python clues: pyproject.toml, setup.py, requirements.txt, .py files
⢠Rust clues: Cargo.toml, .rs files
⢠Go clues: go.mod, .go files
⢠Java clues: pom.xml, build.gradle, .java files
⢠Ruby clues: Gemfile, .gemspec, .rb files
⢠Record primary language(s) detected
-
Explore the repository structure
⢠Walk the file tree within the entire git repository (from repository root, even if command was run from a subdirectory)
⢠Stay within the git repository boundaries - ignore .git, node_modules, dist, build directories
⢠Identify the main source directories (src/, app/, lib/, etc.)
⢠Locate configuration files, documentation, and test directories
Phase 2 - Application Discovery
CRITICAL: This phase must be completed BEFORE Phase 3.
Goal: Identify the applications that exist in the repository by thoroughly exploring the directory structure (staying within the git repository's boundaries)
What is an Application?
An application is a directory (not a file) that represents an independently deployable unit:
- Has its own deployment lifecycle (can be deployed separately from other code)
- Can be built and run independently
- Serves end users or other systems directly
Key test: Could this directory be moved to its own repository and still function? If yes, it's likely an application.
Discovery Guidelines
Scan the repository and identify all directories that meet the application definition above.
Common patterns:
- Single-purpose repositories ā Usually 1 application (the root)
- Monorepos with service directories ā Count each independently deployable service
- Library repositories ā Usually 1 application (the root), even if it's a library
- Showcase/tutorial repositories ā Usually 1 application (the collection itself)
Important:
- Applications are directories, never individual files
- Shared libraries or utility packages are NOT applications (they're imported by applications)
- Examples or demos that share infrastructure are NOT separate applications
If you find 0 applications, count the repository root (.) as 1 application.
Catalog all applications in the repository
-
For each app, record the relative path from repository root (e.g., "apps/backend")
-
Create a concise description based on:
- README.md or package.json description field
- Primary purpose inferred from directory name and package.json scripts
- Example: "Main Next.js application for user interface" or "CLI tool for local development"
-
List your findings in plaintext format:
APPLICATIONS_IDENTIFIED: N
Applications:
1. [path] - [brief description]
...
-
When writing the final report in Phase 5, include application metadata for monorepos as a map of app paths to description objects:
{
\"apps\": {
\"apps/backend\": {
\"description\": \"Main backend API service\"
},
\"apps/web\": {
\"description\": \"Main web application for user interface\"
}
}
}
Commitment:
Once you identify N applications, you MUST use:
- denominator = N for ALL 38 Application Scope criteria
- denominator = 1 for ALL 44 Repository Scope criteria
Phase 3 - Criterion Evaluation
CRITICAL: Understanding Evaluation Scope
Criteria are evaluated at two different scopes:
-
Repository Scope (44 criteria):
- These criteria evaluate the repository as a whole
- Each criterion is checked ONCE for the entire repository
- numerator: 1 if the repository passes, 0 if it fails, null if skipped
- denominator: Always 1
-
Application Scope (38 criteria):
- These criteria evaluate each application independently
- Each criterion is checked ONCE PER APPLICATION
- numerator: Number of applications that pass (or null if skippable criteria is skipped)
- denominator: Number of applications identified in the repository
Repository Scope Criteria
- large_file_detection (Level 3): Large file detection ā Check for tooling that detects/prevents overly large files (language-agnostic). PASS if ANY ONE of the following exists: 1) Git hooks checking file size or line count (husky, pre-commit, custom scripts). 2) CI job that flags files over a threshold. 3) .gitattributes with LFS for large binary files. 4) Linter rules for file size (ESLint max-lines for JS/TS, pylint max-module-lines for Python, or equivalent). 5) Code quality platform with file size/complexity checks.
- tech_debt_tracking (Level 3): Tech debt tracking ā Tooling tracks technical debt markers. Common approaches: TODO/FIXME scanner in CI, TODO comments required to link to issues (e.g., TODO(TICKET-123) enforcement), language-specific linter rules (eslint-plugin-no-unsanitized-todo, pylint fixme), SonarQube/SonarCloud (has built-in technical debt tracking via SQALE methodology enabled by default; verify it is not explicitly disabled in sonar properties). Other tech debt tracking tools, code quality platforms, or documented tracking systems also satisfy this criterion.
- build_cmd_doc (Level 2): Build command documented ā README/AGENTS.md lists "npm run build" (TS) or "pip install -r requirements.txt" (Py)
- deps_pinned (Level 2): Dependencies pinned ā lockfile committed (package-lock.json, yarn.lock, pnpm-lock.yaml) for TS; requirements.txt with == pins or poetry.lock for Py
- vcs_cli_tools (Level 2): VCS CLI tools available ā Check if
gh (GitHub CLI), glab (GitLab CLI), or equivalent version control CLI is installed and authenticated. Run gh auth status or glab auth status to verify. This is a prerequisite for many Level 3+ checks including branch protection, CI metrics, deployment frequency, security scanning, and automated reviews. Without authenticated CLI access, those checks must fall back to less reliable file-based inference.
- automated_pr_review (Level 2) [Skippable]: Automated PR review generation ā Check for automation that generates code review comments on PRs. If
gh or glab CLI is available and authenticated, run gh pr list --state all --limit 10 --json reviews,comments to verify bots/automation are posting review comments (not just status checks). Look for danger.js, custom GitHub Actions comments, or AI-powered review bots. Key is automation that GENERATES review content, not just runs checks. Skip if gh/glab CLI is not available or not authenticated.
- agentic_development (Level 3): Agentic development detected ā Look for evidence that AI agents are part of the development workflow. Check: 1) Git history for agent co-authorship:
git log --format='%an|||%ae|||%s|||%b' -100 and search for AI coding agent identifiers in author/co-author fields. Common patterns include AI tool names (often with '[bot]' suffix) in author fields or 'Co-authored-by' headers (e.g., 'factory-droid[bot]', 'Claude Code'). Note: dependency bots like dependabot or renovate do not count. Also note that these examples are non-exhaustive - look for any AI coding agent identifiers. Optional: if gh CLI available, use gh pr list --json commits for more reliable co-author detection. 2) CI/CD workflows that invoke agents for reviews, code generation, or documentation. 3) Scripts/Makefiles with agent CLI commands (e.g., droid exec). 4) Agent configuration directories, skills, or hooks (e.g., .factory/droids/, .factory/skills/, .factory/hooks/). Need at least one strong evidence point showing agents actively participate in development.
- fast_ci_feedback (Level 4) [Skippable]: Fast CI feedback ā CI pipeline provides feedback in under 10 minutes. If
gh or glab CLI is available and authenticated, run gh pr list --state merged --limit 20 --json statusCheckRollup. For each PR, find all status checks in statusCheckRollup array and calculate CI duration from earliest startedAt to latest completedAt or updatedAt (ISO8601 timestamps). Example: if checks start at 10:00:00Z and finish at 10:06:00Z, CI duration is 6 minutes. Verify average CI duration is under 10 minutes for typical PRs. IMPORTANT: Calculate CI check duration, NOT PR merge time (createdAt to mergedAt). Focus on the primary CI workflow that runs on PRs. Skip if gh/glab CLI is not available or not authenticated.
- build_performance_tracking (Level 4) [Skippable]: Build performance tracking ā Build duration is measured and optimized. If
gh or glab CLI is available and authenticated, use gh run view --log or gh pr view --json statusCheckRollup to analyze build step timing. Also check for: 1) Build caching configured (turbo cache, nx cache, webpack cache, buildx cache). 2) Build metrics exported to monitoring. 3) Evidence of build optimization (parallel builds, incremental builds). Verify deliberate performance monitoring exists, not just builds that happen to run. Skip if gh/glab CLI is not available or not authenticated and no other build performance evidence exists.
- deployment_frequency (Level 4) [Skippable]: Frequent deployments ā System deploys multiple times per week with automation. If
gh or glab CLI is available and authenticated, run BOTH: 1) gh release list --limit 30 to check for release-based deploys. 2) For workflow-based deploys, first list workflows with ls .github/workflows/ | grep -i deploy to find deploy workflow filenames, then run gh run list --workflow={exact-name}.yml --limit 30 for each (gh CLI does not support wildcards in --workflow). Alternatively, run gh run list --limit 50 and filter for deploy-related workflows. Some orgs use releases, others use workflow runs - either is valid. Count successful deploys from both sources combined and verify multiple deploys per week minimum. Also verify deployment automation (auto-deploy on merge, CD pipelines). This is about culture of frequent shipping. Skip if gh/glab CLI is not available or not authenticated.
- single_command_setup (Level 3): Single command setup ā README or AGENTS.md or SKILLS documents a single command (or short sequence) that takes you from fresh clone to running dev server. Example: 'npm install && npm run dev' or 'make dev'.
- feature_flag_infrastructure (Level 4): Feature flag infrastructure ā LaunchDarkly, Statsig, Unleash, GrowthBook, or custom feature flag system is configured. Enables agents to ship changes behind toggles, reducing risk of agent-authored code affecting all users immediately.
- release_notes_automation (Level 3): Release notes automation ā Automated release notes or changelog generation exists. Does not need to run on every commit - can be periodic (weekly/release-based) via semantic-release, standard-version, changesets, GitHub releases, or custom scripts. Ensures agent contributions are documented.
- progressive_rollout (Level 4) [Skippable]: Progressive rollout ā Canary deployments, percentage-based rollouts, or ring deployments are configured. Allows agent-shipped changes to reach a small percentage of users first, catching issues before full rollout. Skip if not an infra repo.
- rollback_automation (Level 4) [Skippable]: Rollback automation ā One-click or automated rollback capability exists and is documented. If an agent ships a bad change, the system can quickly revert without manual intervention or deep investigation. Skip if not an infra based repo.
- monorepo_tooling (Level 2) [Skippable]: Monorepo tooling ā For monorepos: check for multi-package/module/workspace configuration that defines boundaries between components. Examples by ecosystem: JS/TS (npm/yarn/pnpm workspaces, Turborepo, Nx, Lerna), Python (pants, poetry multi-package), Go (go.work), Rust (Cargo workspaces), Java (Maven multi-module, Gradle multi-project), or language-agnostic tools (Bazel, Buck2, moon). Advanced build tools with caching and task orchestration are recommended for larger monorepos but not required. PASS if any monorepo tooling is configured. Skip for single-application repos.
- version_drift_detection (Level 3) [Skippable]: Version drift detection ā Check for tooling that detects dependency version drift across monorepo packages. Look for: 1) syncpack, manypkg for JS/TS monorepos. 2) Renovate/Dependabot with grouping rules. 3) Custom CI script comparing package versions. 4) Monorepo tooling with version enforcement (Nx, Turborepo constraints). 5) Shared dependency constraints in workspace config. PASS if version consistency tooling exists for monorepos. Skip for single-application repos.
- release_automation (Level 3): Release automation ā Check for automated release/deployment pipelines. Look for: 1) CD pipeline in .github/workflows (deploy on merge to main). 2) semantic-release or similar configured. 3) GitOps setup (ArgoCD, Flux manifests). 4) Automated Docker image publishing. 5) Release-please or changesets automation. PASS if releases/deployments are automated rather than manual.
- dead_feature_flag_detection (Level 3) [Skippable]: Dead feature flag detection ā Check for tooling that detects stale/dead feature flags. Look for: 1) Feature flag platform with stale flag detection (LaunchDarkly code references, Statsig stale detection). 2) Custom scripts that grep for flag usage and compare to flag definitions. 3) CI job that reports on flag age or usage. 4) Documentation of flag lifecycle/cleanup process. PASS if any dead flag detection mechanism exists. PREREQUISITE: feature_flag_infrastructure must pass.
- agents_md (Level 2): AGENTS.md exists at repo root ā Check for AGENTS.md file in repository root directory. File should document essentials for autonomous agents like: npm/bun/yarn scripts (TS/JS), pip/venv/poetry setup (Python), build commands, test commands, development workflow, and project-specific conventions. Verify file exists and is not empty (>100 characters). See https://docs.factory.ai/factory-docs/agents-md for reference.
- readme (Level 1): README.md exists at repo root with setup/usage instructions.
- automated_doc_generation (Level 2): Automated documentation generation ā Tools/workflows that create/update docs. Examples: API doc generators (Swagger/OpenAPI), code comment extractors (JSDoc, Sphinx), architecture diagram generators, droid exec creating docs, changelog generators, or README updaters. Must show evidence of automated doc creation, not just static docs.
- skills (Level 3): Skills configured ā Check for skills directories (common locations:
.factory/skills/, .skills/, .claude/skills/, walk up to git root). Each skill should be in {skill-name}/SKILL.md format with either YAML frontmatter containing at minimum name and description, or table format (| name | description |). Verify at least one valid skill exists with non-empty prompt content. See https://code.claude.com/docs/en/skills for the open standard reference.
- documentation_freshness (Level 3): Documentation freshness ā Run
git log --since=\"180 days ago\" --name-only -- README.md AGENTS.md CONTRIBUTING.md | grep -E \"\\.(md)$\" | head -1. PASS if at least one of README.md, AGENTS.md, or CONTRIBUTING.md was modified in the last 180 days. This is a simple binary check: key docs updated recently = pass.
- service_flow_documented (Level 3): Service architecture documented ā Check for: 1) Architecture diagram files (*.mermaid, .puml, .plantuml, docs/architecture, docs/diagrams). 2) Service dependency documentation showing external services, APIs, or databases the application calls. 3) Images in README/docs with names containing "architecture", "flow", "diagram", "sequence". PASS if any architecture diagrams OR service dependency documentation exists.
- agents_md_validation (Level 4): AGENTS.md validation ā Check for automation that validates AGENTS.md stays consistent with code. Look for: 1) CI job that checks AGENTS.md commands still work. 2) Automated AGENTS.md generation/update (droid that updates it). 3) Pre-commit hook validating AGENTS.md commands. 4) Documentation testing (running commands from docs). 5) Link checker for AGENTS.md references. PASS if any validation of AGENTS.md accuracy exists. PREREQUISITE: agents_md must pass.
- devcontainer (Level 2): Dev container configured ā .devcontainer/devcontainer.json with Node.js & TS extensions (TS) or Python image with pip/poetry (Py)
- env_template (Level 1): Environment template ā .env.example file exists or environment variables are documented in README/AGENTS.md. Without knowing required env vars, agents cannot run the application locally. Absolute blocker.
- local_services_setup (Level 2) [Skippable]: Local services setup ā docker-compose.yml for local dependencies (Postgres, Redis, etc.) or clear documentation on how to run them. Agents need these services to run integration tests and develop features. Skip for apps without external service dependencies.
- devcontainer_runnable (Level 3) [Skippable]: Devcontainer runnable ā The devcontainer can be built and run successfully using the devcontainer CLI or VS Code. Validates that the containerized development environment actually works, not just that config files exist. Skip if devcontainer CLI is not installed.
- runbooks_documented (Level 2): Runbooks documented ā Look for external pointers to runbooks/playbooks (links to Notion, Confluence, internal wiki, or dedicated runbooks/ directory). Check README, AGENTS.md, or docs/ for references to incident response procedures. PASS if any documentation points to runbooks, even if hosted externally.
- branch_protection (Level 2) [Skippable]: Branch protection rules enforced ā If
gh or glab CLI is available and authenticated, first check admin access: GitHub: gh api repos/{owner}/{repo} --jq '.permissions.admin', GitLab: glab api projects/{id} --jq '.permissions.project_access.access_level' (need >= 40). If no admin/maintainer access, skip this criterion. If access confirmed, check in order: 1) Modern rulesets: run gh api repos/{owner}/{repo}/rulesets and look for active rulesets targeting main/dev branches. If found, inspect ruleset details with gh api repos/{owner}/{repo}/rulesets/{id} to verify PR review requirements and direct push prevention. 2) Legacy branch protection (only if rulesets returns empty []): run gh api repos/{owner}/{repo}/branches/main/protection and gh api repos/{owner}/{repo}/branches/dev/protection. If both methods return 404/empty, branch protection is not configured. Skip if gh/glab CLI is not available, not authenticated, or lacks admin/maintainer access.
- secret_scanning (Level 3) [Skippable]: Secret scanning configured ā Repository scans for committed secrets. If
gh or glab CLI is available and authenticated, first check admin access: GitHub: gh api repos/{owner}/{repo} --jq '.permissions.admin', GitLab: glab api projects/{id} --jq '.permissions.project_access.access_level' (need >= 40). If no admin/maintainer access, skip the native secret scanning API check but still check for other approaches. Native check: run gh api /repos/{owner}/{repo}/secret-scanning/alerts; 404 with "disabled" message = FAIL (feature not enabled), 200 with array = PASS. Also check: GitHub Actions running gitleaks, trufflehog, or detect-secrets, pre-commit hooks with secret scanning, SonarQube/SonarCloud with security hotspots enabled (verify it is not explicitly disabled in sonar properties). Other secret detection tools or CI checks also satisfy this criterion. Skip if no evidence found and gh/glab CLI is not available, not authenticated, or lacks admin/maintainer access.
- codeowners (Level 2): CODEOWNERS file exists ā in root or .github/ directory with valid team assignments
- automated_security_review (Level 2) [Skippable]: Automated security review generation ā System automatically generates security review reports or assessments. If
gh or glab CLI is available and authenticated, first check admin access: GitHub: gh api repos/{owner}/{repo} --jq '.permissions.admin', GitLab: glab api projects/{id} --jq '.permissions.project_access.access_level' (need >= 40). If no admin/maintainer access, skip the code-scanning API check but still check for other approaches. Code scanning check: run gh api /repos/{owner}/{repo}/code-scanning/alerts for SAST tools (Semgrep, CodeQL, Snyk); 403 "Code Security must be enabled" = FAIL, 200 with results = PASS. Also look for: dependency audit reports in PR comments (Snyk, Dependabot), container scan summaries, or droid exec security assessments. Must generate readable reports, not just pass/fail status. Skip if no evidence found and gh/glab CLI is not available, not authenticated, or lacks admin/maintainer access.
- dependency_update_automation (Level 2): Dependency update automation ā Dependabot, Renovate, or similar is configured and creating PRs for dependency updates. Keeps dependencies current automatically, reducing security vulnerability window.
- gitignore_comprehensive (Level 1): Gitignore comprehensive ā .gitignore properly excludes .env files (not .env.example), node_modules, build artifacts, IDE configs (.idea, .vscode), and OS files (.DS_Store). Prevents agents from accidentally committing secrets or generated files.
- privacy_compliance (Level 4) [Skippable]: Privacy compliance ā Check for privacy compliance infrastructure. Look for: 1) Consent management SDK/library (OneTrust, Cookiebot, custom consent banner). 2) Data retention policies documented. 3) GDPR/CCPA request handling code or documentation (data export, deletion endpoints). 4) Privacy-by-design patterns (data minimization configs, anonymization utilities). 5) Cookie/tracking consent implementation. PASS if evidence of privacy compliance infrastructure exists. Skip for apps without end-user data collection (e.g., internal tools, libraries, infrastructure).
- secrets_management (Level 2): Secrets management ā Check for secure secrets management infrastructure. Look for: 1) Cloud secrets manager integration (AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, HashiCorp Vault) in code or config. 2) Environment variable documentation pointing to secrets manager. 3) GitHub Actions secrets usage (secrets.* references without hardcoded values). 4) SOPS, age, or similar encrypted secrets in repo. 5) .env files properly gitignored with .env.example template. FAIL if secrets appear hardcoded or no secrets management pattern is evident.
- min_release_age (Level 3): Minimum dependency release age ā Check for policies or tooling that enforce a minimum waiting period before adopting new dependency releases. Look for: 1) Renovate configured with
minimumReleaseAge or stabilityDays (or an equivalent delay gate). 2) A documented dependency-update policy that explicitly requires waiting N days before merging version bumps. 3) Custom CI checks that verify the target release date is at least N days old. PASS only if there is an explicit delay (not just centralized updates or signature/provenance verification).
- issue_templates (Level 2): Issue templates ā .github/ISSUE_TEMPLATE/ (GitHub) or .gitlab/issue_templates/ (GitLab) directory exists with structured templates for bugs, features, etc. Teaches agents what information to provide when creating issues.
- issue_labeling_system (Level 2): Issue labeling system ā Consistent labels exist for priority (P0-P3 or critical/high/medium/low), type (bug, feature, chore), and area (frontend, backend, infra). Enables agents to filter and prioritize work programmatically.
- backlog_health (Level 4) [Skippable]: Backlog health ā Issues have clear titles and recent activity. If
gh or glab CLI is available and authenticated, run gh issue list --state open --limit 50 --json title,createdAt,labels. Count issues with: 1) titles > 10 characters, 2) at least one label. PASS if >70% of open issues have both a descriptive title (>10 chars) AND at least one label. Also check gh issue list --state open --json createdAt - FAIL if >50% of issues are older than 365 days with no recent comments. Skip if gh/glab CLI is not available or not authenticated.
- pr_templates (Level 2): PR templates ā .github/pull_request_template.md (GitHub) or merge request templates (GitLab) exist with sections for description, testing done, and relevant context. Ensures agent PRs include necessary information for reviewers.
Application Scope Criteria
- lint_config (Level 1): Linter configured ā Project has a linter configured to catch code quality issues. Common examples: ESLint (.eslintrc., eslint.config.) for TS/JS, ruff/flake8 (pyproject.toml, .flake8, ruff.toml) for Python, SonarQube/SonarCloud (sonar-project.properties, .sonarcloud.properties, or "sonar" in CI workflows). Other equivalent linters or static analysis tools also satisfy this criterion.
- type_check (Level 1): Type checker ā tsconfig.json with "strict": true for TS, mypy.ini or [tool.mypy] in pyproject.toml for Py.
- formatter (Level 1): Formatter ā Prettier (.prettierrc*) for TS, Black ([tool.black] in pyproject.toml) for Py.
- pre_commit_hooks (Level 2): Pre-commit hooks ā Husky/lint-staged for TS, .pre-commit-config.yaml with ruff/black for Py.
- strict_typing (Level 2) [Skippable]: Strict typing enabled ā Project uses strict type checking. Common approaches: TypeScript tsconfig.json with "strict": true, Python mypy strict mode in mypy.ini or pyproject.toml, SonarQube/SonarCloud for TypeScript (has type-related rules that complement strict mode; verify it is not explicitly disabled in sonar properties). Other type checkers or strict mode configurations also satisfy this criterion. Some languages (Rust, Go) are typed by default. Reason about each application and skip if unclear.
- naming_consistency (Level 3): Naming consistency ā Consistent naming conventions are enforced. Common approaches: ESLint @typescript-eslint/naming-convention rule, pylint naming-style rules, explicit naming conventions documented in AGENTS.md or CONTRIBUTING.md (e.g., "use camelCase for functions"), SonarQube/SonarCloud (has naming convention rules enabled by default in quality profiles; verify it is not explicitly disabled in sonar properties). Other linter rules, code quality tools, or documented conventions that enforce naming standards also satisfy this criterion.
- cyclomatic_complexity (Level 5): Cyclomatic complexity ā Code complexity is analyzed and monitored. Common tools: ESLint complexity rule, lizard or radon for Python, gocyclo or go-critic for Go, SonarQube/SonarCloud (has built-in cognitive/cyclomatic complexity analysis enabled by default; verify it is not explicitly disabled in sonar properties). Other complexity analyzers or CI checks that enforce complexity thresholds also satisfy this criterion.
- dead_code_detection (Level 3): Dead code detection ā Tooling detects unused/dead code. PASS if ANY ONE of the following exists: 1) JS/TS: knip, unimported, or ESLint import/no-unused-modules. 2) Python: vulture or dead. 3) Go: deadcode or staticcheck. 4) Rust: cargo-udeps. 5) Java: SpotBugs or PMD with unused code rules. 6) SonarQube/SonarCloud (has built-in unused code detection enabled by default; verify it is not explicitly disabled in sonar properties). 7) Any other dead code detector, CI check, or pre-commit hook that flags unused code. Check for config files at both repo root and app level (e.g., knip.json, .eslintrc, pyproject.toml). For monorepos, if a tool is configured at the repo root, read its config to determine which applications it covers (e.g., workspaces or include/exclude patterns) and count covered apps as passing.
- duplicate_code_detection (Level 3): Duplicate code detection ā Tooling detects copy-paste or duplicate code to enforce DRY (Don't Repeat Yourself) principles. Common tools: jscpd (in CI or pre-commit), PMD CPD for Java, SonarQube/SonarCloud (has built-in CPD enabled by default; verify it is not explicitly disabled in sonar properties). Other duplication detectors, CI checks, or pre-commit hooks that flag duplicate code also satisfy this criterion.
- code_modularization (Level 4) [Skippable]: Code modularization ā Check for tooling that enforces code modularization and boundaries. Skip for small projects where module boundaries are not meaningful, or Rust codebases (compiler enforces visibility). PASS if ANY ONE of the following exists: 1) JS/TS: eslint-plugin-boundaries, eslint-plugin-import/no-restricted-paths, dependency-cruiser, Nx module boundaries. 2) Java: ArchUnit configured for architecture tests. 3) Go:
internal/ package directories used (compiler-enforced boundaries). 4) Python: import-linter configured. 5) C#: ArchUnitNET. 6) Architectural fitness functions or layer enforcement in CI.
- n_plus_one_detection (Level 4) [Skippable]: N+1 query detection ā Check for N+1 query detection tooling. Look for: 1) bullet gem for Rails. 2) nplusone for Python/Django. 3) DataLoader pattern usage (graphql-dataloader). 4) ORM query logging with analysis. 5) Database query analysis in tests. 6) APM with slow query detection configured. PASS if any N+1 detection mechanism exists. Skip for apps without database/ORM usage (e.g., frontend-only apps, libraries, static sites).
- heavy_dependency_detection (Level 4) [Skippable]: Heavy dependency detection ā Check for bundle size analysis and heavy dependency detection. Look for: 1) Bundle analyzer configured (webpack-bundle-analyzer, @next/bundle-analyzer, rollup-plugin-visualizer). 2) Size limit tools (size-limit, bundlesize, bundlewatch). 3) Import cost IDE extension configuration. 4) CI job that reports bundle size changes. 5) Lighthouse CI for performance budgets. PASS if any bundle/dependency size analysis is configured. Skip for non-bundled applications (e.g., backend services, CLI tools, server-side apps).
- unused_dependencies_detection (Level 3): Unused dependencies detection ā Check for tooling that detects unused dependencies in any language. PASS if ANY ONE of the following exists: 1) JS/TS: depcheck, npm-check, or knip configured. 2) Python: deptry or pip-extra-reqs. 3) Go:
go mod tidy in CI (ensures go.mod only has used deps). 4) Rust: cargo-udeps. 5) Java/Maven: mvn dependency:analyze in CI. 6) Java/Gradle: dependency-analysis plugin. 7) Any CI job or pre-commit hook that checks for unused dependencies.
- unit_tests_exist (Level 1): Unit tests present ā .test.ts / tests/ (TS) or tests/test_.py (Py).
- integration_tests_exist (Level 3): Integration tests present ā cypress/, playwright.config.ts (TS) or tests/integration/, Behave .feature files (Py).
- unit_tests_runnable (Level 2): Tests runnable locally ā "test": "jest" (or Vitest) script in package.json (TS) or pytest runnable via tox/make test (Py). Actually run the command you find to see if the tests really are runnable (do not worry about whether they pass, just if they can be run). Use flags like --listTests (jest) or --collect-only (pytest) to verify runnability without running the full suite, which can take hours. It is very important to use these flags to avoid waiting for the entire test suite to complete.
- test_performance_tracking (Level 4): Test performance tracking ā Test suite duration is measured and tracked. Check: 1) CI outputs that show test timing (e.g., jest --verbose, pytest --durations). 2) Test reports uploaded as artifacts. 3) Integration with test analytics platforms (BuildPulse, Datadog CI, GitHub Actions test reporting). 4) Config flags for test timing output in package.json scripts or CI workflows. Evidence that org monitors test performance, not just pass/fail.
- flaky_test_detection (Level 4) [Skippable]: Flaky test detection ā Check for proactive flaky test management. If
gh or glab CLI is available and authenticated, run gh pr list --state all --limit 10 --json statusCheckRollup to detect duplicate check names (indicates retries/flakiness). Also check for: 1) Test retry configuration (jest-retry, pytest-rerunfailures). 2) Flaky test tracking tools (BuildPulse). 3) CI quarantine/skip mechanisms. 4) Test stability metrics. Skip if gh/glab CLI is not available or not authenticated and no other flaky test detection evidence exists.
- test_coverage_thresholds (Level 2): Test coverage thresholds ā Minimum coverage percentages are enforced. Common approaches: jest.config.js coverageThreshold, pytest --cov-fail-under, Codecov/Coveralls with PR status checks blocking on coverage, SonarQube/SonarCloud quality gate with coverage threshold (sonar.coverage.* settings or sonar.qualitygate.wait=true in CI). Other CI gates or tools that enforce minimum coverage also satisfy this criterion. Agents must know they have to maintain coverage, not just that it is tracked.
- test_naming_conventions (Level 3): Test naming conventions ā Check for consistent test file naming enforcement in any language. PASS if ANY ONE of the following exists: 1) JS/TS: Jest testMatch/testRegex, Vitest include patterns, or Mocha test directory config. 2) Python: pytest naming conventions in pytest.ini or pyproject.toml (test_*.py pattern). 3) Go: *_test.go convention (built-in, check tests exist following pattern). 4) Java: Maven/Gradle test source directories with naming patterns. 5) Any test framework configured with explicit naming patterns. 6) Test naming conventions documented in AGENTS.md or CONTRIBUTING.md.
- test_isolation (Level 4): Test isolation ā Check for test isolation enforcement in any language. PASS if ANY ONE of the following exists: 1) JS/TS: Jest parallelization (not --runInBand), Vitest threads, or test sharding configured. 2) Python: pytest-xdist for parallel execution. 3) Go:
go test -parallel or t.Parallel() usage. 4) Java: JUnit parallel execution config, or Maven/Gradle parallel test forks. 5) Database isolation patterns (transactions, test databases, factories, testcontainers). 6) Test randomization enabled (--randomize, pytest-randomly). 7) Any test framework configured for parallel or isolated execution.
- api_schema_docs (Level 3) [Skippable]: API schema docs ā OpenAPI/Swagger specification or GraphQL schema exists for service APIs. Search recursively for files matching patterns: **/openapi.{json,yaml,yml}, **/swagger.{json,yaml,yml}, **/.openapi.{json,yaml}, **/.swagger.{json,yaml}, **/schema.graphql, **/.graphql, **/.gql. PASS if any valid API schema file is found anywhere in the repository. Skip for non-API apps (e.g., libraries, CLI tools without HTTP APIs).
- database_schema (Level 2) [Skippable]: Database schema ā Schema definition files exist for databases (Prisma schema, TypeORM entities, SQLAlchemy models, raw SQL schemas). Agents need to understand the data model to make correct changes. Skip for apps without databases.
- structured_logging (Level 2): Structured logging ā Check for logging library in dependencies: TS/JS (winston, pino, bunyan, log4js in package.json), Python (structlog, loguru, python-json-logger in requirements/pyproject.toml), or custom logger module (src/logger., lib/logging.). PASS if any logging library is installed OR a dedicated logger module exists.
- distributed_tracing (Level 3): Check for trace ID or request ID propagation through the application (OpenTelemetry, X-Request-ID headers, etc.) that allows following a request through the system.
- metrics_collection (Level 3): Check for metrics/telemetry instrumentation (Datadog, Axiom, Prometheus, New Relic, CloudWatch, etc.) for understanding application performance.
- code_quality_metrics (Level 4) [Skippable]: Code quality metrics tracked ā Coverage, complexity, and maintainability metrics are monitored. If
gh or glab CLI is available and authenticated, first check admin access: GitHub: gh api repos/{owner}/{repo} --jq '.permissions.admin', GitLab: glab api projects/{id} --jq '.permissions.project_access.access_level' (need >= 40). If no admin/maintainer access, skip the code-scanning API check but still check for other approaches. Code scanning check: run gh api /repos/{owner}/{repo}/code-scanning/analyses; 403 "Code Security must be enabled" = FAIL, 200 with array = PASS. Also check: coverage bots in PR comments (run gh pr list --state merged --limit 10 --json comments and search for "coverage", "codecov", "coveralls"), coverage configuration (grep for "--coverage" in package.json test scripts, or check jest.config/vitest.config coverage settings), SonarQube/SonarCloud (provides coverage, maintainability, reliability metrics with quality gates; strong evidence if sonar.qualitygate.wait=true in CI). Other code quality platforms or CI checks that track these metrics also satisfy this criterion. PASS if ANY method found. Skip if no evidence found and gh/glab CLI is not available, not authenticated, or lacks admin/maintainer access.
- error_tracking_contextualized (Level 2): Error tracking contextualized ā Sentry, Bugsnag, or Rollbar is configured with source maps, breadcrumbs, and user context. Agents can trace production errors back to specific code paths with full stack traces.
- alerting_configured (Level 3): Alerting configured ā PagerDuty, OpsGenie, or custom alerting rules are defined. The system actively notifies when things go wrong rather than waiting for someone to notice. Prerequisite for incident response.
- deployment_observability (Level 4): Deployment observability ā Look for pointers to monitoring dashboards (Datadog, Grafana, New Relic links in docs or code comments). Check for deploy notification integrations (Slack webhooks, deployment annotations in monitoring). PASS if documentation references where to check deploy impact, even if dashboards are hosted externally.
- health_checks (Level 3) [Skippable]: Health checks ā Check for health check endpoints and liveness/readiness probes. Look for: 1)
/health, /healthz, /ready, /live endpoints in routes. 2) Kubernetes liveness/readiness probes in deployment manifests. 3) Health check libraries (terminus, lightship for Node.js, django-health-check). 4) Docker HEALTHCHECK instruction. 5) Load balancer health check configuration. PASS if any health check mechanism is implemented. Skip for non-deployed services (e.g., libraries, CLI tools, scripts, batch jobs).
- circuit_breakers (Level 4) [Skippable]: Circuit breakers ā Check for circuit breaker pattern implementation. Look for: 1) Circuit breaker libraries (opossum, cockatiel for Node.js, resilience4j for Java, polly for .NET, tenacity for Python). 2) Service mesh with circuit breaking (Istio, Linkerd configuration). 3) Custom circuit breaker implementation (grep for 'circuit', 'breaker', 'fallback' patterns). 4) Retry with exponential backoff configuration. PASS if circuit breaker or resilience pattern is implemented for external calls. Skip for apps without external service dependencies (e.g., standalone tools, libraries).
- profiling_instrumentation (Level 4) [Skippable]: Profiling instrumentation ā Check for performance profiling infrastructure. Look for: 1) APM tools (Datadog APM, New Relic, Dynatrace) in dependencies or config. 2) Continuous profiling (Pyroscope, Parca, Google Cloud Profiler). 3) Node.js profiling (clinic.js, 0x configured). 4) Memory profiling setup. 5) Flame graph generation capability. PASS if any profiling tooling is configured for production or development use. Skip for apps where performance profiling is not meaningful (e.g., libraries, simple scripts).
- dast_scanning (Level 4) [Skippable]: DAST scanning ā Check for Dynamic Application Security Testing (DAST) in CI/CD. Look for: 1) OWASP ZAP configured in CI workflows (zap-scan action, zap-baseline). 2) Burp Suite Enterprise or Burp CI integration. 3) Nuclei scanner configured. 4) Other DAST tools (Acunetix, Netsparker, StackHawk). 5) Custom security test suites that hit running endpoints. PASS if any DAST tool runs against a staging/test environment in CI. This is distinct from SAST (static analysis) - DAST tests the running application. Skip for apps that are not deployed as web services (e.g., libraries, CLI tools, scripts).
- pii_handling (Level 3) [Skippable]: PII handling ā Check for PII detection and handling tooling. Look for: 1) Data classification tools (Microsoft Presidio, AWS Macie integration, Google DLP). 2) PII detection in CI (detect-secrets with PII patterns, custom regex scanners). 3) Data masking libraries in dependencies (faker for test data, masking utilities). 4) Documentation of PII handling in AGENTS.md, privacy policy references, or data-handling docs. PASS if any PII-aware tooling or documented handling procedures exist. Skip for apps that do not process personal/user data (e.g., internal tools, infrastructure, developer utilities).
- log_scrubbing (Level 3): Log scrubbing ā Check for log sanitization/scrubbing mechanisms. Look for: 1) Logging library with redaction support configured (pino redact, winston format with filtering, structlog processors). 2) Custom log sanitization middleware or utilities (grep for 'redact', 'sanitize', 'mask' in logging code). 3) Log scrubbing documentation in AGENTS.md or logging guidelines. 4) PII filtering patterns in log configuration. PASS if any log sanitization mechanism is configured or documented.
- product_analytics_instrumentation (Level 3): Product analytics instrumentation ā Mixpanel, Amplitude, PostHog, Heap, or GA4 is instrumented in the application. Agents can see whether features are actually used and measure the impact of their changes on user behavior.
- error_to_insight_pipeline (Level 5): Error to insight pipeline ā Check for Sentry-GitHub/GitLab integration: search for sentry.io webhook in .github/workflows or repo settings, OR Sentry issue linking config (SENTRY_ORG, SENTRY_PROJECT in env). Also check for error-to-issue automation: GitHub Actions that create issues from errors, or PagerDuty/OpsGenie integrations with issue creation. PASS if any error tracking tool has issue creation integration configured.
For each criterion, provide:
⢠numerator (integer ℠0 or null):
- Repository scope: 1 if pass, 0 if fail, null if skipped/N/A
- Application scope: Count of applications that pass (0 to N), or null if skipped/N/A
- Null can ONLY be used for criteria marked as [Skippable]
⢠denominator (integer ℠1):
- Repository scope: Always 1
- Application scope: Always N (from Phase 2)
⢠rationale (string, max 500 chars): Brief explanation
Phase 4 - Report Validation
CRITICAL: Before writing the report file, validate your report:
-
Application count consistency:
ā All 38 Application Scope criteria have denominator = N
ā All 44 Repository Scope criteria have denominator = 1
-
Report completeness:
ā Report contains EXACTLY 82 criterion keys
ā You used ONLY these exact IDs: large_file_detection, tech_debt_tracking, build_cmd_doc, deps_pinned, vcs_cli_tools, automated_pr_review, agentic_development, fast_ci_feedback, build_performance_tracking, deployment_frequency, single_command_setup, feature_flag_infrastructure, release_notes_automation, progressive_rollout, rollback_automation, monorepo_tooling, version_drift_detection, release_automation, dead_feature_flag_detection, agents_md, readme, automated_doc_generation, skills, documentation_freshness, service_flow_documented, agents_md_validation, devcontainer, env_template, local_services_setup, devcontainer_runnable, runbooks_documented, branch_protection, secret_scanning, codeowners, automated_security_review, dependency_update_automation, gitignore_comprehensive, privacy_compliance, secrets_management, min_release_age, issue_templates, issue_labeling_system, backlog_health, pr_templates, lint_config, type_check, formatter, pre_commit_hooks, strict_typing, naming_consistency, cyclomatic_complexity, dead_code_detection, duplicate_code_detection, code_modularization, n_plus_one_detection, heavy_dependency_detection, unused_dependencies_detection, unit_tests_exist, integration_tests_exist, unit_tests_runnable, test_performance_tracking, flaky_test_detection, test_coverage_thresholds, test_naming_conventions, test_isolation, api_schema_docs, database_schema, structured_logging, distributed_tracing, metrics_collection, code_quality_metrics, error_tracking_contextualized, alerting_configured, deployment_observability, health_checks, circuit_breakers, profiling_instrumentation, dast_scanning, pii_handling, log_scrubbing, product_analytics_instrumentation, error_to_insight_pipeline
ā No invented/extra criterion names
If ANY validation check fails, STOP and revise before proceeding.
Phase 5 - Scoring & Docs Report Generation
-
Calculate the score
⢠Signals with null numerator (skipped / N/A) are excluded from scoring
⢠The repository's readiness level is determined by overall pass rate:
- Pass rate formula: ((numerator_1/denominator_1) + (numerator_2/denominator_2) + ... + (numerator_n/denominator_n)) / n
where n = number of non-skipped signals (signals with null numerator are excluded)
- Each signal contributes equally regardless of its denominator
- Example: signal A = 3/5 (0.6), signal B = 1/1 (1.0), signal C = 0/2 (0.0)
Pass rate = (0.6 + 1.0 + 0.0) / 3 = 53.3%
- Level 1: 0-20% pass rate
- Level 2: 20-40% pass rate
- Level 3: 40-60% pass rate
- Level 4: 60-80% pass rate
- Level 5: 80-100% pass rate
⢠All signals are weighted equally regardless of which level category they belong to
-
Create the readiness report file
⢠Ensure docs/ exists
⢠Start from the bundled report-template.md structure, then fill every placeholder with audited repository evidence
⢠Write the final report to docs/readiness-report.md
⢠Include repository identifier, evaluation date, discovered applications, overall score, readiness level, all 82 criteria, and action items
⢠Do not leave {{placeholder}} tokens or example values in the final report
⢠Do not call external storage, database, or report persistence tools
⢠If docs/readiness-report.md already exists, overwrite it when the user requested a fresh, generated, or updated readiness report; otherwise ask before replacing
⢠For each criterion, provide: numerator (int or null for skipped), denominator (int >= 1), rationale (string)
⢠Include application metadata for monorepos: provide a map of app paths to description objects
-
Provide a concise summary to the user
⢠After writing the report file, present a structured report in this EXACT format:
# Level
<Output the achieved level: Level 1, Level 2, Level 3, Level 4, or Level 5>
# Applications
<List all applications discovered with their descriptions>
Example:
1. apps/backend - Main Next.js application for user interface
2. apps/cli - CLI tool for local development
# Criteria
<For each criterion evaluated, show: criterion name -> score (numerator/denominator) with brief rationale>
Format as:
**Category Name**
- Criterion Name: X/Y - Rationale for the score (especially if failing)
- Another Criterion: X/Y - Rationale
Organize by category (Style & Validation, Build System, Testing, Documentation, Dev Environment, Debugging & Observability, Security)
# Action Items
<List 2-3 high-impact next steps to reach the next level>
Example:
- Add pre-commit hooks to enforce linting and formatting
- Document build commands in README or AGENTS.md
- Set up branch protection rules on main branch
⢠Focus on being concise yet informative
⢠For criteria, highlight rationale especially for failing checks (0 score)
⢠Action items should be specific and achievable
⢠Include the path to the written report: docs/readiness-report.md
Behavioral Guidelines
⢠Be deterministic: identical repo ā identical output
⢠Prefer existence checks over deep semantic analysis
⢠Assume default branch is the evaluation target
⢠If evidence is ambiguous, fail the item
⢠Keep notes terse, actionable, and under 500 characters
⢠After writing docs/readiness-report.md, provide a concise human-readable summary
⢠Application count from Phase 2 is fixed for the entire evaluation
⢠Repository Scope denominators are ALWAYS 1
⢠Application Scope denominators are ALWAYS N (from Phase 2)
⢠Use ONLY the 82 defined criterion IDs
⢠Do not call external report storage tools
⢠Do not run destructive commands
⢠Do not install dependencies
⢠Only create or update docs/readiness-report.md; do not modify source code or configuration files