GitHub repository

inspect_evals

inspect_evals contains 17 skills collected from UKGovernmentBEIS. This page shows the repository's occupation coverage and links to a detail page for each skill on this site.

Collected skills: 17
Stars: 551
Updated: 2026-06-19
Forks: 358
Occupation coverage: 6 occupation categories · 100% classified

Skills in this repository

ci-maintenance-workflow
Software Quality Assurance Analysts and Testers

CI and GitHub Actions maintenance workflows — fix a failing test from a CI URL, fix a failing smoke test, add @pytest.mark.slow markers to slow tests, or review a PR against agent-checkable standards. Use when user asks to fix a failing test, fix a smoke test, mark slow tests, or review a PR. Trigger when the user asks you to run the "Write a PR For A Failing Test", "Fix A Failing Smoke Test", "Mark Slow Tests", or "Review PR According to Agent-Checkable Standards" workflow.

2026-06-19
prepare-submission-workflow
Software Developers

Prepare an evaluation for PR submission as an entry to the register. Use when user asks to prepare an eval for submission or finalize a PR. Trigger when the user asks you to run the "Prepare Evaluation For Submission" workflow.

2026-06-11
eval-validity-review
Software Quality Assurance Analysts and Testers

Review a single evaluation's validity — whether its claims hold up, whether its name is accurate, whether samples can be both succeeded and failed at, and whether scoring measures ground truth. Use when user asks to check validity of an eval, or as part of the Master Checklist workflow. Do NOT use for code quality or test coverage (use eval-quality-workflow or ensure-test-coverage instead).

2026-06-07
code-quality-fix-all
Software Developers

Fix code quality issues identified in a code quality review stored in agent_artefacts/code_quality/<topic>/. Systematically addresses issues found by the code-quality-review-all skill for ANY code quality topic, with validation and testing at each step. Use when user asks to fix issues from a code quality review, or asks to fix issues from agent_artefacts/code_quality/<topic>.

2026-06-04
eval-report-workflow
Software Quality Assurance Analysts and Testers

Create an evaluation report for a README by selecting models, estimating costs, running evaluations, and formatting results tables. Use when user asks to make/create/generate an evaluation report. Trigger when the user asks you to run the "Make An Evaluation Report" workflow.

2026-05-24
create-eval
Software Developers

Redirect to the inspect-evals-template for creating new evaluations. New evals are no longer created in this repository — they live in standalone repos. Use when user asks to create/implement/build a new evaluation.

2026-05-04
ensure-test-coverage
Software Quality Assurance Analysts and Testers

Ensure test coverage for a single evaluation - both reviewing existing tests and creating missing ones. Analyzes testable components, checks tests against repository conventions, reports coverage gaps, and creates or improves tests. Use when user asks to check/review/create/add/ensure tests for an eval. Use whenever you are asked to review an evaluation that contains tests, or whenever you need to write a suite of tests. Do NOT use for fixing a specific failing CI test (use ci-maintenance-workflow instead).

2026-04-30
security-audit-eval
Information Security Analysts

Audit a third-party Inspect AI evaluation for security risks before running it locally. Decide whether the eval is safe by checking for malicious host-side code, externally-fetched files that aren't quality-controlled, sandbox-breakout instructions, weak sandbox configuration, supply-chain hazards, credential exposure, resource exhaustion, and provenance signals. Use when the user asks to audit / vet / security-review an eval repo (GitHub URL or local path), or asks "is it safe to run X". Do NOT use for assessing whether an eval *measures what it claims* (use eval-validity-review) or for general code-quality review (use eval-quality-workflow / code-quality-review-all).

2026-04-30
build-repo-context
Software Developers, Computer Systems Analysts

Crawl repository PRs, issues, and review comments to distill institutional knowledge into a shared knowledge base. Run periodically by "context agents" to maintain agent_artefacts/repo_context/REPO_CONTEXT.md. Trigger only on specific request.

2026-04-10
generate-asset-actions
Computer Network Architects

Generate asset-actions.yaml from ASSETS.yaml by classifying assets into priority tiers. Use when the user asks to regenerate, update, or refresh the asset actions.

2026-03-25
code-quality-review-all
Software Quality Assurance Analysts and Testers

Review all evaluations in the repository against a single code quality standard. Checks ALL evals against ONE standard for periodic quality reviews. Use when user asks to review/audit/check all evaluations for a specific topic or standard. Do NOT use for reviewing a single eval (use eval-quality-workflow instead) or for test coverage (use ensure-test-coverage instead).

2026-03-17
eval-quality-workflow
Software Quality Assurance Analysts and Testers

Fix or review a single evaluation against all EVALUATION_CHECKLIST.md standards. Use "fix" mode to refactor an eval into compliance, or "review" mode to assess compliance without making changes. Use when user asks to fix, review, or check an evaluation's quality. Trigger when the user asks you to run the "Fix An Evaluation" or "Review An Evaluation" workflow. Do NOT use for reviewing ALL evals against a single code quality standard (use code-quality-review-all instead).

2026-03-17
investigate-dataset
Data Scientists

Investigate datasets from HuggingFace, CSV, or JSON files to understand their structure, fields, and data quality. Trigger whenever you need to explore or inspect a dataset yourself without using pre-written scripts.

2026-03-17
prepare-release
Software Developers

Prepare a new release of inspect_evals by creating a release branch, collecting changelog fragments, and opening a PR. Use when user asks to cut/prepare/create a new release or version bump.

2026-03-17
write-an-adr
Computer Systems Analysts

Write an Architectural Decision Record (ADR) to document a significant design choice. Use when user asks to write/create/document an ADR, or to record an architectural decision.

2026-03-17
check-trajectories-workflow
Software Quality Assurance Analysts and Testers

Use Inspect Scout to analyze agent trajectories from evaluation log files. Runs default and custom scanners to detect external failures, formatting issues, reward hacking, and ethical refusals. Use when user asks to check/analyze agent trajectories. Trigger when the user asks you to run the "Check Agent Trajectories" workflow.

2026-03-06
read-eval-logs
Software Quality Assurance Analysts and Testers

View and analyse Inspect evaluation log files using the Python API. Trigger whenever you need to look at a .eval file yourself without using pre-written scripts.

2026-03-06
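
The investigate-dataset entry describes exploring a dataset's structure, fields and data quality. The following is a rough, hypothetical illustration of that kind of check. It is not the skill's actual procedure. This stdlib-only Python sketch parses CSV, JSON or JSON Lines text and reports, for each field, how many records lack a value and which value types occur. The function names `load_records` and `summarize_records` are invented for this example. Loading datasets from the HuggingFace Hub is left out.

```python
import csv
import io
import json
from collections import Counter
from typing import Any, Iterable


def load_records(text: str, fmt: str) -> list[dict[str, Any]]:
    """Parse CSV, JSON (object or array) or JSON Lines text into dict records.

    Hypothetical helper for illustration only.
    """
    if fmt == "csv":
        # csv.DictReader yields every value as a string; empty cells become "".
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    if fmt == "jsonl":
        # One JSON object per non-blank line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt}")


def summarize_records(records: Iterable[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Per field: count missing/empty values and tally the types of present values."""
    records = list(records)
    # Collect field names in first-seen order, since records may have differing keys.
    fields: list[str] = []
    for rec in records:
        for key in rec:
            if key not in fields:
                fields.append(key)
    summary: dict[str, dict[str, Any]] = {}
    for field in fields:
        values = [rec.get(field) for rec in records]
        present = [v for v in values if v is not None and v != ""]
        summary[field] = {
            "missing": len(values) - len(present),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return summary


if __name__ == "__main__":
    sample = '{"question": "2+2?", "answer": "4"}\n{"question": "3+3?", "answer": null, "id": 7}\n'
    for name, stats in summarize_records(load_records(sample, "jsonl")).items():
        print(name, stats)
```

In this sketch, a field that is absent from a record counts as missing, the same as an explicit `null` or an empty string. This makes inconsistent schemas across records visible.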