view-results

View and analyze Hawk evaluation results. Use when the user wants to see eval-set results, check evaluation status, list samples, view transcripts, or analyze agent behavior from a completed evaluation run.

Install command

npx skills add https://github.com/METR/inspect-action --skill view-results

Copy and paste this command into Claude Code to install the skill.

Source

METR/inspect-action

Stars: 24

Forks: 11

Last updated: 18 January 2026 at 20:51

SKILL.md

name	view-results
description	View and analyze Hawk evaluation results. Use when the user wants to see eval-set results, check evaluation status, list samples, view transcripts, or analyze agent behavior from a completed evaluation run.

View Hawk Eval Results

When the user wants to analyze evaluation results, use these hawk CLI commands:

1. List Eval Sets

If the user does not know the eval set ID, list all eval sets:

hawk list eval-sets

Shows: eval set ID, creation date, creator.

To return more results, raise the limit with --limit N:

hawk list eval-sets --limit 50

To find a specific eval set, search with --search QUERY:

hawk list eval-sets --search pico

2. List Evaluations

With an eval set ID, you can list all evaluations in the eval-set:

hawk list evals [EVAL_SET_ID]

Shows: task name, model, status (success/error/cancelled), and sample counts.

3. List Samples

Alternatively, list individual samples and their scores:

hawk list samples [EVAL_SET_ID] [--eval FILE] [--limit N]

4. Download Transcript

To get the full conversation for a specific sample:

hawk transcript <UUID>

The transcript includes the full conversation with tool calls, scores, and metadata.

For more detail, fetch the raw data with --raw:

hawk transcript <UUID> --raw
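
When only a handful of samples matter, the single-sample command can be looped over a list of UUIDs. The following is a minimal sketch under stated assumptions, not part of hawk: the function name `save_raw_transcripts`, the one-UUID-per-line input, and the `<UUID>.json` file naming are hypothetical, and it assumes that `--raw` prints the transcript data to stdout. Only `hawk transcript <UUID> --raw` comes from the text above.

```shell
# Hypothetical helper (not part of hawk): read sample UUIDs from stdin,
# one per line, and save each raw transcript to <out_dir>/<UUID>.json.
save_raw_transcripts() {
  out_dir="${1:-./transcripts}"
  mkdir -p "$out_dir" || return 1
  while IFS= read -r uuid; do
    # Skip blank lines in the input list.
    [ -n "$uuid" ] || continue
    # Assumption: --raw writes the transcript data to stdout.
    # stdin is redirected so hawk cannot consume the remaining UUIDs.
    hawk transcript "$uuid" --raw < /dev/null > "$out_dir/$uuid.json" || return 1
  done
}
```

A possible invocation, with the UUIDs stored one per line in a file of your choosing: `save_raw_transcripts ./raw < uuids.txt`.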

Batch Transcript Download

You can also download all transcripts for an entire eval set:

# Fetch all samples in an eval set
hawk transcripts <EVAL_SET_ID>

# Write to individual files in a directory
hawk transcripts <EVAL_SET_ID> --output-dir ./transcripts

# Limit number of samples
hawk transcripts <EVAL_SET_ID> --limit 10

# Raw JSON output (one JSON per line to stdout, or .json files with --output-dir)
hawk transcripts <EVAL_SET_ID> --raw
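
Because `--raw` emits one JSON document per line on stdout, a quick sanity check is to count the lines and compare the result with the sample counts reported by `hawk list evals`. The sketch below is illustrative only: `count_records` is a hypothetical helper name, and it assumes nothing other than the JSON lines is written to stdout.

```shell
# Hypothetical helper (not part of hawk): count non-empty lines on stdin.
# With `hawk transcripts <EVAL_SET_ID> --raw`, each such line is one record.
count_records() {
  # NF is non-zero only for lines that contain at least one field;
  # "n + 0" forces a numeric 0 when the input is empty.
  awk 'NF { n++ } END { print n + 0 }'
}
```

It would be used at the end of a pipeline, for example `hawk transcripts <EVAL_SET_ID> --raw | count_records`.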

Workflow

1. Run hawk list eval-sets to see available eval sets
2a. Run hawk list evals <EVAL_SET_ID> to see available evaluations
2b. Or run hawk list samples <EVAL_SET_ID> to find samples of interest
3a. Run hawk transcript <UUID> to get full details on a single sample
3b. Or run hawk transcripts <EVAL_SET_ID> --output-dir ./transcripts to download all transcripts
4. Read and analyze the transcript(s) to understand the agent's behavior
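
For a known eval set, steps 2a and 3b of this workflow can be chained in a small shell function. This is a sketch rather than an official wrapper: the name `review_eval_set` and its argument order are hypothetical, and it uses only the `hawk list evals` and `hawk transcripts ... --output-dir` forms shown earlier.

```shell
# Hypothetical wrapper (not part of hawk): list the evaluations in one
# eval set, then download all of its transcripts into a directory.
review_eval_set() {
  eval_set_id="${1:-}"
  out_dir="${2:-./transcripts}"
  if [ -z "$eval_set_id" ]; then
    echo "usage: review_eval_set EVAL_SET_ID [OUT_DIR]" >&2
    echo "hint: run 'hawk list eval-sets' to find an ID" >&2
    return 2
  fi
  # Step 2a: show task name, model, status, and sample counts.
  hawk list evals "$eval_set_id" || return 1
  # Step 3b: write each transcript to its own file under out_dir.
  hawk transcripts "$eval_set_id" --output-dir "$out_dir" || return 1
  echo "Transcripts written to $out_dir"
}
```

Stopping at the first failing command keeps a failed listing (for example, a mistyped ID) from triggering a large download.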

API Environments

Production (https://api.inspect-ai.internal.metr.org) is used by default. Set HAWK_API_URL only when targeting non-production environments:

Environment	URL
Staging	`https://api.inspect-ai.staging.metr-dev.org`
Dev1	`https://api.inspect-ai.dev1.staging.metr-dev.org`
Dev2	`https://api.inspect-ai.dev2.staging.metr-dev.org`
Dev3	`https://api.inspect-ai.dev3.staging.metr-dev.org`
Dev4	`https://api.inspect-ai.dev4.staging.metr-dev.org`

Example:

HAWK_API_URL=https://api.inspect-ai.staging.metr-dev.org hawk list eval-sets
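
To avoid retyping these URLs, the table can be encoded in a small lookup function. The sketch below is a convenience under stated assumptions: `hawk_api_url` and `hawk_in` are hypothetical names, not hawk features, and the URLs are copied from the table above. Production is deliberately left out because it is the default when HAWK_API_URL is unset.

```shell
# Hypothetical helper (not part of hawk): map a non-production
# environment name to its API URL, using the table above.
hawk_api_url() {
  case "$1" in
    staging) echo "https://api.inspect-ai.staging.metr-dev.org" ;;
    dev1|dev2|dev3|dev4) echo "https://api.inspect-ai.$1.staging.metr-dev.org" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: run one hawk command against a named environment.
# HAWK_API_URL is set only for that single invocation.
hawk_in() {
  env_name="$1"
  shift
  url="$(hawk_api_url "$env_name")" || return 1
  HAWK_API_URL="$url" hawk "$@"
}
```

For example, `hawk_in dev2 list eval-sets` would run the listing against Dev2.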
