
deploy-dev

Use when deploying code changes to dev environments (dev1-4), running terraform apply against dev, or verifying changes end-to-end. Triggers on "deploy to dev", "apply to dev2", "test in dev", "update dev environment".


Install command

npx skills add https://github.com/METR/inspect-action --skill deploy-dev

Copy and paste this command into Claude Code to install the skill.

Source: METR/inspect-action (24 stars, 11 forks; last updated 15 February 2026 at 18:18)

SKILL.md


name	deploy-dev
description	Use when deploying code changes to dev environments (dev1-4), running terraform apply against dev, or verifying changes end-to-end. Triggers on "deploy to dev", "apply to dev2", "test in dev", "update dev environment".

Dev Environment Architecture

Dev environments (dev1-4) run in the staging AWS account.

Shared resources (when create_* flags are false in tfvars):

S3 bucket: staging-metr-inspect-data
EventBridge bus: staging-inspect-ai-api
EKS cluster: staging-eks-cluster
ALB, VPC, subnets

Dedicated per environment:

Warehouse DB: {env_name}-inspect-ai-warehouse
API service: {env_name}-inspect-ai-api
Lambda functions: {env_name}-inspect-ai-*
ECR repos: {env_name}/inspect-ai/*
CloudWatch logs

Config lives in terraform/terraform.tfvars (gitignored — set to your target environment before deploying).

Note: Commands below may require AWS credentials for the staging account. Set AWS_PROFILE or configure credentials as appropriate for your environment.
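Before deploying, it can help to confirm which environment the tfvars file actually points at. The sketch below assumes the file sets a variable literally named env_name (inferred from the {env_name} placeholders in this skill, not confirmed by it); the helper name tfvars_env_name is hypothetical.

```shell
# Print the env_name a tfvars file targets.
# ASSUMPTION: the file contains a line like: env_name = "dev2"
tfvars_env_name() {
  tfvars="${1:-terraform/terraform.tfvars}"
  if [ ! -f "$tfvars" ]; then
    echo "missing $tfvars" >&2
    return 1
  fi
  # Take the first quoted env_name assignment and strip the quotes.
  name=$(sed -n 's/^[[:space:]]*env_name[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$tfvars" | head -n 1)
  if [ -z "$name" ]; then
    echo "no env_name found in $tfvars" >&2
    return 1
  fi
  printf '%s\n' "$name"
}
```

For example, tfvars_env_name terraform/terraform.tfvars should print dev2 when you are targeting dev2; running aws sts get-caller-identity next to it shows which AWS account your current credentials resolve to.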

Targeted Deploy

When you only changed one component:

tofu -chdir=terraform apply \
  -var-file="terraform.tfvars" -target=module.<target> -auto-approve

Common targets:

Target	What It Deploys
`module.eval_log_importer`	Batch job for importing eval logs to warehouse
`module.job_status_updated`	Lambda processing S3 events
`module.api`	API server (ECS Fargate)
`module.scan_importer`	Lambda for scan imports
`module.eval_log_reader`	Lambda for S3 Object Lambda access
`module.warehouse`	PostgreSQL Aurora database
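tofu apply accepts -target more than once, so several components from the table can go out in a single targeted run. This is a minimal sketch with a hypothetical deploy_targets helper; setting DRY_RUN=1 prints the command instead of running it.

```shell
# deploy_targets <module> [<module>...]; accepts "api" or "module.api".
deploy_targets() {
  if [ "$#" -eq 0 ]; then
    echo "usage: deploy_targets <module> [<module>...]" >&2
    return 2
  fi
  # Rewrite each argument into a -target flag (the for-list is fixed at loop start).
  for m in "$@"; do
    shift
    set -- "$@" "-target=module.${m#module.}"
  done
  set -- tofu -chdir=terraform apply -var-file=terraform.tfvars "$@" -auto-approve
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 deploy_targets api warehouse prints the apply command with -target=module.api -target=module.warehouse.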

Full Deploy

# Preview changes
tofu -chdir=terraform plan -var-file="terraform.tfvars"

# Apply all changes
tofu -chdir=terraform apply -var-file="terraform.tfvars"
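One way to make sure the apply runs exactly what you previewed is to save the plan and apply that file. A sketch, assuming an arbitrary plan file name of tfplan; because of -chdir, the file is written to and read from the terraform/ directory. The helper names are hypothetical, and DRY_RUN=1 prints the steps.

```shell
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi
}

# Plan to a file, then apply that exact plan (only if planning succeeded).
# Variable values are stored in the saved plan, so apply does not repeat -var-file.
full_deploy() {
  plan_file="${1:-tfplan}"
  run tofu -chdir=terraform plan -var-file=terraform.tfvars -out="$plan_file" &&
    run tofu -chdir=terraform apply "$plan_file"
}
```

Applying a saved plan does not ask for confirmation again, so read the plan output before the apply step runs.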

Runner Image Rebuilds

Terraform automatically tracks uv.lock, pyproject.toml, and Python source files to determine when the runner Docker image needs rebuilding (see terraform/modules/runner/ecr.tf). tofu apply is the correct way to rebuild runner images — it produces deterministic sha256.<hash> image tags.

Do NOT use scripts/dev/build-and-push-runner-image.sh for deploying to dev environments. That script is only for local testing with hawk local.
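Because tofu apply tags runner images as sha256.<hash>, one quick way to spot a manually pushed image is to filter a repository's tags for that pattern. A sketch; it assumes <hash> is lowercase hex of unspecified length, and deterministic_tags is a hypothetical name.

```shell
# Read tag names on stdin (one per line) and keep only sha256.<hash> tags.
# ASSUMPTION: <hash> is lowercase hex; its length is not checked.
deterministic_tags() {
  grep -E '^sha256\.[0-9a-f]+$'
}
```

Tag names can come from ECR, for example: aws ecr describe-images --repository-name <runner-repo> --query 'imageDetails[].imageTags[]' --output text | tr '\t' '\n' | deterministic_tags, where <runner-repo> is one of the {env_name}/inspect-ai/* repositories.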

Docker Registry Authentication

Before tofu apply can build and push images, you must be logged into:

ECR: Install the docker-credential-ecr-login helper and configure Docker to use it
DHI (hardened images) registry: docker login dhi.io
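For ECR, Docker finds the helper through a credHelpers entry in ~/.docker/config.json that maps the registry host to ecr-login (Docker then calls docker-credential-ecr-login). The sketch below only prints such an entry; the account ID and region in the test are placeholders, and the function names are hypothetical.

```shell
# ECR registry host for an AWS account ID and region.
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Print a credHelpers snippet routing that registry through docker-credential-ecr-login.
ecr_cred_helpers_json() {
  printf '{"credHelpers": {"%s": "ecr-login"}}\n' "$(ecr_registry "$1" "$2")"
}
```

Merge the printed entry into your existing config.json instead of overwriting it, since that file may also hold what docker login dhi.io stored.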

Lambda Lock Files

After changing Python dependencies (pyproject.toml or uv.lock), run ./scripts/dev/uv-lock-all.sh to update lock files in ALL terraform modules. Terraform modules have their own lock files that must stay in sync with the root.
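To see which module lock files exist, and therefore which ones uv-lock-all.sh should have updated, a small find-based sketch can help. It assumes the module lock files sit somewhere under terraform/modules; module_lockfiles is a hypothetical name.

```shell
# List uv.lock files below a directory (default: terraform/modules), sorted.
module_lockfiles() {
  find "${1:-terraform/modules}" -type f -name uv.lock | sort
}
```

After running the script, git status --short -- $(module_lockfiles) shows which of those files changed.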

Database Migrations

See the database-migrations skill for full details.

Quick command:

DATABASE_URL=$(tofu -chdir=terraform output \
  -var-file="terraform.tfvars" -raw warehouse_database_url_admin) \
  alembic upgrade head
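A variant of the quick command that shows the current revision before upgrading; alembic current and alembic upgrade head are standard Alembic commands. The function body runs in a subshell so the admin DATABASE_URL is not left exported in your shell. migrate_dev is a hypothetical name, and DRY_RUN=1 prints the steps without connecting to anything.

```shell
# migrate_dev [terraform-dir]; subshell body keeps DATABASE_URL out of the caller.
migrate_dev() (
  tf_dir="${1:-terraform}"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ DATABASE_URL=\$(tofu -chdir=$tf_dir output -var-file=terraform.tfvars -raw warehouse_database_url_admin)"
    echo "+ alembic current"
    echo "+ alembic upgrade head"
    exit 0
  fi
  # Stop before touching alembic if the output lookup fails.
  DATABASE_URL=$(tofu -chdir="$tf_dir" output -var-file=terraform.tfvars -raw warehouse_database_url_admin) || exit 1
  export DATABASE_URL
  alembic current && alembic upgrade head
)
```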

Verify with Smoke Tests

See the smoke-tests skill for full details.

Quick command:

unset HAWK_MODEL_ACCESS_TOKEN_ISSUER
scripts/dev/create-smoke-test-env.py env/smoke-dev2 --terraform-dir terraform
set -a && source env/smoke-dev2 && set +a && \
  pytest tests/smoke -m smoke --smoke -vv -n 5
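The set -a / set +a pair matters because pytest runs as a child process: with allexport on, every variable assigned while sourcing the env file is exported, whereas a plain VAR=value line in a sourced file only sets a shell variable. A minimal sketch (. is the POSIX spelling of source; load_env_file and the variable in the test are made up):

```shell
# Source an env file with allexport on, so its assignments reach child processes.
load_env_file() {
  set -a
  . "$1"
  set +a
}
```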

Complete Workflow

The typical deploy-and-verify loop:

Make code changes
Run local tests: pytest tests/{package} -n auto -vv
Run quality checks: ruff check . && ruff format . --check && basedpyright .
Deploy to dev: tofu -chdir=terraform apply -var-file="terraform.tfvars" -target=module.X
Run migrations if needed (see Database Migrations section above)
Run smoke tests to verify
Commit and push
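The local test, quality-check, and deploy steps of this loop can be chained so that a failing test or lint stops the deploy. A sketch under assumptions: the package and module names are arguments, migrations and smoke tests stay manual, and dev_loop is a hypothetical name. With DRY_RUN=1 it only prints each step.

```shell
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi
}

# dev_loop <package> <module>: tests, checks, then a targeted deploy.
# && stops at the first failing step.
dev_loop() {
  if [ "$#" -ne 2 ]; then
    echo "usage: dev_loop <package> <module>" >&2
    return 2
  fi
  pkg="$1"; module="$2"
  run pytest "tests/$pkg" -n auto -vv &&
    run ruff check . &&
    run ruff format . --check &&
    run basedpyright . &&
    run tofu -chdir=terraform apply -var-file=terraform.tfvars "-target=module.$module"
}
```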

Choosing Between Shared and Dedicated Infrastructure

Dev environments default to reusing staging infra. To create dedicated resources, change these flags in your tfvars:

# Shared (default for dev environments)
create_s3_bucket       = false
create_eventbridge_bus = false
create_eks_resources   = false

# Dedicated (set to true + provide unique names)
create_s3_bucket       = true
s3_bucket_name         = "dev5-metr-inspect-data"
create_eventbridge_bus = true
eventbridge_bus_name   = "dev5-inspect-ai-api"
create_eks_resources   = true
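When switching to dedicated mode, it is easy to flip a create_* flag without supplying its name. A rough check, assuming one key = value per line and only the two flag/name pairings shown above (create_eks_resources has no name variable in this example); it is text matching, not an HCL parser, and check_dedicated_names is hypothetical.

```shell
# check_dedicated_names <tfvars>: fail if a create_* flag is true but its name is unset.
check_dedicated_names() {
  f="$1"; status=0
  for pair in create_s3_bucket:s3_bucket_name create_eventbridge_bus:eventbridge_bus_name; do
    flag=${pair%%:*}; name=${pair#*:}
    if grep -Eq "^[[:space:]]*${flag}[[:space:]]*=[[:space:]]*true" "$f" &&
       ! grep -Eq "^[[:space:]]*${name}[[:space:]]*=" "$f"; then
      echo "$flag = true but $name is not set" >&2
      status=1
    fi
  done
  return "$status"
}
```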

Use shared (recommended): Cheaper, simpler, sufficient for most dev work.

Use dedicated: When testing infrastructure changes, or when you need full isolation from other devs.

Shared Mode Implications (create_eks_resources=false)

When using shared infrastructure, dev environments reuse staging's k8s namespace, ClusterRole, and ClusterRoleBinding. This means:

Runner pods use the staging k8s service account and RBAC bindings
Changes to k8s auth patterns (e.g., in-cluster config, service account permissions) must be tested in shared mode — they may work on staging (which has its own dedicated ClusterRoleBinding) but fail on dev environments that rely on staging's bindings
The Helm chart creates a RoleBinding in the sandbox namespace only — ClusterRoleBinding for the runner namespace comes from terraform (only when create_eks_resources=true)
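Since dev runners inherit staging's bindings in shared mode, kubectl auth can-i with service-account impersonation is one way to check a permission before depending on it. A sketch: the namespaces, service-account name, and helper names are placeholders, and --as only works if your own credentials are allowed to impersonate.

```shell
# Kubernetes user name for a service account.
sa_user() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# can_runner <sa-namespace> <sa-name> <target-namespace> <verb> <resource>
# Asks the API server whether that service account may perform the action.
can_runner() {
  if [ "$#" -ne 5 ]; then
    echo "usage: can_runner <sa-namespace> <sa-name> <target-namespace> <verb> <resource>" >&2
    return 2
  fi
  kubectl auth can-i "$4" "$5" --as="$(sa_user "$1" "$2")" -n "$3"
}
```

For example, can_runner <runner-ns> <runner-sa> <sandbox-ns> create pods prints yes or no.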

Useful Log Groups

Component	Log Group
API server	`{env_name}/inspect-ai/api`
Batch importer	`/{env_name}/inspect-ai/eval-log-importer/batch`
Job status Lambda	`/aws/lambda/{env_name}-inspect-ai-job-status-updated`
Runner pods	Use `hawk logs <eval-set-id>`

Tail logs: aws logs tail <log-group> --since 30m --format short
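A small lookup can turn a component from the table into a log-group name for a given environment, so the tail command is reusable across dev1-4. The component keys are made-up shorthands, and the paths are copied from the table above as-is.

```shell
# log_group <env_name> <component>; component keys are hypothetical shorthands.
log_group() {
  env_name="$1"
  case "$2" in
    api)        echo "$env_name/inspect-ai/api" ;;
    importer)   echo "/$env_name/inspect-ai/eval-log-importer/batch" ;;
    job-status) echo "/aws/lambda/$env_name-inspect-ai-job-status-updated" ;;
    *)
      echo "unknown component: $2 (for runner pods, use hawk logs <eval-set-id>)" >&2
      return 1
      ;;
  esac
}
```

For example: aws logs tail "$(log_group dev2 importer)" --since 30m --format short --follow.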

Troubleshooting

Issue	Cause	Fix
`tofu apply` fails with dependency cycle	Deposed resources from previous module refactors	`tofu state rm` on the orphaned resource, then re-apply
`exec format error` during Docker build	Building wrong architecture (e.g., arm64 instead of amd64)	Terraform handles this correctly; if building manually, specify `--platform linux/amd64`
Lambda can't find Docker image after apply	Docker build failed silently	Check ECR for the expected image tag; verify registry auth
Connection refused / timeout to dev resources	VPC routing issue: you may be running in a different VPC	Check that Tailscale is connected; if you are in a different AWS VPC, ask the user for help with networking
`tofu apply` succeeds but runner image is stale	Using manual build script instead of terraform	Use `tofu apply` — it tracks `uv.lock` and rebuilds automatically
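For the exec format error row, a locally built image's platform can be checked before pushing. A sketch using docker image inspect with a Go template over the image's Os and Architecture fields; the helper names are hypothetical.

```shell
# True only for the platform the deployed images need.
is_amd64_platform() {
  [ "$1" = "linux/amd64" ]
}

# check_image_platform <image>: report the image's os/arch and fail on a mismatch.
check_image_platform() {
  platform=$(docker image inspect --format '{{.Os}}/{{.Architecture}}' "$1") || return 1
  if is_amd64_platform "$platform"; then
    echo "ok: $1 is $platform"
  else
    echo "mismatch: $1 is $platform, expected linux/amd64" >&2
    return 1
  fi
}
```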

More from this repository (METR/inspect-action)

debug-stuck-eval (updated 2026-03-03)
Debug stuck Hawk/Inspect AI evaluations. Use when user mentions "stuck eval", "eval not progressing", "eval hanging", "samples not completing", "eval set frozen", "runner stuck", "500 errors in eval", "retry loop", "eval timeout", or asks why an evaluation isn't finishing.

database-migrations (updated 2026-02-15)
Use when creating alembic migrations, applying migrations to remote environments, or recovering from schema drift. Triggers on changes to models.py, "run migration", "schema drift", "alembic", "database error in batch jobs".

fullstack-dev (updated 2026-02-15)
Use when developing the frontend and backend together, making UI changes, or setting up local dev with linked inspect_ai/scout libraries. Triggers on frontend changes, "yarn dev", "vite", "www/", or React component work.

smoke-tests (updated 2026-02-15)
Use when running smoke tests, debugging smoke test failures, or verifying a deployed environment works correctly. Triggers on "run smoke tests", "smoke tests failing", "test against dev", "verify deployment".

monitoring (updated 2026-01-18)
Monitor Hawk job status, view logs, and diagnose issues. Use when the user wants to check job progress, view error logs, debug a failing job, or generate a monitoring report for a Hawk evaluation run.

view-results (updated 2026-01-18)
View and analyze Hawk evaluation results. Use when the user wants to see eval-set results, check evaluation status, list samples, view transcripts, or analyze agent behavior from a completed evaluation run.
