perform-agent-builder-eval

Orchestrate agent-builder evaluation runs — init ES/Kibana/EDOT stack, collect eval parameters, output the run command, and stop services.

Install command

npx skills add https://github.com/elastic/kibana --skill perform-agent-builder-eval

Copy and paste this command into Claude Code to install the skill

Source

elastic/kibana

Stars: 21,125

Forks: 8,587

Updated: March 4, 2026 at 20:33

SKILL.md

name	perform-agent-builder-eval
description	Orchestrate agent-builder evaluation runs — init ES/Kibana/EDOT stack, collect eval parameters, output the run command, and stop services.
allowed-tools	Bash, Read
argument-hint	["init|stop"]

Perform Agent Builder Evaluation

This skill manages the lifecycle of running agent-builder evaluations. It accepts $ARGUMENTS as one of: init or stop.

init — Launch ES, Kibana, and EDOT; collect eval parameters; output the run command
stop — Kill background ES, Kibana, and EDOT processes

Action: `init`

Follow these steps sequentially. Each step requires confirmation before proceeding.

Step 1: Prompt for GCS Credentials

Use AskUserQuestion to ask the user for the path to their GCS credentials file. The default path is exactly ~/.gcs/gcs.client.default.credentials_file.json — do NOT suggest any other path (not ~/.config/gcloud/..., not application_default_credentials, etc.).

What is the path to your GCS credentials file? (default: ~/.gcs/gcs.client.default.credentials_file.json)

If the user accepts the default or leaves it blank, use $HOME/.gcs/gcs.client.default.credentials_file.json (expand ~ to the user's home directory). Validate that the resolved path starts with /. If it does not, ask again.
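The default-and-validate rule above can be sketched as a small shell function. This is only an illustration: `resolve_credentials_path` is a hypothetical helper name, not something the skill or the Kibana repo defines.

```shell
# resolve_credentials_path INPUT
# Hypothetical helper (not part of the skill): prints the resolved absolute
# path, or returns 1 so the caller knows to ask the user again.
resolve_credentials_path() {
  local input="$1"
  local default_path="$HOME/.gcs/gcs.client.default.credentials_file.json"

  # Blank input means the user accepted the default.
  if [ -z "$input" ]; then
    input="$default_path"
  fi

  # Expand a leading ~ by hand, since a tilde inside a quoted string is not
  # expanded by the shell.
  case "$input" in
    "~")   input="$HOME" ;;
    "~/"*) input="$HOME/${input#"~/"}" ;;
  esac

  # Only absolute paths are accepted.
  case "$input" in
    /*) printf '%s\n' "$input" ;;
    *)  return 1 ;;
  esac
}
```

A caller would loop on the question until the function succeeds, then pass the printed path as `<GCS_CREDENTIALS_PATH>` in Step 2.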

Step 2: Launch Elasticsearch

Launch Elasticsearch in the background using run_in_background. Include the GCS credentials:

yarn es snapshot --license trial --secure-files gcs.client.default.credentials_file=<GCS_CREDENTIALS_PATH>

Tell the user Elasticsearch is starting up.

Step 3: Register GCS Snapshot Repository

Wait for Elasticsearch to become available by polling until the cluster health endpoint responds. Fail after 30 attempts (approximately 2.5 minutes):

MAX_RETRIES=30; COUNT=0; until curl -s -u elastic:changeme http://localhost:9200/_cluster/health | grep -q '"status"'; do COUNT=$((COUNT+1)); if [ "$COUNT" -ge "$MAX_RETRIES" ]; then echo "ERROR: Elasticsearch did not become available after $MAX_RETRIES attempts"; exit 1; fi; sleep 5; done

If the poll times out, show the error to the user and suggest checking the Elasticsearch background task output for startup errors.

Once ES is ready, register the GCS snapshot repository with these defaults:

Repository name: agent-builder-datasets
Bucket: agent-builder-datasets
Base path: knowledge_base/snapshot_dt=2026-01-10

curl -s -u elastic:changeme -X PUT "http://localhost:9200/_snapshot/agent-builder-datasets" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "gcs",
    "settings": {
      "bucket": "agent-builder-datasets",
      "base_path": "knowledge_base/snapshot_dt=2026-01-10"
    }
  }'

Verify registration succeeded by checking the response contains "acknowledged":true. If it fails, show the error to the user and ask if they want to retry or abort.
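One way to perform this check is a small grep-based test on the response body. The sketch below assumes the response is captured in a shell variable; `response_flag_is_true` is a hypothetical helper name. It tolerates optional whitespace around the colon, so it also works if the response is pretty-printed, and the same helper could be used for the "accepted":true check after a restore.

```shell
# response_flag_is_true KEY JSON
# Hypothetical helper (not part of the skill): succeeds when the JSON text
# contains "KEY": true, with or without whitespace around the colon.
response_flag_is_true() {
  local key="$1" json="$2"
  printf '%s' "$json" | grep -Eq "\"${key}\"[[:space:]]*:[[:space:]]*true"
}
```

For example, store the output of the PUT request in `response`, then run `response_flag_is_true acknowledged "$response"` and show `$response` to the user if the check fails.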

Tell the user the GCS snapshot repository has been registered.

Step 4: Restore a Snapshot

List available snapshots in the repository:

curl -s -u elastic:changeme "http://localhost:9200/_snapshot/agent-builder-datasets/_all"

Parse the response and present each snapshot as an option using AskUserQuestion. For each snapshot, show:

Label: the snapshot name
Description: number of indices and the snapshot date

Example options:

manual_test_snapshot_2 — 32 indices, Jan 12
text_retrieval_eval_bm25_elser — 2 indices, Feb 12
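The option labels could be derived from the listing response along these lines. This is a sketch that assumes `jq` is installed (the skill itself does not require it) and that each entry in `snapshots` carries `snapshot`, `indices`, and `start_time` fields; `format_snapshot_options` is a hypothetical helper name.

```shell
# format_snapshot_options JSON
# Hypothetical helper (not part of the skill); assumes jq is available.
# Prints one line per snapshot: name, index count, and start time.
format_snapshot_options() {
  printf '%s' "$1" |
    jq -r '.snapshots[] | "\(.snapshot): \(.indices | length) indices, started \(.start_time)"'
}
```

The start time is printed as the raw timestamp from the response. Shortening it to a form like "Jan 12" is left to the caller.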

Once the user selects a snapshot, restore it:

curl -s -u elastic:changeme -X POST "http://localhost:9200/_snapshot/agent-builder-datasets/<snapshot_name>/_restore" \
  -H "Content-Type: application/json" \
  -d '{
    "indices": "*",
    "include_global_state": false
  }'

Verify the restore was accepted by checking the response contains "accepted":true. If it fails (e.g., index already exists), show the error and ask the user if they want to close conflicting indices and retry, or abort.

To retry with conflicting indices closed:

curl -s -u elastic:changeme -X POST "http://localhost:9200/<comma_separated_index_names>/_close"

Then re-run the restore command.

Tell the user the snapshot has been restored.

Step 5: Launch Kibana

Launch Kibana in the background using run_in_background:

yarn start --no-base-path

Tell the user Kibana is starting up.

Step 6: Confirm Phoenix Running

Use AskUserQuestion to confirm Phoenix is running:

Is Phoenix running and ready to receive traces?

Options:

Yes — Continue
Not yet — Wait for the user to start Phoenix, then ask again

Step 7: Launch EDOT

Launch the EDOT collector in the background using run_in_background:

ELASTICSEARCH_HOST=http://localhost:9200 ELASTICSEARCH_USERNAME=elastic ELASTICSEARCH_PASSWORD=changeme node scripts/edot_collector.js

Tell the user EDOT is starting up.

Step 8: Collect Eval Parameters and Output Run Command

8a: Discover available connectors

Read config/kibana.dev.yml and parse the xpack.actions.preconfigured section to get the list of available connector IDs and names. These connectors are used for both EVALUATION_CONNECTOR_ID (the judge) and --project (the model being evaluated).

If no connectors are found, tell the user to configure connectors in config/kibana.dev.yml under xpack.actions.preconfigured and abort.
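As a rough illustration of the discovery step, the awk sketch below lists connectors as id (name). It rests on layout assumptions that may not hold for every config: the section is written as a single top-level `xpack.actions.preconfigured:` key, connector IDs sit at two-space indentation, and each connector has a `name:` field at four-space indentation. `list_connectors` is a hypothetical helper name, and a real YAML parser would be more robust.

```shell
# list_connectors FILE
# Hypothetical helper (not part of the skill). Assumes the dotted top-level
# key layout described in the lead-in; nested xpack/actions keys are not handled.
list_connectors() {
  awk '
    # Enter the preconfigured block.
    /^xpack\.actions\.preconfigured:/ { in_block = 1; next }
    # Any other top-level key ends the block.
    in_block && /^[^ #]/ { in_block = 0 }
    # Two-space indent with a bare key: this is a connector id.
    in_block && /^  [^ #][^:]*:[ ]*$/ { id = $1; sub(/:$/, "", id); next }
    # Four-space indent name field: print the pair.
    in_block && /^    name:/ {
      name = $0
      sub(/^    name:[ ]*/, "", name)
      gsub(/^"|"$/, "", name)
      print id " (" name ")"
    }
  ' "$1"
}
```

Running `list_connectors config/kibana.dev.yml` prints nothing when no connectors match. That is the case where the skill tells the user to configure connectors and aborts.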

8b: Select evaluation connector (judge)

Use AskUserQuestion to ask which connector to use as the evaluation judge. Present the discovered connectors as options:

Which connector should be used as the evaluation judge (EVALUATION_CONNECTOR_ID)?

Options: one per discovered connector, using id (name) as the label.

8c: Select project (model to evaluate)

Use AskUserQuestion to ask which connector/model to evaluate. Present the discovered connectors as options:

Which model should be evaluated (--project)?

Options: one per discovered connector, using id (name) as the label.

8d: Select dataset

Use AskUserQuestion to ask which dataset to use:

Which dataset should be used?

Options:

agent-builder: text-retrieval: wix-qa
agent-builder: text-retrieval: elastic-qa
agent-builder: text-retrieval: quick-tester

8e: Output the run command

Using the collected values and the following defaults, output the exact command the user should run in a separate terminal:

SELECTED_EVALUATORS: Precision@K,Recall@K,F1@K,Latency,Input Tokens,Output Tokens,Tool Calls,Factuality,Groundedness,Relevance
RAG_EVAL_K: 10,20,30,40
EVALUATION_REPETITIONS: 1

Display a summary and the command:

Stack is ready!

Elasticsearch: running (snapshot with GCS credentials)

Kibana: running (no base path)

Phoenix: confirmed running

EDOT: running

Important: Make sure Cloud Connected Mode (CCM) is enabled in Kibana before running the evaluation. Go to Stack Management > Cloud Connected Mode in the Kibana UI and enable it if it is not already active.

Run the following command in a separate terminal to start the evaluation:
TRACING_ES_URL=http://elastic:changeme@localhost:9200 \
SELECTED_EVALUATORS="<value>" \
RAG_EVAL_K=<value> \
KBN_EVALS_EXECUTOR=phoenix \
EVALUATION_CONNECTOR_ID=<value> \
DATASET_NAME="<value>" \
EVALUATION_REPETITIONS=<value> \
KBN_EVALS_SKIP_CONNECTOR_SETUP=true \
node scripts/playwright test \
  --config x-pack/platform/packages/shared/agent-builder/kbn-evals-suite-agent-builder/playwright.config.ts \
  evals/external/external_dataset.spec.ts \
  --project <value>

Substitute the actual user-selected values into the command. The user will copy-paste and run this themselves. Do NOT append any extra notes or warnings after the command block.
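The substitution can be sketched as a function that prints the finished command from the three user selections plus the defaults listed above. `print_eval_command` and its argument order are hypothetical. The printed text follows the template exactly, including the unquoted `--project` value.

```shell
# print_eval_command JUDGE_ID PROJECT DATASET
# Hypothetical helper (not part of the skill): prints the run command with
# the user-selected values and the documented defaults filled in.
print_eval_command() {
  local judge_id="$1" project="$2" dataset="$3"
  local evaluators="Precision@K,Recall@K,F1@K,Latency,Input Tokens,Output Tokens,Tool Calls,Factuality,Groundedness,Relevance"
  local rag_eval_k="10,20,30,40"
  local repetitions="1"

  # In an unquoted here-document, \\ prints a single backslash, which gives
  # the line continuations in the output.
  cat <<EOF
TRACING_ES_URL=http://elastic:changeme@localhost:9200 \\
SELECTED_EVALUATORS="${evaluators}" \\
RAG_EVAL_K=${rag_eval_k} \\
KBN_EVALS_EXECUTOR=phoenix \\
EVALUATION_CONNECTOR_ID=${judge_id} \\
DATASET_NAME="${dataset}" \\
EVALUATION_REPETITIONS=${repetitions} \\
KBN_EVALS_SKIP_CONNECTOR_SETUP=true \\
node scripts/playwright test \\
  --config x-pack/platform/packages/shared/agent-builder/kbn-evals-suite-agent-builder/playwright.config.ts \\
  evals/external/external_dataset.spec.ts \\
  --project ${project}
EOF
}
```

The function only prints the command. The user still copies it into a separate terminal, as the skill requires.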

Action: `stop`

Kill background ES, Kibana, and EDOT processes that were launched during init.

Step 1: Kill Processes

Run the following commands to find and kill the relevant processes:

# Kill Elasticsearch
pkill -f 'elasticsearch' || true

# Kill Kibana (node process started by yarn start)
pkill -f 'scripts/kibana --dev' || true

# Kill EDOT collector
pkill -f 'edot_collector' || true

Step 2: Confirm

Tell the user:

All evaluation stack processes (ES, Kibana, EDOT) have been stopped.

Important Notes

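Background processes: ES, Kibana, and EDOT are launched with run_in_background. Their task IDs are tracked by the session, but the stop action as written kills them by pattern with pkill -f, which matches any process whose command line contains the pattern.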
Hard-coded values: TRACING_ES_URL, KBN_EVALS_EXECUTOR, and KBN_EVALS_SKIP_CONNECTOR_SETUP are not configurable — they are set for the local dev stack.
Always use node scripts/playwright test — never use npx playwright test.
Playwright config: Always uses x-pack/platform/packages/shared/agent-builder/kbn-evals-suite-agent-builder/playwright.config.ts.
Test spec: Always runs evals/external/external_dataset.spec.ts.
