
aqua-metrics

Name: Aqua Metrics
Author: oracle


stars:126

forks:65

updated: February 28, 2026 at 16:13

SKILL.md


name	aqua-metrics
description	Set up Prometheus and Grafana monitoring for AQUA vLLM model deployments on OCI. Covers the signing proxy, container registry setup, OCI Container Instance deployment, and PromQL dashboards. Triggered when user wants to monitor LLM deployments, view TTFT/latency/throughput metrics, or set up observability for AQUA.
user-invocable	true
disable-model-invocation	false

AQUA Deployment Metrics Monitoring

Monitor vLLM model deployments with Prometheus + Grafana hosted on an OCI Container Instance. The monitoring stack consists of:

Signing Proxy: signs scrape requests to the deployment's /predict/metrics endpoint with OCI IAM credentials
Prometheus: scrapes the proxy every 5s and stores the time series
Grafana: visualizes the Prometheus data in dashboards

Available Metrics (vLLM Prometheus)

All standard vLLM Prometheus metrics are available:

Metric	Description
`vllm:time_to_first_token_seconds`	TTFT histogram
`vllm:inter_token_latency_seconds`	ITL histogram
`vllm:e2e_request_latency_seconds`	End-to-end request latency
`vllm:num_requests_running`	Concurrent requests in flight
`vllm:num_requests_waiting`	Requests queued
`vllm:gpu_cache_usage_perc`	KV cache utilization
`vllm:num_tokens_prompt`	Prompt token count
`vllm:num_tokens_generation`	Generation token count
`vllm:request_success_total`	Successful request count

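Full list: https://docs.vllm.ai/en/latest/design/metrics/

Metric names have changed between vLLM releases, so confirm the exact names against the /predict/metrics output of your deployment.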
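To make the table concrete, here is a minimal sketch of reading individual samples out of the Prometheus text exposition format that the metrics endpoint returns. The sample payload and the `demo` model name are made up for illustration, and the parser is deliberately simplified (it ignores timestamps and assumes well-formed label values).

```python
import re

# name{labels} value  (labels are optional; trailing timestamps are ignored)
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Hypothetical payload; a real deployment returns many more series.
SAMPLE_TEXT = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="demo"} 3.0
vllm:num_requests_waiting{model_name="demo"} 1.0
"""


def parse_metrics(text):
    """Return a list of (metric_name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything this simplified parser cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


for name, labels, value in parse_metrics(SAMPLE_TEXT):
    print(name, labels, value)
```

For anything beyond a quick check, a maintained parser is a better choice than a hand-written regular expression.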

Architecture

AQUA Model Deployment
  └── /predict/metrics endpoint (requires OCI IAM signature)
           ↑
    Signing Proxy :8080
    (resource_principal auth)
           ↑
    Prometheus :9090
    (scrapes localhost:8080 every 5s)
           ↑
    Grafana :3000
    (visualizes from localhost:9090)
           ↑
    User browser (public IP of Container Instance)

Step 1: Clone the Monitoring Stack

git clone https://github.com/oracle-samples/oci-data-science-ai-samples.git
cd oci-data-science-ai-samples/ai-quick-actions/aqua_metrics

The directory contains:

signing_proxy/ — OCI-aware auth proxy (Dockerfile)
prometheus/ — Prometheus config + Dockerfile
grafana/ — Grafana Dockerfile

Step 2: Build and Push Images to OCIR

Replace <registry-domain> with your region's OCIR endpoint (e.g., iad.ocir.io) and <tenancy-namespace> with your tenancy namespace.

Signing Proxy

cd signing_proxy
docker build --no-cache -t signing_proxy .
docker tag signing_proxy <registry-domain>/<tenancy-namespace>/signing_proxy
docker push <registry-domain>/<tenancy-namespace>/signing_proxy:latest

Prometheus

The prometheus/prometheus.yml is preconfigured to scrape localhost:8080 (the proxy):

global:
  scrape_interval: 5s
  evaluation_interval: 30s
scrape_configs:
  - job_name: AQUA
    static_configs:
      - targets:
          - 'localhost:8080'
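If the scrape interval or target needs to differ from the preconfigured file, the same structure can be generated rather than edited by hand. The following is only a sketch: `render_prometheus_config` is a hypothetical helper, not part of the sample repository, and its defaults simply mirror the values shown above.

```python
def render_prometheus_config(
    target="localhost:8080",
    scrape_interval="5s",
    evaluation_interval="30s",
    job_name="AQUA",
):
    """Render a prometheus.yml with a single static scrape target."""
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        f"  evaluation_interval: {evaluation_interval}",
        "scrape_configs:",
        f"  - job_name: {job_name}",
        "    static_configs:",
        "      - targets:",
        f"          - '{target}'",
    ]
    return "\n".join(lines) + "\n"


# Example: relax the scrape interval to 15s while keeping the proxy target.
print(render_prometheus_config(scrape_interval="15s"))
```

The rendered file would still have to be copied into the prometheus/ directory before building the image, since the configuration is baked into the container.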

cd ../prometheus
docker build --no-cache -t prometheus .
docker tag prometheus <registry-domain>/<tenancy-namespace>/prom/prometheus
docker push <registry-domain>/<tenancy-namespace>/prom/prometheus:latest

Grafana

cd ../grafana
docker build --no-cache -t grafana .
docker tag grafana <registry-domain>/<tenancy-namespace>/grafana/grafana
docker push <registry-domain>/<tenancy-namespace>/grafana/grafana:latest

Alternative: pull grafana/grafana directly from docker.io on the Container Instance — no build needed.
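The three build-tag-push sequences above follow one pattern, which the sketch below makes explicit. It only prints the commands and does not run Docker. The namespace `mytenancy` is a placeholder, and the helper names are hypothetical.

```python
import shlex


def image_ref(registry_domain, tenancy_namespace, repository, tag="latest"):
    """Compose a full OCIR image reference."""
    return f"{registry_domain}/{tenancy_namespace}/{repository}:{tag}"


def build_and_push_commands(local_name, ref):
    """Return the docker commands used in this step as argument lists."""
    return [
        ["docker", "build", "--no-cache", "-t", local_name, "."],
        ["docker", "tag", local_name, ref],
        ["docker", "push", ref],
    ]


# (source directory / local image name, repository path in OCIR)
IMAGES = [
    ("signing_proxy", "signing_proxy"),
    ("prometheus", "prom/prometheus"),
    ("grafana", "grafana/grafana"),
]

for local_name, repository in IMAGES:
    ref = image_ref("iad.ocir.io", "mytenancy", repository)  # placeholder values
    for command in build_and_push_commands(local_name, ref):
        print(shlex.join(command))
```

Each group of commands is meant to be run from the matching subdirectory, as in the manual steps.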

Step 3: Create the OCI Container Instance

In the OCI Console: Developer Services → Containers & Artifacts → Container Instances → Create container instance

Network Configuration

Create or select a VCN with a regional subnet; use a public subnet if Grafana must be reachable from the internet, because a public IPv4 address can only be assigned in a public subnet
Security list must allow ingress on ports 8080, 9090, and 3000
Security list must allow egress to the model deployment endpoint
Select Assign a public IPv4 address for external Grafana access
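Once the instance is running, the ingress rules can be sanity-checked from a workstation with a plain TCP connection test. This is a sketch under the assumption that the instance has a public IP; `port_open` and `check_stack` are hypothetical helper names, and a successful connection only shows that the port is reachable, not that the service behind it is healthy.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_stack(host, ports=(8080, 9090, 3000)):
    """Map each monitoring port to its reachability."""
    return {port: port_open(host, port) for port in ports}


# Usage (replace the placeholder with the Container Instance public IP):
#   print(check_stack("<container-instance-public-ip>"))
```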

Configure Three Containers

Add each container from OCIR:

signing_proxy:

Image: <registry-domain>/<tenancy-namespace>/signing_proxy:latest
Environment variable: TARGET = <model-deployment-url>/predict/metrics
TARGET format: https://modeldeployment.<region>.oci.customer-oci.com/<ocid>/predict/metrics

prometheus:

Image: <registry-domain>/<tenancy-namespace>/prom/prometheus:latest
No extra env vars needed (config is baked in)

grafana:

Image: <registry-domain>/<tenancy-namespace>/grafana/grafana:latest
Environment variable: PORT = 3000
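A mistyped TARGET value is an easy way to end up with an empty dashboard, so it can help to build and check the URL before pasting it into the console. The sketch below follows the format given for the signing_proxy container. The region and OCID in the example are placeholders, and both function names are hypothetical.

```python
from urllib.parse import urlparse


def metrics_target(region, deployment_ocid):
    """Build the TARGET value from a region and a model deployment OCID."""
    return (
        f"https://modeldeployment.{region}.oci.customer-oci.com/"
        f"{deployment_ocid}/predict/metrics"
    )


def looks_like_metrics_target(url):
    """Check that a URL matches the documented TARGET format."""
    parts = urlparse(url)
    return (
        parts.scheme == "https"
        and parts.netloc.startswith("modeldeployment.")
        and parts.netloc.endswith(".oci.customer-oci.com")
        and parts.path.endswith("/predict/metrics")
    )


# Placeholder region and OCID, for illustration only.
target = metrics_target("us-ashburn-1", "ocid1.datasciencemodeldeployment.oc1.iad.example")
print(target, looks_like_metrics_target(target))
```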

Step 4: Configure Grafana

Once the Container Instance is active:

Open http://<container-instance-public-ip>:3000
Log in with admin / admin (change on first login)
Go to Configuration → Data Sources → Add data source
Select Prometheus
URL: http://localhost:9090
Click Save & Test — should show "Data source is working"
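The same data source can also be registered without the UI through Grafana's HTTP API (`POST /api/datasources` with basic authentication). The following is a sketch rather than a tested deployment script: it only builds the request, the IP address is a documentation placeholder, and the credentials should be whatever you set after the first login.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url, user, password, prometheus_url="http://localhost:9090"):
    """Build (but do not send) a request that adds Prometheus as a data source."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend queries Prometheus, not the browser
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = datasource_request("http://203.0.113.10:3000", "admin", "admin")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=10)
```

The `localhost:9090` URL works here because, per the architecture above, Grafana reaches Prometheus on localhost inside the same Container Instance.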

Example PromQL Queries

# TTFT p50 / p95 / p99
histogram_quantile(0.5, rate(vllm:time_to_first_token_seconds_bucket[1m]))
histogram_quantile(0.95, rate(vllm:time_to_first_token_seconds_bucket[1m]))
histogram_quantile(0.99, rate(vllm:time_to_first_token_seconds_bucket[1m]))

# Requests per second
rate(vllm:request_success_total[1m])

# KV cache utilization
vllm:gpu_cache_usage_perc

# Active requests
vllm:num_requests_running
vllm:num_requests_waiting

# Tokens per second (generation)
rate(vllm:num_tokens_generation[1m])
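The percentile queries above rely on `histogram_quantile`, which estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. The sketch below is a simplified re-implementation for intuition only. The bucket values are invented, and Prometheus itself handles more edge cases than this version.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    The highest bucket must have upper bound +Inf, as in Prometheus.
    """
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with an le=+Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # fall back to the highest finite bound
            if count == prev_count:
                return prev_bound  # empty bucket, nothing to interpolate
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Invented per-second bucket rates for a TTFT histogram (le in seconds).
ttft_buckets = [(0.1, 20.0), (0.5, 80.0), (1.0, 95.0), (math.inf, 100.0)]
for q in (0.5, 0.95, 0.99):
    print(q, histogram_quantile(q, ttft_buckets))
```

Because of the interpolation, the result can never be more precise than the bucket boundaries, and a quantile that falls into the +Inf bucket is reported as the highest finite bound.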

For a walkthrough on building Grafana dashboards, see: https://grafana.com/docs/grafana/latest/getting-started/build-first-dashboard/

Exposing the Metrics Endpoint

The AQUA model deployment exposes Prometheus metrics at:

<deployment-url>/predict/metrics

The signing proxy handles OCI IAM signatures via resource_principal so Prometheus can scrape without managing OCI credentials directly.
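For illustration, the core of what such a proxy does can be sketched with the OCI Python SDK: obtain a resource principal signer and attach it to an outgoing request. This is an assumption about the general approach, not the code in signing_proxy/. It needs the third-party `oci` and `requests` packages and only works where a resource principal is available, for example inside the Container Instance.

```python
import os


def target_from_env(env=None):
    """Read the metrics URL from the TARGET environment variable."""
    env = os.environ if env is None else env
    return env["TARGET"]


def fetch_metrics(target_url, timeout=10.0):
    """Fetch the metrics text with a request signed by the resource principal."""
    # Imported lazily so the module loads even where the SDK is not installed.
    import oci
    import requests

    signer = oci.auth.signers.get_resource_principals_signer()
    response = requests.get(target_url, auth=signer, timeout=timeout)
    response.raise_for_status()
    return response.text


# Usage inside an environment with a resource principal:
#   print(fetch_metrics(target_from_env())[:500])
```

Keeping the signing in a separate container means Prometheus itself needs no OCI-specific configuration and can scrape a plain localhost target.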

Key Source Files

oracle-samples/oci-data-science-ai-samples — ai-quick-actions/aqua_metrics/
ads/aqua/modeldeployment/deployment.py — deployment endpoint management


package.json

"author": "oracle"

"repository": "oracle/accelerated-data-science"
